wolfpld / tracy

Frame profiler
https://tracy.nereid.pl/

Windows precompiled version not working #894

Closed: vipcxj closed this 1 month ago

vipcxj commented 1 month ago

I downloaded windows-0.11.1.zip, which contains several executables. I double-clicked tracy-profiler.exe and nothing happened; running several of the other programs did nothing either. Running tracy-profiler.exe from the command line returned immediately without printing anything. My Windows version is Windows 10 22H2 (19045.4894).

wolfpld commented 1 month ago

#887

vipcxj commented 1 month ago

@wolfpld You are right: after installing the Visual C++ Redistributable, it works. By the way, does Tracy support C++20 coroutines? I use coroutines with Asio. I can't find any examples of Asio with coroutines, only one for fibers. I don't use fibers, and I don't even see the co_await and co_return keywords in that example. I tried using ZoneScoped directly, but the profiler complains that ZoneEnd is executed twice. The documentation says ZoneScoped is based on RAII, which as far as I recall also works in coroutines, so I don't know what happened.

wolfpld commented 1 month ago

Tracy "fibers" provide a general mechanism for async tasks. If you can place the fiber instrumentation around the C++20 coroutine facilities, things should work.
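To illustrate what the fiber mechanism changes, here is a deliberately simplified model of zone bookkeeping. It assumes, as the "zone is ended twice" message suggests, that the profiler matches each zone end against the open zones of the thread that emitted it. It is not Tracy's implementation: `ZoneStackModel` and its methods are hypothetical, and `fiber_enter`/`fiber_leave` only mirror the roles of Tracy's `TracyFiberEnter(name)` and `TracyFiberLeave` macros (which, as I understand the manual, require `TRACY_FIBERS` to be defined).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical model of zone bookkeeping: open zones are kept on
// a stack per thread, or per fiber while a thread has entered one.
// Illustration only; this is not Tracy's implementation.
class ZoneStackModel {
public:
    // Plays the role of TracyFiberEnter(name) on `thread`.
    void fiber_enter(int thread, const std::string& fiber) { current_fiber_[thread] = fiber; }

    // Plays the role of TracyFiberLeave on `thread`.
    void fiber_leave(int thread) { current_fiber_.erase(thread); }

    void zone_begin(int thread, const std::string& name) { stack_for(thread).push_back(name); }

    // Returns the closed zone's name, or std::nullopt if no zone is open for
    // this thread/fiber: the "zone is ended twice" situation.
    std::optional<std::string> zone_end(int thread) {
        std::vector<std::string>& stack = stack_for(thread);
        if (stack.empty()) return std::nullopt;
        std::string name = stack.back();
        stack.pop_back();
        return name;
    }

private:
    // Events go to the entered fiber's stack if there is one, else the thread's.
    std::vector<std::string>& stack_for(int thread) {
        auto it = current_fiber_.find(thread);
        if (it != current_fiber_.end()) return fiber_stacks_[it->second];
        return thread_stacks_[thread];
    }

    std::map<int, std::string> current_fiber_;
    std::map<int, std::vector<std::string>> thread_stacks_;
    std::map<std::string, std::vector<std::string>> fiber_stacks_;
};
```

In this model, a zone opened on thread 0 by a coroutine that later resumes on thread 1 cannot be closed there (`zone_end(1)` returns `std::nullopt`). If the coroutine enters a fiber before opening the zone, leaves it before suspending, and re-enters it after resuming, the zone is charged to the fiber and `zone_end(1)` closes it normally.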

vipcxj commented 1 month ago

Here is my code that starts the Asio io_context tasks:

            auto m_io_pool = std::make_shared<BS::thread_pool>();
            for (size_t i = 0; i < m_io_pool->get_thread_count(); i++)
            {
                m_io_pool->detach_task([self = shared_from_this()]() {
                    auto guard = asio::make_work_guard(self->m_io_ctx.get_executor());
                    try
                    {
                        self->m_io_ctx.run();
                    }
                    catch(...)
                    {
                        self->m_logger->error(cfgo::what());
                    }
                    CFGO_SELF_DEBUG("io ctx completed in thread {}", std::this_thread::get_id());
                });
            }

How can I make Tracy work with it? Right now I'm hoping to use Tracy to measure the execution times of my various functions to find out exactly where I'm stuck.

vipcxj commented 1 month ago

Here is my code:

        auto Device::_ready_loop(const cfgo::close_chan & closer) -> asio::awaitable<void>
        {
            ...
            do
            {
                {
                    ZoneNamed(wait_not_full, true);
                    do
                    {
                        auto ch = m_ready_maybe_not_full_notifier.make_notfiy_receiver();
                        {
                            std::lock_guard lk(m_ready_mutex);
                            if (!_ready_full())
                            {
                                break;
                            }
                        }
                        co_await cfgo::chan_read_or_throw<void>(ch, closer);
                    } while (true);
                }
                // Only the ready loop can make ready full, so since ready is not full here, it will stay not full until the end of the loop body.
                {
                    ZoneNamed(lock_blocks, true);
                    co_await prom::measure_time<void>(
                        cfgo::fix_async_lambda([self, closer]() -> asio::awaitable<void> {
                            return self->m_block_manager.lock(std::move(closer));
                        }),
                        [prom_enabled](prom::duration_t time) {
                            if (prom_enabled)
                            {
                                auto & metrics = sr::Manager::instance().metrics();
                                metrics.m_infer_task_block_time_hist.Observe(std::chrono::duration_cast<std::chrono::milliseconds>(time).count());
                            }
                        }
                    );
                }
                DEFER({
                    m_block_manager.unlock();
                });
                {
                    ZoneNamed(copy_ready_data, true);
                    m_block_manager.collect_locked_blocker(locked_blockers);
                    int batches = locked_blockers.size();
                    assert(batches <= conf->ai_target_batch());
                    m_logger->trace("{} sample locked, target batch: {}", batches, conf->ai_target_batch());
                    if (prom_enabled)
                    {
                        auto & metrics = sr::Manager::instance().metrics();
                        metrics.m_ai_infer_batches_hist.Observe(batches);
                    }
                    if (locked_blockers.empty())
                    {
                        continue;
                    }
                    assert(!_ready_full());
                    std::uint32_t write_slot = m_ready_tail_offset;
                    m_ready_metas[write_slot] = Batch {};
                    auto & batch =  m_ready_metas[write_slot];

                    TIMED_CUDA_DECLARE;
                    START_CUDA_CONTEXT(m_context);
                    TIMED_CUDA_START(m_ready_stream);

                    for (int i = 0; i < batches; ++i)
                    {
                        auto & blocker = locked_blockers[i];
                        auto meta = std::static_pointer_cast<sr::DataPrepareTask::Meta>(blocker.get_pointer_user_data());
                        batch->add_meta(meta);
                        sr::cuda::copy_ai_input(
                            conf->ai_frame_width(), conf->ai_frame_height(), batches,
                            m_prepare_areas, meta->slot(), meta->offset(),
                            m_ready_areas, i, write_slot,
                            m_ready_stream
                        );
                    }

                    TIMED_CUDA_END(m_ready_stream);
                    SYNC_CUDA_STREAM_ASYNC(m_ready_stream, Device, m_ready_sync_ch);
                    END_CUDA_CONTEXT;
                    CUresult cu_res = co_await chan_read_or_throw<CUresult>(m_ready_sync_ch, closer);
                    checkCuda(cu_res);
                    m_logger->trace("Copying batch frames to ready areas use {} ms", TIMED_CUDA_GET());

                    {
                        std::lock_guard lk(m_ready_mutex);
                        assert(write_slot == m_ready_tail_offset);
                        assert(!_ready_full());
                        m_ready_tail_offset = (m_ready_tail_offset + 1) % m_ready_metas.size();
                        if (m_ready_tail_offset == m_ready_head_offset)
                        {
                            m_ready_full = true;
                        }
                    }
                    m_ready_maybe_not_empty_notifier.notify();
                }
                FrameMarkNamed(m_device_name.c_str());
            } while (true);
        }

If there is only the wait_not_full zone, it works. However, when I add the lock_blocks and copy_ready_data zones, the profiler shows "zone is ended twice". Every scope has only one zone, so I think each zone should end only once.