brakhane closed this 2 months ago
An alternative would be to just use one mutex, of course
It doesn't seem like using just a single mutex has any negative performance impact, so I've changed the code to use just one.
This is not good, this part of code must absolutely not have any mutex locks. I never had a crash with this, as the whole map is being populated in LoadShaders (on the object_pso_job_ctx jobs, which are also guaranteed to be waited before being used). Can you send a call stack or repro step when this occurs?
Or otherwise this function could use unordered_map.find(), and return nullptr when element doesn't exist, we never intend to add element in the GetObjectPSO function.
> This is not good, this part of code must absolutely not have any mutex locks. I never had a crash with this, as the whole map is being populated in LoadShaders (on the object_pso_job_ctx jobs, which are also guaranteed to be waited before being used). Can you send a call stack or repro step when this occurs?
Crash happens on my machine semi-regularly, especially when run in debug.
**Exception**
Exception thrown: read access violation.
**this** was 0xFFFFFFFFFFFFFFF7.
**Stacktrace**
> Editor_Windows.exe!std::_Ref_count_base::_Decref() Line 1148 C++
Editor_Windows.exe!std::_Ptr_base<void>::_Decref() Line 1368 C++
Editor_Windows.exe!std::shared_ptr<void>::~shared_ptr<void>() Line 1651 C++
Editor_Windows.exe!std::shared_ptr<void>::operator=<wi::graphics::dx12_internal::PipelineState_DX12,0>(const std::shared_ptr<wi::graphics::dx12_internal::PipelineState_DX12> & _Right) Line 1661 C++
Editor_Windows.exe!wi::graphics::GraphicsDevice_DX12::CreatePipelineState(const wi::graphics::PipelineStateDesc * desc, wi::graphics::PipelineState * pso, const wi::graphics::RenderPassInfo * renderpass_info) Line 4054 C++
Editor_Windows.exe!wi::renderer::LoadShaders::__l35::<lambda_308>::operator()(wi::jobsystem::JobArgs args) Line 2098 C++
[External Code]
Editor_Windows.exe!wi::jobsystem::Job::execute() Line 56 C++
Editor_Windows.exe!wi::jobsystem::PriorityResources::work(unsigned int startingQueue) Line 103 C++
Editor_Windows.exe!wi::jobsystem::Initialize::__l10::<lambda_1>::operator()() Line 222 C++
[External Code]
**Thread diagram**
From time to time, the crash happens somewhere else, but it's always because of some pointers pointing to unallocated memory.
On a side note: there is another potential issue I'm currently working on, which is that std::shared_ptr is not thread-safe, and GraphicsDeviceChild uses one to store internal_state; I believe there are instances where some threads access the value while another could change internal_state. The easiest way would be to use std::atomic<std::shared_ptr<void>>, but that's C++20, so I'm currently working on a poor man's implementation of it that doesn't need as many locks. I've also found some implementations that are lock-free, but they require that pointers are at most 48 bits wide (which I assume is ok for Wicked), because they store the reference count in the remaining bits. That would be an alternative if we absolutely cannot have locks there.
However, the patch in this PR also makes all those crashes go away, so maybe we can get around it if we figure out a good way to do this
> Or otherwise this function could use unordered_map.find(), and return nullptr when element doesn't exist, we never intend to add element in the GetObjectPSO function.
I've thought about that too, but is it absolutely guaranteed that there will be no two calls to GetObjectPSO in parallel that would add a value (either the same or even different ones)? We simply cannot add any values to unordered_map while any other thread might be reading it.
In other words, can we gather all ObjectRenderingVariants at the start before any GetObjectPSO calls are made? Or maybe, would it be possible to lock GetObjectPSO before all rendering variants are encountered and only then unlock it?
FWIW, I can make the crash disappear by using std::unordered_map, but I believe that's just because std::unordered_map will only mutate its internal data structures when it has to resize its internal arrays, which doesn't happen as often, so it's not really a fix IMO.
If we use find() then it can't add values like the [] operator does, so it should be fine, no?
As for std::shared_ptr, I think it's thread-safe; internally the refcounter should be implemented with an atomic.
As my second comment, would it help if we just use the safer find() method of the map and return nullptr if element is not found?
Iterators are invalidated if an insertion causes a rehash. So if we get nullptr and then insert an element, there's a chance that other threads that currently hold data from GetObjectPSO now have garbage data.
But I can give it a try.
It would be best, though, if we could fill the map "beforehand" somehow and then never mutate it again; then we wouldn't need a mutex.
> As for std::shared_ptr, I think it's thread-safe; internally the refcounter should be implemented with an atomic.
I thought so too at first, but apparently that's not the case. Even MSDN mentions that:
> Multiple threads can read and write *different* shared_ptr objects at the same time, even when the objects are copies that share ownership.
Emphasis mine. We can have 2 different shared_ptr objects pointing to the same object and that will work, but we cannot have multiple threads read the same shared_ptr object when another might be modifying.
cppreference also mentions this:
> If multiple threads of execution access the same shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; std::atomic<shared_ptr> can be used to prevent the data race.
As well as multiple SO threads, like this one. And I admit I'm also confused by this. But I guess std::shared_ptr only guarantees that if multiple threads create separate shared_ptr instances that point to the same object, the reference count will stay correct without needing synchronization.
Since you've found the underlying cause of the bug, this is not necessary anymore.
Thanks for the detailed reporting.
operator[] can insert an element, so it's not safe to call from multiple threads. We need to hold a mutex when accessing it.