Closed nulld closed 3 years ago
So I guess timer is null. Which would mean that i>=sRunningTimers.length. Can't really see how. The other possibility is thread safety. If a different thread pushes to the time list while this loop is running, it might get messed up.
It is also possible that SetSize is not GC-moving safe - have not fully studied it. I should add a test for it. It is also not great to use for resizing here because it does not allow wiggle room in the size and may cause excessive reallocing. Since splice will optimize away the return result if you do not use it, I would recommend just removing the 'setSize' path and always use splice.
For simplicity, you could also not bother moving the last element and just splice(i,1)
.
The other possibility is thread safety. If a different thread pushes to the time list while this loop is running, it might get messed up.
Yes, it's very likely the cause of my problem, thanks. We sometimes use haxe.Timer.delay
to call something in haxe thread.
Since splice will optimize away the return result if you do not use it, I would recommend just removing the 'setSize' path and always use splice.
Thanks for advice, exaclly this aproach was introduced here: https://github.com/haxenme/nme/pull/658 But because of it closed by author and not able to be merged, made another one. Please review it.
It is also possible that SetSize is not GC-moving safe - have not fully studied it. I should add a test for it.
It will be great. Btw, we have rarely crashes in StringMap
can be moving gc be cause of it too?
Routine hx::TStringElement<Dynamic>::getHash() at /opt/haxelib/haxelib-dev/hxcpp/src/hx/Hash.h:81
Routine hx::Hash<hx::TStringElement<Dynamic> >::set(String, Dynamic const&) at /opt/haxelib/haxelib-dev/hxcpp/src/hx/Hash.h:565
Routine __string_hash_set(hx::Object*, Dynamic&, String, Dynamic const&, bool) at /opt/haxelib/haxelib-dev/hxcpp/src/hx/Hash.cpp:314
Routine haxe::ds::StringMap_obj::set(String, Dynamic) at /builds/zLuWkBZg/1/wnw/charm-mobile-client/bin/android/haxe/cpp/./src/haxe/ds/StringMap.cpp:73
We have this gc flags set for now:
<haxedef name="HXCPP_GC_GENERATIONAL"/>
<haxedef name="HXCPP_GC_MOVING"/>
There are tests for StringMap with Gc: https://github.com/HaxeFoundation/hxcpp/blob/master/test/haxe/TestStringHash.hx
Although, maybe not with Dynamic values - these could be added (possibly using Null<Int>
)
It can be tricky to reproduce - but if it can be reproduced then it might not be too hard to fix, and any ideas about what makes it more likely to crash would be good.
StringMap is also not thread safe, so something to think about.
but if it can be reproduced then it might not be too hard to fix, and any ideas about what makes it more likely to crash would be good.
Finally, got it in debugger.
Crashed thread stack:
CharmFarmGd.exe!hx::MarkAllocUnchecked(void * inPtr, hx::MarkContext * __inCtx) Line 2000
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(2000)
CharmFarmGd.exe!hx::MarkAlloc(void * inPtr, hx::MarkContext * __inCtx) Line 487
at d:\devel\haxelib-dev\hxcpp\include\hx\gc.h(487)
CharmFarmGd.exe!hx::MarkMember<String>(String & outT, hx::MarkContext * __inCtx) Line 49
at d:\devel\haxelib-dev\hxcpp\include\hx\gctypeinference.h(49)
CharmFarmGd.exe!hx::StringData::__Mark(hx::MarkContext * __inCtx) Line 2272
at d:\devel\haxelib-dev\hxcpp\src\string.cpp(2272)
CharmFarmGd.exe!hx::MarkContext::processMarkStack() Line 1877
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(1877)
CharmFarmGd.exe!GlobalAllocator::ThreadLoop(int inId) Line 4397
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(4397)
CharmFarmGd.exe!GlobalAllocator::SThreadLoop(void * inInfo) Line 4434
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(4434)
Some other threads are doing something gc related, too (doesn't them should wait some semaphore or smth atm?)
CharmFarmGd.exe!hx::MarkObjectAlloc(hx::Object * inPtr, hx::MarkContext * __inCtx) Line 496
at d:\devel\haxelib-dev\hxcpp\include\hx\gc.h(496)
CharmFarmGd.exe!hx::MarkMember<ru::stablex::ui::widgets::Widget_obj>(hx::ObjectPtr<ru::stablex::ui::widgets::Widget_obj> & outT, hx::MarkContext * __inCtx) Line 16
at d:\devel\haxelib-dev\hxcpp\include\hx\gctypeinference.h(16)
CharmFarmGd.exe!com::nevosoft::waw::view::gui::DailyQuest_obj::__Mark(hx::MarkContext * __inCtx) Line 5766
at d:\devel\charm-mobile-client\bin_test\windows\haxe\cpp\src\com\nevosoft\waw\view\gui\dailyquest.cpp(5766)
CharmFarmGd.exe!hx::MarkContext::processMarkStack() Line 1877
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(1877)
CharmFarmGd.exe!GlobalAllocator::ThreadLoop(int inId) Line 4397
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(4397)
CharmFarmGd.exe!GlobalAllocator::SThreadLoop(void * inInfo) Line 4434
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(4434)
Like this one:
CharmFarmGd.exe!HxSemaphore::Wait() Line 275
at d:\devel\haxelib-dev\hxcpp\include\hx\thread.h(275)
CharmFarmGd.exe!LocalAllocator::PauseForCollect() Line 5931
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(5931)
CharmFarmGd.exe!LocalAllocator::CallAlloc(int inSize, unsigned int inObjectFlags) Line 6058
at d:\devel\haxelib-dev\hxcpp\src\hx\gc\immix.cpp(6058)
CharmFarmGd.exe!hx::ImmixAllocator::alloc(hx::ImmixAllocator * alloc, unsigned int inSize, bool inContainer, const char * inName) Line 372
at d:\devel\haxelib-dev\hxcpp\include\hx\gc.h(372)
CharmFarmGd.exe!hx::Object::operator new(unsigned int inSize, bool inContainer, const char * inName) Line 170
at d:\devel\haxelib-dev\hxcpp\include\hx\object.h(170)
CharmFarmGd.exe!_hx_std_socket_new(bool udp, bool ipv6) Line 180
at d:\devel\haxelib-dev\hxcpp\src\hx\libs\std\socket.cpp(180)
CharmFarmGd.exe!sys::net::Socket_obj::init() Line 76
at d:\devel\charm-mobile-client\bin_test\windows\haxe\cpp\src\sys\net\socket.cpp(76)
CharmFarmGd.exe!sys::net::Socket_obj::__construct() Line 54
at d:\devel\charm-mobile-client\bin_test\windows\haxe\cpp\src\sys\net\socket.cpp(54)
CharmFarmGd.exe!sys::net::Socket_obj::__alloc(hx::ImmixAllocator * _hx_ctx) Line 227
at d:\devel\charm-mobile-client\bin_test\windows\haxe\cpp\src\sys\net\socket.cpp(227)
CharmFarmGd.exe!sys::Http_obj::customRequest(bool post, hx::ObjectPtr<haxe::io::Output_obj> api, hx::ObjectPtr<sys::net::Socket_obj> sock, String method) Line 165
at d:\devel\charm-mobile-client\bin_test\windows\haxe\cpp\src\sys\http.cpp(165)
@hughsando, I've save minidump and can send to you if it can help. Btw, can U clear to me - is it safe to alocate something( strings, for example ) in one thread, and then use it in another? Don't get - how GC track this things..
So the thread usage looks fine. This looks like your socket thread has called "new" which has triggered a collection. The collection pauses the haxe threads (who will enter a "wait" state) while the collection happens, There are 2 (or more) threads calling "MarkMember". This is ok, since these threads are the GC threads started in SThreadLoop and system is designed to work "lock free" as much as possible while marking and will go dormant once the haxe threads are restarted. It is generally ok to allocate in one thread and use in another - as long as the ownership is clear. String setting is not atomic since there are pointer and length components , so it is possible one thread may see the pointer before the length is set and vice-versa if you are randomly reading a member or static variable. The crash here is happening where a StringData (the Dynamic version of a String) tries to mark its member, but the pointer is messed up. This is either a GC marking failure - probably missing the owner of the StringData somehow, or a some unsafe thread setting of an object (Map or Array). The mini-dump might not show that much since I really need to go back in time a bit to see where it initially goes wrong. The two things I would look at are:
nme.app.Application.runOnMainThread
to pass a local result to the global state.HXCPP_GC_DEBUG_LEVEL=1
or uncomment the define in Immix.cpp
This will cause the marking to happen in single-threaded mode, and the crash stack will show the total ownership chain of the offending object, which will give a good clue about which component is messed-upYou can also try running with the haxedefs HXCPP_CHECK_POINTER
and HXCPP_GC_CHECK_POINTER
. This will add runtime checks that may pickup the problem early.
You can also try uncommenting define HX_GC_VERIFY
in Immix.cpp. This may pinpoint a GC failure as soon as it happens, but may also mask the bug so it is a bit of a double-edged sword. But if you get a crash stack with this on, then it would give good information.
So, if you are hunting this bug, I would first try HXCPP_GC_CHECK_POINTER
and HXCPP_GC_DEBUG_LEVEL=1
and see if you can get a callstack. If this does not work, you can uncomment HX_GC_VERIFY.
So, if you are hunting this bug, I would first try HXCPP_GC_CHECK_POINTER and HXCPP_GC_DEBUG_LEVEL=1
Awesome advice, thank U very much! Seems I found the promlematic code:
I had the map, where I store loader.url => loader
:
var loadingLoaders:Map<String, Loader>;
Which I count from main haxe thread like this: Lambda.count(loadingLoaders)
But at the same moment Loader
was doing his loading job in worker thread.
Now I've done like this var loadingLoaders:Map<String, Int>;
(loader.url => loader.id
) and can't reproduce crash anymore.
But one moment stays unclear to me - can I store loader.url
as a map key in main thread and safely use it (I meen loader.url
) in the worker thread (for e.g. doing haxe.Http.request(loader.url)
)? Won't it lead to some kind of race condition?
You can use a string as soon as it is created because they are immutable - but you still need to take care with maps. If you think of them like arrays where when they change size (add or remove item), internally they may create a copy of themselves and switch their state over to the copy. If the "set" happens after the copy is made, but before the state gets set, things will break. There are 2 ways to always get this right: 1 use a mutex to guard and accesses to the maps, or 2 only ever read or write the map from one thread. Say the main thread creates a loader for a given resource URI. The main thread is the only one that accesses "loadingLoaders", so it creates a Loader object, sets it in the map and then calls "loader.startThread". The loader thread does its thing and then finishes and want to remove itself from the map. You could combine this with the delivery of the result like:
function startThread() {
Thread.create( () -> {
var http = new Http(uri);
...
var result = http.data.
Application.runOnMainThread( () -> {
// Absolutely fine to do this here...
loaderLoder.remove(uri);
// and this too...
loder.onLoad(result);
});
});
}
Please excuse me - it definetly relates to my pull request. https://github.com/haxenme/nme/pull/679 Crash is quite rare - ocured only once in 3 days for now ( it's small rollout on ~1000 active users). Can anyone help me to figure out how it can be possible (crash and generated code below)