Open AlwxSin opened 1 year ago
I've been away from Go and C for quite some time, so YMMV. If I understand it right, it only happens on a particular machine? Does the architecture of that machine differ from your dev machine? What OS are the server and your dev machine running? Do you cross-compile? Which version of Go are you using? Did you try a different (possibly older) one?
I think it depends on the data rather than the hardware, because we have tested several setups (all on go1.21).
We do not cross-compile; the binary is built on the same OS it runs on. And we can't try older versions, because we need the changes from #12
Ah, forgot to mention that we never run production data on local or dev machines
Understandable, but then it's really hard to debug. Maybe far-fetched, but when you handle heaps of data concurrently (do you?), did you play around with Go's GC and memory settings?
Yeah, we process XML files concurrently in workers. Could that be a factor?
> did you play around with Go's GC and memory settings?
No, default everywhere.
Just to make sure: you are freeing correctly, and you are calling Init and Cleanup only once?
I should be. Init on startup, and then in each worker I handle files, validation, and cleanup.
Cleanup or Free? Cleanup should only be called when the program (or a part of the program) exits ...
Cleanup on program exit, and Free when a worker's job is done.
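For reference, the lifecycle described above can be sketched like this. `Init`, `Free`, and `Cleanup` here are placeholder stand-ins for the binding's actual calls (the real signatures may differ); only the call pattern matters:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var freed atomic.Int64 // counts per-job Free calls, for illustration

// Placeholder stand-ins for the binding's calls, following the
// lifecycle discussed in this thread.
func Init()    { fmt.Println("library initialized") } // once at startup
func Cleanup() { fmt.Println("library torn down") }   // once at exit

func Free(doc string) { freed.Add(1) } // after every worker job

func validate(doc string) {
	defer Free(doc) // release per-document resources when the job is done
	// ... real schema validation would happen here ...
}

func main() {
	Init()          // exactly once, before any workers start
	defer Cleanup() // exactly once, when the whole program exits

	docs := []string{"a.xml", "b.xml", "c.xml"}
	var wg sync.WaitGroup
	for _, doc := range docs {
		wg.Add(1)
		go func(d string) {
			defer wg.Done()
			validate(d)
		}(doc)
	}
	wg.Wait()
	fmt.Println("freed", freed.Load(), "documents")
}
```

The point is that Init/Cleanup bracket the whole process while Free brackets each job; calling Cleanup from a worker would tear down state other workers still rely on.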
Can't really help immediately then, I guess. The only thing I can think of is faking huge XML requests and concurrency using Go 1.21, but unfortunately I don't have spare time right now to set this up.
Just a short notice: I tested this a little, and I still suspect the cause could be Go's memory management. You could try playing around with the InitWithGc function's time parameter and the GOMEMLIMIT env variable. You could also check your system's ulimit settings. You could probably also try delaying worker execution and/or turning the concurrency down.
Funny thing is, my Mac with Go 1.21 fails pretty soon when testing with a concurrency of 100 and a 100 MB XML file ... my Linux machine with go1.17 just chugs along, with the restriction that at a concurrency of 100 it sometimes just stalls because of CPU load. I'd personally give it a lower concurrency setting; at least on my hardware, around 20 performs rather well.
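Capping concurrency doesn't have to mean restructuring the worker code; a buffered channel used as a semaphore limits how many validations run at once while still processing every file. A sketch, with the validation body stubbed out:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const maxInFlight = 20 // cap on concurrent validations; ~20 worked well above
	sem := make(chan struct{}, maxInFlight)

	files := make([]string, 100)
	for i := range files {
		files[i] = fmt.Sprintf("file-%d.xml", i)
	}

	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight jobs are running
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			_ = name                 // stand-in: real validation goes here
		}(f)
	}
	wg.Wait()
	fmt.Println("processed", len(files), "files,", maxInFlight, "at a time")
}
```

This bounds peak memory roughly to maxInFlight concurrent documents, which also makes a GOMEMLIMIT setting much easier to reason about.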
I will give this a spin a little later on my Linux machine with Go 1.21 and check whether it makes a difference.
Does it fail with the same stacktrace?
On the Mac it just dies with a kill signal and no trace; on Linux it never fails.
I've got a strange issue when trying to validate a large number (more than 100) of large XML files (20-130 MB). It looks like this
or
The stacktrace is always the same.
The problem is that I can't reproduce it on my machine with the same files, and I don't have access to the server where the error occurs.
The error can happen at any time, on any file; I can't reproduce it on one exact file or set of files, only on a big bunch of XMLs. It can happen at the second file or at the 29th, with no pattern.
How can I debug or reproduce the error? Maybe there is a bug in the C code?