Open GMMDMDIDEMS opened 8 months ago
Hmm so it's not us. We noticed this a while ago when running the code on a compute server, and thought it was something on the server's end. Now you've pointed out that it is basically the json package at fault. We'll roll out your solution soon. Alternatively if you wish to make a PR, you are welcome to do so. The extra analysis and debugging is also greatly appreciated, thanks!
I have tested a little further and there are also a few cases with orjson where the json file is truncated. Currently I suspect that it is due to peak memory consumption when the server.logs dict is serialised to a JSON formatted stream.
When I have some more time I will test further to verify the actual cause of the problem. Most likely I won't get a chance to do so in the next week.
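One way to check the peak-memory suspicion is to measure it directly. The following is a minimal sketch using the standard-library tracemalloc module. The `logs` dict here is a made-up stand-in for server.logs, and `peak_bytes` is a hypothetical helper, not part of pyRDDLGym. It compares building the whole JSON string with `json.dumps` against writing chunks with `json.dump`:

```python
import json
import os
import tracemalloc


def peak_bytes(fn):
    """Run fn() and return the peak number of bytes traced by tracemalloc."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical stand-in for server.logs; the real structure may differ.
logs = {
    f"episode_{i}": {"state": list(range(50)), "reward": float(i)}
    for i in range(2000)
}


def dumps_whole():
    # Builds the complete JSON document as one string in memory.
    json.dumps(logs)


def dump_streamed():
    # Writes the JSON document to a file object piece by piece.
    with open(os.devnull, "w") as sink:
        json.dump(logs, sink)


peak_whole = peak_bytes(dumps_whole)
peak_streamed = peak_bytes(dump_streamed)
print(f"json.dumps peak: {peak_whole} bytes")
print(f"json.dump  peak: {peak_streamed} bytes")
```

tracemalloc only sees allocations made through Python's allocator, so the numbers are an indication rather than the process's true memory footprint.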
Could it be related to a limit on the server where the code is running? Maybe we should test whether incrementally writing the dict to a JSON file in append mode, key by key, causes the same problem. Alternatively, we could consider switching to CSV or, even better, use the built-in CSV logger we already have in pyRDDLGym instead of JSON. In any case, thanks again for all your help in debugging, it is really appreciated.
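A rough sketch of the key-by-key idea, for testing purposes only. The function name `write_json_key_by_key` is hypothetical and not part of pyRDDLGym. It assumes the dict has string keys and JSON-serialisable values, and it keeps a single file handle open instead of reopening the file in append mode for every key. The explicit flush and fsync are an extra precaution added here, not something established as the cause:

```python
import json
import os


def write_json_key_by_key(logs, path):
    """Write a dict to `path` as one JSON object, serialising one key at a time.

    Only one value is held as a JSON string at any moment, so the full
    document is never built in memory.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write("{")
        for i, (key, value) in enumerate(logs.items()):
            if i > 0:
                f.write(",")
            # Assumes keys are strings; json.dumps handles the escaping.
            f.write(json.dumps(key))
            f.write(":")
            f.write(json.dumps(value))
        f.write("}")
        # Push buffered data to the OS and ask it to write to disk.
        f.flush()
        os.fsync(f.fileno())
```

If files written this way are still truncated, that would point away from serialisation memory and towards something in the environment.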
With large instances (many objects and states) and correspondingly large data.json files, the data.json files are very often cut off, i.e. part of the content is missing and the file is no longer valid JSON.
I cannot exactly identify the cause of the problem, but it is not due to the implementation. I can rule out that it is due to a lack of disk space, and it shouldn't be due to memory either, as a Docker container has no resource constraints by default. No error is thrown if a file is saved incorrectly formatted/truncated and it is also not possible to say that a file is no longer written correctly above a certain size. Some files with 13MB were written correctly and some whose correct size is 8MB were only written up to 2.4MB.
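Since no error is raised at write time, one way to catch a truncated file is to read it back and parse it. This is a minimal sketch. `is_valid_json_file` is a hypothetical helper, not an existing pyRDDLGym function, and it loads the whole file into memory, which may itself be costly for very large logs:

```python
import json


def is_valid_json_file(path):
    """Return True if the file at `path` parses as JSON, False otherwise."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError:
        # A truncated file typically ends mid-value and fails to parse.
        return False
    return True
```

Calling this right after saving data.json would at least turn a silent truncation into a detectable failure.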
However, switching from `json` to `orjson`, a faster and more memory-efficient alternative, eliminates all problems.
Fix
https://github.com/pyrddlgym-project/pyRDDLGym/blob/ad21a92772035017785fc044014cb45596eaea0b/pyRDDLGym/core/server.py#L154
with