pyrddlgym-project / pyRDDLGym

A toolkit for auto-generation of OpenAI Gym environments from RDDL description files.
https://pyrddlgym.readthedocs.io/
MIT License

RDDLSimServer data.json file truncated #254

Open GMMDMDIDEMS opened 8 months ago

GMMDMDIDEMS commented 8 months ago

With large instances (many objects and states) and correspondingly large data.json files, the files are frequently truncated: part of the content is missing, and what remains is no longer valid JSON.

I cannot pinpoint the exact cause, but it is not due to my implementation. I can rule out a lack of disk space, and it should not be memory either, as a Docker container has no resource constraints by default. No error is thrown when a file is written truncated or malformed, and the failure does not correlate with any particular file size: some files of 13 MB were written correctly, while others whose correct size is 8 MB were cut off at 2.4 MB.
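For reference, truncation is easy to detect by simply attempting to parse each saved file. A minimal sketch of such a check; the logs/*.json pattern is only an assumption about where the files end up:

import glob
import json
import os

for path in glob.glob("logs/*.json"):  # assumed output location
    try:
        with open(path, mode="r") as f:
            json.load(f)
    except json.JSONDecodeError as e:
        size_mb = os.path.getsize(path) / 1e6
        print(f"{path}: invalid/truncated at {size_mb:.1f} MB ({e})")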

However, switching from json to orjson, a faster and more memory-efficient alternative, eliminates all problems.

Fix

Replace the current dump_data implementation at

https://github.com/pyrddlgym-project/pyRDDLGym/blob/ad21a92772035017785fc044014cb45596eaea0b/pyRDDLGym/core/server.py#L154

with:

import orjson

def dump_data(self, fn):
    """Dumps the logged data to a JSON file."""
    json_content = orjson.dumps(self.logs)
    with open(fn, mode="wb") as f:
        f.write(json_content)
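Note that orjson.dumps returns bytes rather than str, which is why the file is opened in binary mode ("wb") here, unlike the text-mode open used with the standard json module.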
mike-gimelfarb commented 8 months ago

Hmm, so it's not us. We noticed this a while ago when running the code on a compute server and thought it was something on the server's end. Now you've pointed out that it is basically the json package at fault. We'll roll out your solution soon. Alternatively, if you wish to make a PR, you are welcome to do so. The extra analysis and debugging is also greatly appreciated, thanks!

GMMDMDIDEMS commented 7 months ago

I have tested a little further, and there are also a few cases with orjson where the json file is truncated. I currently suspect peak memory consumption at the moment the server.logs dict is serialised into its JSON byte representation.
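One way to probe this hypothesis would be to measure the allocation peak during serialisation with tracemalloc. A rough sketch; make_logs is only a hypothetical stand-in for the real contents of server.logs, and tracemalloc only sees allocations made through Python's allocator, so for a C extension like orjson this gives a lower bound:

import tracemalloc

import orjson

def make_logs(n=100_000):
    # hypothetical stand-in for server.logs: many keyed records
    return {str(i): {"state": [float(i)] * 10, "reward": i * 0.1}
            for i in range(n)}

logs = make_logs()
tracemalloc.start()
payload = orjson.dumps(logs)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"output size: {len(payload) / 1e6:.1f} MB, "
      f"peak extra allocation: {peak / 1e6:.1f} MB")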

When I have some more time I will test further to verify the actual cause of the problem. Most likely I won't get a chance to do so in the next week.

mike-gimelfarb commented 7 months ago

Might it have something to do with a limit on the server where the code is running? Maybe we should test whether incrementally writing the dict to a json file key by key, in append mode, causes the same problem (see the sketch below)? Alternatively, we could consider switching to csv or, even better, use the built-in CSV logger we already have in pyRDDLGym instead of json. In any case, thanks again for all your help in debugging, it is really appreciated.
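A minimal sketch of that key-by-key idea, assuming the logs are a flat dict with string keys; the function name dump_data_incremental and the flush/fsync at the end are illustrative additions, not existing pyRDDLGym code:

import os

import orjson

def dump_data_incremental(logs, fn):
    """Writes `logs` as a single valid JSON object, one entry at a time."""
    with open(fn, mode="wb") as f:
        f.write(b"{")
        for i, (key, value) in enumerate(logs.items()):
            if i > 0:
                f.write(b",")
            # each key/value is serialised separately, so peak memory is
            # bounded by the largest single entry, not the whole dict
            f.write(orjson.dumps(key))
            f.write(b":")
            f.write(orjson.dumps(value))
        f.write(b"}")
        # flush and fsync to rule out OS-level buffering as the cause
        f.flush()
        os.fsync(f.fileno())

If files still come out truncated with the fsync in place, that would point away from buffering and toward the process being killed mid-write.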