yt-project / yt

Main yt repository
http://yt-project.org
Other
454 stars 272 forks source link

BUG: fix ParamFileStore for on-demand frontend imports #4926

Open chrishavlin opened 3 weeks ago

chrishavlin commented 3 weeks ago

Currently, the Dataset pickling is broken between python sessions when using the yt config option store_parameter_files = true. For example, given two scripts, create_a_pickle.py and load_a_pickle.py:

# create_a_pickle.py
import yt
import pickle

if __name__ == '__main__':

    fname = 'IsolatedGalaxy/galaxy0030/galaxy0030'
    ds = yt.load(fname)

    with open('yt_ds.pickle', 'wb') as handle:
        pickle.dump(ds, handle, protocol=pickle.HIGHEST_PROTOCOL)
# load_a_pickle.py
import pickle

if __name__ == '__main__':

    with open('yt_ds.pickle', 'rb') as handle:
        ds = pickle.load(handle)
   _ =  ds.index

The following fails on the attempt to load:

$ python create_a_pickle.py
$ python load_a_pickle.py
Traceback (most recent call last):
  File "/home/chavlin/src/yt_general/performance/yt_perf_testing/miscelleaneous_tests/dask_pickles_etc/pickle_from_file.py", line 5, in <module>
    ds = pickle.load(handle)
  File "/home/chavlin/src/yt_general/yt/yt/data_objects/static_output.py", line 2053, in _reconstruct_ds
    ds = datasets.get_ds_hash(*args)
  File "/home/chavlin/src/yt_general/yt/yt/utilities/parameter_file_storage.py", line 91, in get_ds_hash
    return self._convert_ds(self._records[hash])
  File "/home/chavlin/src/yt_general/yt/yt/utilities/parameter_file_storage.py", line 117, in _convert_ds
    raise UnknownDatasetType(class_name)
yt.utilities.parameter_file_storage.UnknownDatasetType: EnzoDataset

(I also ran into this problem when experimenting with some dask functionality...).

The issue comes how yt now imports frontends on demand (output_type_registry is reset between sessions).

chrishavlin commented 3 weeks ago

Setting to draft for now because I want to see if I can write a test with some pytest monkeypatching.

Also note that the fix here is not backwards compatible -- IMO it's fine because I don't think many folks are using this functionality and it's not meant for long-term dataset storage anyway.

chrishavlin commented 1 week ago

assuming tests pass this should be ready for review

chrishavlin commented 1 week ago

oh, those are real failures. will take another go at those pytest fixtures....