What happened + What you expected to happen

I'm trying to run ExperimentAnalysis(old_experiment_directory). It fails with the following stacktrace:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/nfs/dhi_work/walter/project/project/evaluate.py", line 93, in <module>
    main(args)
  File "/mnt/nfs/dhi_work/walter/project/project/evaluate.py", line 69, in main
    model = get_best_model(args.ray_root, device)
  File "/mnt/nfs/dhi_work/walter/project/project/evaluate.py", line 22, in get_best_model
    analysis = ExperimentAnalysis(directory)
  File "/home/username/.cache/pypoetry/virtualenvs/project-L8WgGYaP-py3.10/lib/python3.10/site-packages/ray/tune/analysis/experiment_analysis.py", line 137, in __init__
    self.trials = trials or self._load_trials()
  File "/home/username/.cache/pypoetry/virtualenvs/project-L8WgGYaP-py3.10/lib/python3.10/site-packages/ray/tune/analysis/experiment_analysis.py", line 148, in _load_trials
    trial = Trial.from_json_state(trial_json_state, stub=True)
  File "/home/username/.cache/pypoetry/virtualenvs/project-L8WgGYaP-py3.10/lib/python3.10/site-packages/ray/tune/experiment/trial.py", line 1231, in from_json_state
    state = json.loads(json_state, cls=TuneFunctionDecoder)
  File "/usr/lib/python3.10/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/home/username/.cache/pypoetry/virtualenvs/project-L8WgGYaP-py3.10/lib/python3.10/site-packages/ray/tune/utils/serialization.py", line 39, in object_hook
    return self._from_cloudpickle(obj)
  File "/home/username/.cache/pypoetry/virtualenvs/project-L8WgGYaP-py3.10/lib/python3.10/site-packages/ray/tune/utils/serialization.py", line 43, in _from_cloudpickle
    return cloudpickle.loads(hex_to_binary(obj["value"]))
AttributeError: type object 'pyarrow._fs.LocalFileSystem' has no attribute '_reconstruct'
Possibly related: apache/arrow#40342?
This was a pretty expensive experiment to run, so it would be great to be able to load it up again.
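For anyone hitting the same AttributeError: the failing step is pickle looking up a method (`_reconstruct`) that existed on `pyarrow._fs.LocalFileSystem` when the trial state was written but is gone from the installed pyarrow. Below is a minimal, stdlib-only sketch of the failure mode and a monkeypatch-style workaround, using a stand-in class rather than the real pyarrow; all names here are illustrative:

```python
import pickle

# Stand-in for the old pyarrow LocalFileSystem, whose pickle payload
# references a classmethod named `_reconstruct`. Bound methods pickle as
# getattr(owner, name), which is exactly the lookup that fails here.
class LocalFS:
    @classmethod
    def _reconstruct(cls, state):
        return cls()

    def __reduce__(self):
        return LocalFS._reconstruct, ({},)

payload = pickle.dumps(LocalFS())

# Simulate upgrading to a library version that dropped `_reconstruct`.
del LocalFS._reconstruct
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as err:
    error_message = str(err)  # "... has no attribute '_reconstruct'"

# Workaround: patch a compatible hook back in before unpickling.
LocalFS._reconstruct = classmethod(lambda cls, state: cls())
restored = pickle.loads(payload)
```

Against the real library, the analogous (untested) move would be to attach a compatible `_reconstruct` to `pyarrow._fs.LocalFileSystem` before constructing `ExperimentAnalysis`. Since `LocalFileSystem` is essentially stateless, a shim that ignores its arguments and returns a fresh instance might be enough — but that signature is an assumption, not something confirmed against either pyarrow version.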
Versions / Dependencies
Ray = 2.7.1
pyarrow = 17.0.0
Poetry lockfile at the time of running the experiment: poetry.lock.running_experiment.txt
Poetry lockfile at the time of trying to run the analysis: poetry.lock.loading_analysis.txt
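Since the two lockfiles are the key evidence here, a quick first check is which pins changed between the running and loading environments, pyarrow being the obvious suspect for this pickle error. A minimal sketch of such a comparison; the inline lockfile snippets and the 12.0.0 version are invented placeholders, not values from the real lockfiles:

```python
import re

# Hypothetical helper: extract pinned versions from poetry.lock text.
# poetry.lock lists each package as consecutive `name = "…"` / `version = "…"` lines.
def lock_versions(lock_text):
    return dict(re.findall(r'name = "([^"]+)"\s*\nversion = "([^"]+)"', lock_text))

# Placeholder lockfile contents standing in for the two attached files.
running = '[[package]]\nname = "pyarrow"\nversion = "12.0.0"\n'
loading = '[[package]]\nname = "pyarrow"\nversion = "17.0.0"\n'

old, new = lock_versions(running), lock_versions(loading)
changed = {pkg: (old[pkg], new.get(pkg)) for pkg in old if new.get(pkg) != old[pkg]}
print(changed)  # {'pyarrow': ('12.0.0', '17.0.0')}
```

On the real files this would be `lock_versions(open("poetry.lock.running_experiment.txt").read())` compared against the loading-time lockfile. If pyarrow's major version jumped between the two, pinning it back in a scratch environment just long enough to load the analysis and re-export the results is a plausible workaround.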
Reproduction script
Unfortunately the issue appears to be with the experiment files themselves: they contain pickled objects that can no longer be deserialized, so I can't provide a self-contained reproduction script. My best bet is probably a workaround from someone who has seen this issue before. I wouldn't expect new experiments to face the same issue.
Issue Severity
Medium: It is a significant difficulty but I can work around it.