magenta / mt3

MT3: Multi-Task Multitrack Music Transcription
Apache License 2.0

Error when opening the '.zarray' file in mt3 checkpoints #32

Open kunfang98927 opened 2 years ago

kunfang98927 commented 2 years ago

I'm trying to run the inference part on a Slurm HPC cluster (system: CentOS 7.3) and have run into an issue. It occurs while the checkpoint is loading: inference_model = InferenceModel(checkpoint_path, MODEL). I searched for solutions on Google but found very few. I would appreciate any advice you could give.

The following are the packages in my environment that may be related to this issue:

python 3.7
cuda 11.2.2
cudnn 8.1a11
tensorflow 2.8.0
flax 0.4.0
jax 0.3.4
jaxlib 0.3.2+cuda11.cudnn82
xarray 0.20.2
zarr 2.11.1

The error information:

Traceback (most recent call last):
  File "mt3_inference.py", line 267, in <module>
    inference_model = InferenceModel('./checkpoints/mt3/', 'mt3')
  File "mt3_inference.py", line 114, in __init__
    self.restore_from_checkpoint(checkpoint_path)
  File "mt3_inference.py", line 161, in restore_from_checkpoint
    [restore_checkpoint_cfg], init_rng=jax.random.PRNGKey(0))
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 522, in from_checkpoint_or_scratch
    return (self.from_checkpoint(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 508, in from_checkpoint
    self.from_checkpoints(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng))
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 466, in from_checkpoints
    yield _restore_path(path, restore_cfg)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 458, in _restore_path
    fallback_state=fallback_state)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 861, in restore
    lazy_parameters=lazy_parameters)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 910, in _read_state_from_tensorstore
    state_dict = _run_future_tree(future_state_dict)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 161, in _run_future_tree
    leaves = loop.run_until_complete(asyncio.gather(*future_leaves))
  File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/tasks.py", line 251, in __step
    result = coro.throw(exc)
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoint_importer.py", line 114, in _get_and_cast
    arr = await self._get_fn()  # pytype: disable=bad-return-type
  File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 1241, in _read_ts
    t = await ts.open(tmp_ts_spec_dict, open=True)
  File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/futures.py", line 263, in __await__
    yield self  # This tells Task to wait for completion.
  File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/tasks.py", line 318, in __wakeup
    future.result()
  File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/futures.py", line 181, in result
    raise self._exception
ValueError: Error opening "zarr" driver: Error reading local file "./checkpoints/mt3/target.decoder.layers_0.encoder_decoder_attention.key.kernel/.zarray": Invalid key: "./checkpoints/mt3/target.decoder.layers_0.encoder_decoder_attention.key.kernel/.zarray"

arshjot commented 1 year ago

I had this issue too. Try using the full (absolute) path of the checkpoint directory instead of the relative path.
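
For reference, a minimal sketch of that workaround: resolve the checkpoint directory to an absolute path before handing it to the model. The helper name `resolve_checkpoint_path` is hypothetical (it is not part of MT3 or T5X), and the `InferenceModel` call is the one from the original post, left commented out here because it needs the MT3 inference script and the checkpoint files.

```python
import os


def resolve_checkpoint_path(path):
    """Return an absolute, normalized version of a checkpoint directory path.

    Hypothetical helper for illustration; not part of MT3 or T5X.
    """
    # expanduser handles a leading "~"; abspath resolves "./" and "../"
    # against the current working directory.
    return os.path.abspath(os.path.expanduser(path))


# The relative path from the original post.
checkpoint_path = resolve_checkpoint_path('./checkpoints/mt3/')
print(checkpoint_path)

# Then pass the absolute path to the model, as in the original script:
# inference_model = InferenceModel(checkpoint_path, 'mt3')
```

Note that the result depends on the working directory at the time the script runs, so in a Slurm job it may be safer to write the absolute path out explicitly in the job script.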