malvaradol closed 4 months ago
Medaka caches the models it downloads in your home directory. So if your HPC nodes mount the same HOME directory as a computer where you do have internet access, just run medaka there first.
Failing that, it's possible to simply give the tar.gz as the model argument on the command line.
The first one didn't work. Just FYI, in case it helps: I installed the program through pip in a conda environment.
Regarding the second one, I did provide the tar.gz as the model argument, including the whole path, yet I still get an error. Here's the command line:
medaka_consensus -i ON_reads -d flye_assembly.fasta -o output_medaka -t 64 -m /model/r941_min_sup_g507_model.tar.gz
/model/r941_min_sup_g507_model.tar.gz is at the same level as the LSF file that contains the previous command line.
Could you please show the error you get when running the above command?
Here's the output for the command:
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Failed to interpret '/model/r941_min_sup_g507_model.tar.gz' as a basecaller model.
Traceback (most recent call last):
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 36, in __call__
model_fp = medaka.models.resolve_model(val)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/models.py", line 46, in resolve_model
raise ValueError(
ValueError: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/bin/medaka", line 8, in <module>
sys.exit(main())
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 801, in main
args = parser.parse_args()
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1825, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2049, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2026, in consume_positionals
take_action(action, args)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
action(self, namespace, argument_values, option_string)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1214, in __call__
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2049, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2026, in consume_positionals
take_action(action, args)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
action(self, namespace, argument_values, option_string)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1214, in __call__
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2067, in _parse_known_args
start_index = consume_optional(start_index)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2007, in consume_optional
take_action(action, args, option_string)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
action(self, namespace, argument_values, option_string)
File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 39, in __call__
raise RuntimeError(msg.format(self.dest, str(e)))
RuntimeError: Error validating model from '--model' argument: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file..
Are you entirely sure that /model/r941_min_sup_g507_model.tar.gz is the path where you have saved the model file, that it is readable by your user, and that it is not a broken symbolic link? The error:
ValueError: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file.
suggests at least one of these is not true.
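The three conditions above can be checked directly before submitting a job. The following is only a minimal sketch, not part of medaka: the function name `diagnose_model_path` is made up for illustration, and it uses nothing beyond the Python standard library. Note that a relative path is resolved against the current working directory of the job, which may differ from the directory holding the LSF file.

```python
import os


def diagnose_model_path(path):
    """Return a list of human-readable problems with `path` (empty if none).

    Hypothetical helper for illustration; it is not a medaka function.
    """
    problems = []
    # Relative paths are resolved against the current working directory.
    resolved = os.path.abspath(path)
    if os.path.islink(resolved) and not os.path.exists(resolved):
        # islink() is true for the link itself; exists() follows the link.
        problems.append(f"{resolved} is a broken symbolic link")
    elif not os.path.isfile(resolved):
        problems.append(f"{resolved} does not exist or is not a regular file")
    elif not os.access(resolved, os.R_OK):
        problems.append(f"{resolved} is not readable by the current user")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_model_path.py <path> [<path> ...]
    for arg in sys.argv[1:]:
        issues = diagnose_model_path(arg)
        print(arg, "->", issues or "looks fine")
```

Running this from inside the job script, with the same string that is passed to -m, shows which absolute path is actually being looked up on the compute node.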
I was finally able to get it running, and you were correct: my mistake was giving a relative path instead of the absolute one, and fixing that did the trick. Now I'd like to take advantage of this issue to ask for help with a new error I got after the program had run for a couple of hours. Here are the final lines of the error output:
File "/sc/arion/projects/MML/conda/envs/polishing_tools/bin/medaka", line 11, in <module>
sys.exit(main())
File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/medaka.py", line 814, in main
args.func(args)
File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/prediction.py", line 188, in predict
model = model_store.load_model(time_steps=None)
File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/datastore.py", line 199, in load_model
self.model.load_weights(weights).expect_partial()
File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/tmpqde07q4d/model/variables/variables
I spent some time looking on blogs but could not find anything useful.
Thanks for your help with all this stuff :)
It seems the model is not being unpacked correctly at runtime, or is not a valid model tar.gz.
Can you try untarring the model file you have outside of medaka and report the contents?
So this is what I got when decompressing the file:
tar -xvzf r941_min_sup_g507_model.tar.gz
model/
model/variables/
model/variables/variables.data-00001-of-00002
model/variables/variables.index
model/variables/variables.data-00000-of-00002
model/meta.pkl
model/assets/
model/saved_model.pb
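The same check can be done without extracting anything, which is handy inside a job script. This is a sketch using Python's standard `tarfile` module; the helper name `missing_members` is hypothetical, and treating the three files below as "required" is an assumption based only on the listing above, not on medaka's documentation.

```python
import tarfile

# Members taken from the listing above; treating them as required is an assumption.
EXPECTED_MEMBERS = {
    "model/saved_model.pb",
    "model/meta.pkl",
    "model/variables/variables.index",
}


def missing_members(archive_path, expected=EXPECTED_MEMBERS):
    """Return the expected members that are absent from a .tar.gz archive.

    Raises tarfile.ReadError if the file is not a valid gzip-compressed tar,
    for example if an HTML page was saved instead of the archive itself.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Directory entries may carry a trailing slash; normalise them.
        names = {name.rstrip("/") for name in tar.getnames()}
    return sorted(set(expected) - names)
```

An empty list from `missing_members("r941_min_sup_g507_model.tar.gz")` would agree with the listing above; a `tarfile.ReadError` would instead point at a damaged or mis-downloaded file.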
That seems correct. I asked because:
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/tmpqde07q4d/model/variables/variables
suggests something was corrupt about the tar.gz. At this point I'm at a loss as to what has happened. The process here is that medaka sees that you have provided a tar.gz file and unpacks it to a temporary location on your system so that tensorflow can read it. That location is determined by Python, not by code in medaka.
I would talk to your HPC admins and ask if they know why files in /tmp appear not to be readable.
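While waiting for the admins, it may help to see what Python itself picks as the temporary location on a compute node. Python's standard `tempfile` module consults the TMPDIR, TEMP and TMP environment variables before falling back to defaults such as /tmp. The sketch below (the function name `check_tempdir` is made up for illustration) reports that location and does a small write and read round trip there. Whether pointing TMPDIR at a different directory in the job script resolves the medaka error is untested here and only a guess.

```python
import os
import tempfile


def check_tempdir():
    """Report Python's temp root and whether a file written there reads back."""
    # gettempdir() honours TMPDIR, TEMP and TMP before platform defaults.
    tmp_root = tempfile.gettempdir()
    payload = b"medaka-tmp-check"
    with tempfile.TemporaryDirectory() as workdir:
        probe = os.path.join(workdir, "probe.bin")
        with open(probe, "wb") as fh:
            fh.write(payload)
        with open(probe, "rb") as fh:
            readable = fh.read() == payload
    return tmp_root, readable


if __name__ == "__main__":
    root, ok = check_tempdir()
    print(f"temp root: {root}; write/read round trip ok: {ok}")
```

Comparing the output on the login node and on a compute node would show whether the two use the same temporary location at all.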
So far the only thing that has worked is to run medaka on a login node, but of course I can't just run the whole job in that node. My question is, if I run medaka in the login node, cancel the job and then re-send it again on a computing node, is the model stored somewhere so that it will run normally? If so, how long should I run medaka in the login node before reaching the stage where the model is saved on the system?
Just some ideas I guess, HPC admins take forever to reach back...
My question is, if I run medaka in the login node, cancel the job and then re-send it again on a computing node, is the model stored somewhere so that it will run normally?
The model, when downloaded, is always stored as a tar.gz. It isn't cached as the expanded archive; the untarring always happens at runtime. That is to say, even if you cache the model by running the program once (as suggested in this comment), the effect is no different from what you are doing by providing the tar on the command line.
I'd really like to understand why the unpacking of the tar into /tmp is apparently going awry.
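One way to narrow this down is to reproduce the unpacking step by hand on a compute node. The sketch below is an approximation under stated assumptions, not medaka's actual code: it extracts the archive into a fresh temporary directory with the standard `tarfile` and `tempfile` modules and checks that the variables index file is present afterwards. The name `unpack_and_check` and the choice of `model/variables/variables.index` as the file to look for are illustrative, the latter taken from the tar listing earlier in this thread.

```python
import os
import tarfile
import tempfile


def unpack_and_check(archive_path, expected="model/variables/variables.index"):
    """Extract a model tar.gz into a temporary directory and look for `expected`.

    Returns (temp_root, found). The extracted files are removed on exit.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(tmpdir)
        found = os.path.isfile(os.path.join(tmpdir, expected))
    return tempfile.gettempdir(), found
```

If `found` comes back False on a compute node but True on the login node for the same archive, that would support the idea that the problem lies with the temporary directory rather than with the model file.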
Hi!
I'm currently trying to run medaka on an HPC server with an LSF grid. However, the compute nodes don't have internet access, so when I try to run the program I have an issue with the model: it has not been downloaded and cannot be downloaded. I tried to download the model directly from the repository files, but I'm not sure which path from the downloaded file I should give the program in order to get it running. The model that I'm trying to run is r941_min_sup_g507, and the file that I downloaded was https://github.com/nanoporetech/medaka/blob/master/medaka/data/r941_min_sup_g507_model.tar.gz. Any help on how to get medaka running in offline mode will be appreciated.