nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Is it possible to use medaka in offline mode? #496

Closed malvaradol closed 4 months ago

malvaradol commented 6 months ago

Hi!

I'm currently trying to run medaka on an HPC server with an LSF grid. However, the compute nodes don't have internet access, so when I try to run the program it fails because the model has not been downloaded and cannot be downloaded. I tried downloading the model directly from the repository files, but I'm not sure which path to the downloaded file I should give the program to get it running. The model I'm trying to run is r941_min_sup_g507, and the file I downloaded was https://github.com/nanoporetech/medaka/blob/master/medaka/data/r941_min_sup_g507_model.tar.gz.

Any help on how to get medaka running in offline mode will be appreciated.

cjw85 commented 6 months ago

Medaka caches models that it downloads in your home directory. So if your HPC nodes mount the same HOME directory as a computer where you do have internet access, just run medaka there first.

Failing that, it's possible to simply give the tar.gz as the model argument on the command line.

malvaradol commented 6 months ago

The first one didn't work. Just FYI, in case it helps: I installed the program through pip in a conda environment.

Regarding the second one, I did provide the tar.gz as the model argument, including the whole path, yet I still get an error. Here's the command line:

medaka_consensus -i ON_reads -d flye_assembly.fasta -o output_medaka -t 64 -m /model/r941_min_sup_g507_model.tar.gz

/model/r941_min_sup_g507_model.tar.gz is at the same level as the LSF file that contains the command above.

cjw85 commented 6 months ago

Could you please show the error you get when running the above command?

malvaradol commented 6 months ago

Here's the output for the command:

Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Failed to interpret '/model/r941_min_sup_g507_model.tar.gz' as a basecaller model.
Traceback (most recent call last):
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 36, in __call__
    model_fp = medaka.models.resolve_model(val)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/models.py", line 46, in resolve_model
    raise ValueError(
ValueError: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/bin/medaka", line 8, in <module>
    sys.exit(main())
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 801, in main
    args = parser.parse_args()
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1825, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2049, in _parse_known_args
    positionals_end_index = consume_positionals(start_index)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2026, in consume_positionals
    take_action(action, args)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
    action(self, namespace, argument_values, option_string)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1214, in __call__
    subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2049, in _parse_known_args
    positionals_end_index = consume_positionals(start_index)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2026, in consume_positionals
    take_action(action, args)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
    action(self, namespace, argument_values, option_string)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1214, in __call__
    subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1858, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2067, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 2007, in consume_optional
    take_action(action, args, option_string)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/argparse.py", line 1935, in take_action
    action(self, namespace, argument_values, option_string)
  File "/hpc/users/hernad36/.conda/envs/triatomine_genomics_v2/lib/python3.9/site-packages/medaka/medaka.py", line 39, in __call__
    raise RuntimeError(msg.format(self.dest, str(e)))
RuntimeError: Error validating model from '--model' argument: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file..

cjw85 commented 6 months ago

Are you entirely sure that /model/r941_min_sup_g507_model.tar.gz is the path where you have saved the model file, that it is readable by your user, and is not a broken symbolic link? The error:

ValueError: Model /model/r941_min_sup_g507_model.tar.gz is not a known model or existant file.

suggests at least one of these is not true.
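The three conditions above (the file exists at that path, is readable, and is not a broken symbolic link) can be checked before submitting a job. The following is a minimal sketch and not part of medaka; the function name `diagnose_model_path` is hypothetical. It also flags relative paths, since those resolve against whatever working directory the job happens to start in.

```python
import os


def diagnose_model_path(path):
    """Return a list of problems found with a model path (empty if none).

    Hypothetical helper for illustration; medaka does not provide it.
    """
    problems = []
    if not os.path.isabs(path):
        # A relative path is resolved against the current working directory.
        problems.append(f"relative path, resolves to {os.path.abspath(path)}")
    if os.path.islink(path) and not os.path.exists(path):
        # islink() is True for the link itself; exists() follows the link.
        problems.append("broken symbolic link")
    elif not os.path.isfile(path):
        problems.append("not an existing regular file")
    elif not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    return problems


# Example call using the path from the command above.
print(diagnose_model_path("/model/r941_min_sup_g507_model.tar.gz"))
```

An empty list means none of the checks failed. Note that a path with a leading slash is taken from the filesystem root, not from the directory containing the job script.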

malvaradol commented 6 months ago

I was finally able to get it running, and you were correct: my mistake was providing a relative path rather than the absolute one. Fixing that did the trick. Now I'd like to use this issue to ask for help with a new error I got after the program ran for a couple of hours. Here are the final lines of the error output:

File "/sc/arion/projects/MML/conda/envs/polishing_tools/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/medaka.py", line 814, in main
    args.func(args)
  File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/prediction.py", line 188, in predict
    model = model_store.load_model(time_steps=None)
  File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/medaka/datastore.py", line 199, in load_model
    self.model.load_weights(weights).expect_partial()
  File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/sc/arion/projects/MML/conda/envs/polishing_tools/lib/python3.10/site-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/tmpqde07q4d/model/variables/variables

I spent some time looking on blogs but could not find anything useful.

Thanks for your help with all this stuff :)

cjw85 commented 6 months ago

It seems the model is not being unpacked correctly at runtime, or is not a valid model tar.gz.

Can you try untarring the model file you have outside of medaka and report the contents?
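For reference, the contents can also be listed without extracting anything, using Python's standard `tarfile` module. This is a minimal sketch; the helper names `list_model_archive` and `has_tf_variables` are hypothetical and not part of medaka. The second helper assumes the archive should hold a TensorFlow checkpoint under a `variables/variables` prefix, which is the prefix named in the NotFoundError above.

```python
import tarfile


def list_model_archive(archive_path):
    """Return the member names stored in a model tar.gz without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


def has_tf_variables(names):
    """Check for a checkpoint index file and at least one data shard.

    Assumption: the weights live under a 'variables/variables' prefix,
    as in the path reported by the TensorFlow error.
    """
    has_index = any(n.endswith("variables/variables.index") for n in names)
    has_data = any("variables/variables.data-" in n for n in names)
    return has_index and has_data
```

Calling `has_tf_variables(list_model_archive(path))` on the downloaded file gives a quick yes/no answer on whether the weight files are present in the archive at all.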

malvaradol commented 6 months ago

So this is what I got when decompressing the file:

tar -xvzf r941_min_sup_g507_model.tar.gz
model/
model/variables/
model/variables/variables.data-00001-of-00002
model/variables/variables.index
model/variables/variables.data-00000-of-00002
model/meta.pkl
model/assets/
model/saved_model.pb

cjw85 commented 6 months ago

That seems correct. I asked because:

tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/tmpqde07q4d/model/variables/variables

suggests something was corrupt about the tar.gz. At this point I'm at a loss as to what has happened. The process here is that medaka sees that you have provided a tar.gz file, and unpacks it in a temporary location on your system in order for tensorflow to read it. That location is determined by Python, not by code in medaka.

I would talk to your HPC admins and ask if they know why files in /tmp appear to not be readable.
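To narrow this down on a compute node, the unpack-then-read step can be reproduced with the standard library alone. The sketch below is not medaka's actual unpacking code; `unpack_and_check` is a hypothetical helper that extracts the archive into a directory chosen by Python's `tempfile` module and then tries to read every extracted file.

```python
import os
import tarfile
import tempfile


def unpack_and_check(archive_path):
    """Extract a tar.gz into a fresh temporary directory and test readability.

    Returns (number_of_files_extracted, list_of_unreadable_relative_paths).
    Illustration only; this does not reproduce medaka's own logic.
    """
    n_files = 0
    unreadable = []
    with tempfile.TemporaryDirectory() as tmpdir:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(tmpdir)
        for root, _dirs, files in os.walk(tmpdir):
            for name in files:
                n_files += 1
                full = os.path.join(root, name)
                try:
                    with open(full, "rb") as fh:
                        fh.read(1)
                except OSError:
                    unreadable.append(os.path.relpath(full, tmpdir))
    return n_files, unreadable


# Show which directory Python will use for temporary files on this node.
print("Python temp directory:", tempfile.gettempdir())
```

Python's `tempfile` module consults the `TMPDIR` environment variable when choosing that directory, so running the same check inside a job with `TMPDIR` pointed at a different writable location is one experiment that might show whether `/tmp` on the compute nodes is the problem.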

malvaradol commented 6 months ago

So far the only thing that has worked is to run medaka on a login node, but of course I can't run the whole job on that node. My question is: if I run medaka on the login node, cancel the job, and then resubmit it on a compute node, is the model stored somewhere so that it will run normally? If so, how long should I run medaka on the login node before it reaches the stage where the model is saved on the system?

Just some ideas I guess, HPC admins take forever to reach back...

cjw85 commented 6 months ago

My question is, if I run medaka in the login node, cancel the job and then re-send it again on a computing node, is the model stored somewhere so that it will run normally?

The model, when downloaded, is always stored as a tar.gz. It isn't cached as the expanded archive; the untarring always happens at runtime. That is to say, even if you cache the model by running the program once (as I suggested in my earlier comment), the effect is no different from what you are doing by providing the tar on the command line.

I'd really like to understand why the unpacking of the tar into /tmp is apparently going awry.