Hi @jennycmuscat, apologies for the delay.
This is unusual - can you tell me if software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0 is a symbolic link in any way?
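One way to answer the symbolic link question is a small shell check like the sketch below. `describe_path` is a hypothetical helper name, not part of Dorado, and the path is the one quoted in this thread.

```shell
# Report whether a path is a symbolic link, a real directory, or missing.
# describe_path is a hypothetical helper, not a Dorado command.
describe_path() {
    if [ -L "$1" ]; then
        # readlink prints the link target exactly as stored in the link
        echo "symlink -> $(readlink "$1")"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

describe_path "software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0"
```

The `-L` test comes first because `-d` follows symbolic links and would report a linked directory as a plain directory.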
Dorado should find models in the current working directory. So placing this model there should work.
Note: The model search behaviour is changing in a future release with the addition of the --models-directory CLI argument. This will be described in more detail in the release notes.
Kind regards, Rich
No, it is not a symbolic link. The directory software is in my current working directory and contains the software I am running, including Dorado and the associated basecaller model dna_r10.4.1_e8.2_400bps_hac@v5.0.0. I have added this directory to my PATH variable, and I have also tried running the specified command with the model placed directly in my current working directory (not in the software directory), but have not had any luck either way.
Is it required that dna_r10.4.1_e8.2_400bps_hac@v5.0.0 and dorado-0.7.1-linux-x64 are saved in the same directory? Or could I get more specifics on this? I have tried this too, but so far without success.
Thank you for the help!
Adding the model directory to the PATH variable will have no effect - this isn't how the model search is implemented.
No, the model does not need to be in the software directory, but it will be found if it is in the current working directory.
Can you also download the dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mCG_5hmCG@v1 model so your working directory looks like:
dna_r10.4.1_e8.2_400bps_hac@v5.0.0/
dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mCG_5hmCG@v1/
pod5_pass/
sample.fa
software/dorado-0.7.1-linux-x64/dorado
Running the following in this directory should work:
./software/dorado-0.7.1-linux-x64/dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v5.0.0 pod5_pass/ --reference sample.fa --batchsize 64 --device cuda:0 --modified-bases 5mC_5hmC > reads.bam
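Before launching the basecaller, the layout above can be verified with a short pre-flight check. This is only a sketch: `check_models` is a hypothetical helper, not part of Dorado, and it assumes the model directory names listed above.

```shell
# Check that every named model directory is present in the current working
# directory. check_models is a hypothetical helper, not part of Dorado.
check_models() {
    rc=0
    for model in "$@"; do
        # -d follows symlinks, so a symlinked model directory also passes
        if [ ! -d "$model" ]; then
            echo "missing model directory: $model" >&2
            rc=1
        fi
    done
    return "$rc"
}

if check_models \
    "dna_r10.4.1_e8.2_400bps_hac@v5.0.0" \
    "dna_r10.4.1_e8.2_400bps_hac@v5.0.0_5mCG_5hmCG@v1"; then
    echo "models found in $(pwd)"
fi
```

If either directory is reported missing, the relative model name in the command above has nothing to resolve against in that directory.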
I have realised that the issue I am having is only present when running the command in a Nextflow pipeline. The command you specify does indeed work on the command line, but the current working directory is no longer recognised in Nextflow. I understand that this is an issue beyond Dorado, so I will keep using the full path when running the command in Nextflow. Regardless, thank you for your help.
> but the current working directory is no longer recognised in Nextflow
The CWD for a Nextflow process contains only what's specified in the inputs, so you'll need to make sure the model is added there so that it's staged into the CWD for each job. This will be a symbolic link (by default) but that shouldn't matter.
This will be made easier in a future release with the addition of --models-directory
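To illustrate why the relative model path stops resolving inside a task, here is a rough shell imitation of input staging, assuming the default symlink mode mentioned above. `stage_inputs`, the `work/task1` and `work/task2` directories, and the temporary launch directory are all hypothetical; this is not Nextflow's actual implementation.

```shell
# Imitate input staging: create a fresh task directory and symlink only the
# declared inputs into it. stage_inputs is a hypothetical helper for
# illustration; Nextflow performs the real staging itself.
stage_inputs() {
    task_dir=$1
    shift
    mkdir -p "$task_dir"
    for input in "$@"; do
        # Link by absolute path so the link resolves from inside task_dir
        ln -s "$(cd "$(dirname "$input")" && pwd)/$(basename "$input")" "$task_dir/"
    done
}

launch_dir=$(mktemp -d)
mkdir "$launch_dir/dna_r10.4.1_e8.2_400bps_hac@v5.0.0" "$launch_dir/pod5_pass"

# Only pod5_pass is declared, so the model is absent from the task directory
stage_inputs "$launch_dir/work/task1" "$launch_dir/pod5_pass"
ls "$launch_dir/work/task1"

# Declaring the model as well makes the relative model name resolve again
stage_inputs "$launch_dir/work/task2" \
    "$launch_dir/pod5_pass" \
    "$launch_dir/dna_r10.4.1_e8.2_400bps_hac@v5.0.0"
ls "$launch_dir/work/task2"
```

In the first task directory a command that refers to the model by its bare name has nothing to find, which matches the behaviour reported in this thread; in the second, the model appears as a symbolic link next to the reads.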
All the best, Rich
I am trying to run Dorado to identify 5mC methylation, but it only succeeds when the full path (from root) to the basecaller model dna_r10.4.1_e8.2_400bps_hac@v5.0.0 is specified. The command written below results in the error "Cannot find modification model for '5mC_5hmC' reason: simplex model doesn't exist", whereas when the full path (/home/.../software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0) is used, no error occurs and it runs fine.
This was also attempted after saving the model dna_r10.4.1_e8.2_400bps_hac@v5.0.0 in a location included in my system's PATH environment variable, but this resulted in the same issue. The same occurs when saving the model in the same location as dorado-0.7.1-linux-x64. The model simply does not seem to be found when running Dorado unless its full path is specified.
Is there a way to avoid specifying the full path to the basecaller model when running the command below?
Steps to reproduce the issue:
Running from a directory containing the software directory with the specified Dorado model:
dorado basecaller software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0 pod5_pass/barcode --reference sample.fa --verbose --batchsize 64 --device cuda:0 --modified-bases 5mC_5hmC > reads.bam
Run environment:
dorado basecaller software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0 pod5_pass/barcode --reference sample.fa --verbose --batchsize 64 --device cuda:0 --modified-bases 5mC_5hmC > reads.bam
Logs