nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

failed to get modbase info ... skipped: AUX data not found #671

Closed vetmohit89 closed 5 months ago

vetmohit89 commented 6 months ago

Hello, I have generated RNA004 data in fast5 format and converted it to pod5. I am using the following command to create a BAM file:

dorado basecaller sup ./input_pod5_files/ --modified-bases m6A_DRACH --reference /reference_genome/ > ./test_m6a_3.bam

But this BAM file is missing the MM/ML/MN tags. When I use it for m6A calling with modkit, I get this error: failed to get modbase info for record fc798da5-16eb-4fa4-9449-79013e3cbde6, Skipped: AUX data not found
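One way to confirm whether a BAM file carries the modification tags before handing it to modkit is to count the records that have both MM and ML. The sketch below is a hypothetical helper (`count_modbase_tags` is not part of dorado or modkit); it reads SAM text on stdin and assumes samtools is available to produce that text.

```shell
# Hypothetical helper: count SAM records that carry both MM and ML tags.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
count_modbase_tags() {
  awk -F'\t' '
    !/^@/ {
      total++
      has_mm = 0; has_ml = 0
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^M[Mm]:Z:/) has_mm = 1
        if ($i ~ /^M[Ll]:B:/) has_ml = 1
      }
      if (has_mm && has_ml) tagged++
    }
    END { printf "%d of %d records have MM/ML tags\n", tagged, total }
  '
}

# Usage, assuming samtools is installed:
#   samtools view ./test_m6a_3.bam | count_modbase_tags
```

If the count is zero, the tags were never written by the basecaller, so the modkit error is a symptom rather than the cause.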

Earlier, I thought this might be an issue with modkit, so I shared a test data sample with @Art Rand on Box (https://uab.box.com/s/g3g3nlg53jko2xqtwv89rzrnkkh2v6x2). He was able to generate a correct BAM file with MM/ML/MN tags and then run modkit on it.

I am running this on an HPC cluster, not a personal computer.

I am not sure whether I am missing something in my dorado command or whether there is another cause. Please help me troubleshoot this error. I already created a similar issue in the modkit repository: https://github.com/nanoporetech/modkit/issues/134

ethan-mcq commented 6 months ago

In your previous issue, ArtRand pointed out that you are not specifying the model correctly. In your command you need to specify 'rna004_130bps_sup@v3.0.1', not 'sup'. So the command should be:

dorado basecaller rna004_130bps_sup@v3.0.1 ./input_pod5_files/ --modified-bases m6A_DRACH --reference /reference_genome/ > ./test_m6a_3.bam

vetmohit89 commented 6 months ago

When I tried the above command, it gave this error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unknown simplex model rna004_130bps_sup@v3.0.1
Aborted

But when I tried

dorado basecaller sup ./input_pod5_files/ --modified-bases m6A_DRACH --reference /reference_genome/ > ./test_m6a_3.bam

it ran with the following output:

[2024-03-04 17:53:26.269] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2024-03-04 17:53:26.272] [info]  - downloading rna004_130bps_sup@v3.0.1 with httplib
[2024-03-04 17:53:26.338] [error] Failed to download rna004_130bps_sup@v3.0.1: SSL server verification failed
[2024-03-04 17:53:26.338] [info]  - downloading rna004_130bps_sup@v3.0.1 with curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60.9M  100 60.9M    0     0   262M      0 --:--:-- --:--:-- --:--:--  262M
[2024-03-04 17:53:27.647] [info] > Creating basecall pipeline
[2024-03-04 17:53:27.650] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-03-04 17:53:39.782] [info]  - set batch size for cuda:0 to 1728
[                              ] 0% [00m:00s<00m:00s] 
[2024-03-04 17:54:15.898] [info] > Simplex reads basecalled: 17545
[2024-03-04 17:54:15.898] [info] > Simplex reads filtered: 46
[2024-03-04 17:54:15.898] [info] > Basecalled @ Samples/s: 1.581510e+07
[2024-03-04 17:54:15.929] [info] > Finished

HalfPhoton commented 6 months ago

Hi @vetmohit89, Can you try both of the following please?

# Try using the full auto-model complex
dorado basecaller sup,m6A_DRACH input_pod5_files/ --reference /reference_genome/ > ./test_m6a_3.ba

Downloading specific models:

# Download the rna004 model
dorado download --model rna004_130bps_sup@v3.0.1
# Download the rna004 m6A_DRACH mods model 
dorado download --model rna004_130bps_sup@v3.0.1_m6A_DRACH@v1
# Call dorado with the specific model paths (note the additional -models suffix on --modified-bases-models)
dorado basecaller rna004_130bps_sup@v3.0.1/ input_pod5_files/ --modified-bases-models  rna004_130bps_sup@v3.0.1_m6A_DRACH@v1 --reference /reference_genome/ > ./test_m6a_3.bam

Kind regards, Rich

vetmohit89 commented 6 months ago

Hello Rich,

The following command works for me now:

dorado basecaller rna004_130bps_sup@v3.0.1 ./pod5/test/ \
  --modified-bases m6A_DRACH \
  --reference ./reference_genome_files/reference/ > ./test/test_m6a_3.bam

I am wondering which other RNA modifications I can identify with dorado?

Thank you, Mohit

HalfPhoton commented 6 months ago

Hi @vetmohit89, we've found the underlying issue behind your original question, where there were no mods in your output. Mixing a model complex such as hac with --modified-bases was incorrectly setting the modification models in the pipeline.

Using either a complete model complex sup,m6A_DRACH or a model path and --modified-bases will work.

This will be fixed in the next release.
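Until that release is installed, a small pre-flight check in a job script can catch the problematic combination. This is only a sketch: `check_dorado_args` is a hypothetical helper, not a dorado feature, and it assumes the bare model complexes are exactly `fast`, `hac` and `sup`.

```shell
# Hypothetical pre-flight check; not part of dorado.
# Returns 1 when a bare model complex (fast, hac or sup) is combined with
# --modified-bases, the combination that produced no mods in this thread.
check_dorado_args() {
  model=$1
  shift
  case $model in
    *,*) return 0 ;;            # complete complex such as sup,m6A_DRACH
    fast|hac|sup) ;;            # bare complex: inspect the remaining args
    *) return 0 ;;              # full model name or model path
  esac
  for arg in "$@"; do
    case $arg in
      --modified-bases|--modified-bases=*)
        echo "warning: use '$model,<mod>' or a full model name with --modified-bases" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Usage: run the check with the same arguments you would pass to
# `dorado basecaller`, and only start the job if it succeeds.
#   check_dorado_args sup,m6A_DRACH input_pod5_files/ --reference /reference_genome/
```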

To view what modification models are available run: dorado download --list
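The full list is long, so it can help to narrow it to the RNA004 modification models. The filter below is a rough sketch with a hypothetical name (`list_rna004_mod_models`); it assumes modification model names follow the pattern seen in this thread, a base model name followed by a second `@`-versioned suffix, as in `rna004_130bps_sup@v3.0.1_m6A_DRACH@v1`.

```shell
# Hypothetical filter: keep only rna004 names that contain two '@' parts,
# i.e. base model version plus modification model version.
list_rna004_mod_models() {
  grep -o 'rna004[^[:space:]]*' | awk -F'@' 'NF >= 3' | sort -u
}

# Usage (2>&1 is an assumption, in case the list is written to stderr):
#   dorado download --list 2>&1 | list_rna004_mod_models
```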

Kind regards, Rich