nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Segmentation Fault for Dorado Correct #989

Open tbenavi1 opened 4 weeks ago

tbenavi1 commented 4 weeks ago

Issue Report

Please describe the issue:

Hello, I received a segmentation fault error when running dorado correct (v0.7.3). The data I am trying to correct is duplex basecalled reads. I have attached the log below. I do see "Read not found", "Read qual not found" and "tlen from before 198250 and tlen from after 0 don't match for" errors. However, I ran dorado correct in the same manner for another sample and received similar errors even though the process finished successfully with no segmentation fault. So, I am not sure if those errors are related to or separate from the segmentation fault error.

Run environment:

Logs

[2024-08-13 16:53:29.260] [info] Running: "correct" "/scratch/rranallo-benavidez/longread_results/IMR90/fasta/IMR90.all.ONT.fastq"
[2024-08-13 16:53:29.304] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2024-08-13 16:53:29.401] [info]  - downloading herro-v1 with httplib
[2024-08-13 16:53:29.418] [error] Failed to download herro-v1: SSL server verification failed
[2024-08-13 16:53:29.418] [info]  - downloading herro-v1 with curl
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22.3M  100 22.3M    0     0  9842k      0  0:00:02  0:00:02 --:--:-- 9844k
[2024-08-13 16:54:06.089] [info] > Using batch size 12 on device cuda:0 in inference thread 0.
[2024-08-13 16:54:06.090] [info] > Using batch size 12 on device cuda:0 in inference thread 1.
[2024-08-13 16:54:26.226] [info] > Using batch size 12 on device cuda:1 in inference thread 0.
[2024-08-13 16:54:26.226] [info] > Using batch size 12 on device cuda:1 in inference thread 1.
[2024-08-13 17:02:22.514] [info] > starting correction
[W::fai_get_val] Reference  not found in FASTA file, returning empty sequence
[2024-08-13 20:40:46.308] [error] Read  not found
[W::fai_get_val] Reference  not found in FASTA file, returning empty sequence
[2024-08-13 20:40:46.308] [error] Read qual  not found
[2024-08-13 20:40:46.308] [error] tlen from before 198250 and tlen from after 0 don't match for
/usr/bin/bash: line 1: 3366638 Segmentation fault      (core dumped) dorado correct /scratch/rranallo-benavidez/longread_results/IMR90/fasta/IMR90.all.ONT.fastq > /scratch/rranallo-benavidez/longread_results/IMR90/fasta/IMR90.all.ONT.corrected.fasta

HalfPhoton commented 3 weeks ago

Hi @tbenavi1, thanks for raising this issue. We're working on it and will fix it in an upcoming release.

cblazier commented 2 weeks ago

I am getting similar errors for simplex read correction. I can also confirm that the "returning empty sequence" and "Read qual not found" messages do not appear to be related to the segmentation fault, as I have corrected smaller batches of data and seen those messages without the whole job crashing. I have also lowered the batch size and the index size, separately and together, and neither change seems to prevent the crash. I have 48 CPUs, 384GB RAM, and two A100 GPUs with 40GB RAM each.

[2024-08-21 17:50:23.370] [info] Running: "correct" "-m" "herro-v1" "-i" "2G" "45kcutoff_Hartley_porechopped.fastq.gz"
[2024-08-21 17:50:29.293] [info] > Using batch size 12 on device cuda:0 in inference thread 0.
[2024-08-21 17:50:29.293] [info] > Using batch size 12 on device cuda:0 in inference thread 1.
[2024-08-21 17:50:29.608] [info] > Using batch size 12 on device cuda:1 in inference thread 0.
[2024-08-21 17:50:29.608] [info] > Using batch size 12 on device cuda:1 in inference thread 1.
[2024-08-21 18:32:14.866] [info] > starting correction
[W::fai_get_val] Reference not found in FASTA file, returning empty sequence
[2024-08-21 21:32:23.357] [error] Read not found
[W::fai_get_val] Reference not found in FASTA file, returning empty sequence
[2024-08-21 21:32:23.505] [error] Read qual not found
[2024-08-21 21:32:23.579] [error] tlen from before 85788 and tlen from after 0 don't match for
[W::fai_get_val] Reference not found in FASTA file, returning empty sequence
[2024-08-22 14:02:48.447] [error] Read not found
[W::fai_get_val] Reference not found in FASTA file, returning empty sequence
[2024-08-22 14:02:48.621] [error] Read qual not found
[2024-08-22 14:02:48.625] [error] tlen from before 172644 and tlen from after 0 don't match for
/var/spool/slurmd/job11155527/slurm_script: line 24: 14630 Segmentation fault (core dumped) dorado correct -m herro-v1 -i 2G 45kcutoff_Hartley_porechopped.fastq.gz > 45kcutoff_Hartley_porechopped.fasta
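
For anyone comparing runs, a small script can tally how often each message type appears in a log like the one above and whether the run ended in a segmentation fault. This is only a sketch: the regular expressions are inferred from the log lines pasted in this thread, not from any documented dorado log format, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from lines in this thread, e.g.
# [2024-08-21 21:32:23.357] [error] Read not found
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
# Pattern inferred from the "tlen from before ... after ..." messages above.
TLEN = re.compile(r"tlen from before (\d+) and tlen from after (\d+) don't match")


def summarize(lines):
    """Return (counts per log level, list of tlen mismatches, segfault seen)."""
    levels = Counter()
    tlen_mismatches = []
    segfault = False
    for line in lines:
        line = line.rstrip("\n")
        # The shell, not dorado, prints this line, so it has no timestamp.
        if "Segmentation fault" in line:
            segfault = True
        match = LOG_LINE.match(line)
        if not match:
            continue  # e.g. htslib warnings such as [W::fai_get_val] ...
        levels[match.group("level")] += 1
        tlen = TLEN.search(match.group("msg"))
        if tlen:
            tlen_mismatches.append(
                (match.group("ts"), int(tlen.group(1)), int(tlen.group(2)))
            )
    return levels, tlen_mismatches, segfault
```

To use it on a saved log, something like `summarize(open("dorado.log"))` should work, where `dorado.log` is a placeholder for wherever stderr was redirected.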

is01 commented 2 hours ago

I got the same errors (Read not found / Read qual not found), but there was no segmentation fault. Do these errors affect only some of the reads, with most of the data being processed normally? Could you please let me know whether there are any problems with using the output FASTA for further analysis?
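
Until a maintainer answers, one way to gauge how much of the data made it through is to compare the read IDs in the input FASTQ with the headers in the output FASTA. The sketch below is not part of dorado. It assumes 4-line (unwrapped) FASTQ records and that each output header starts with the original read ID. Whether dorado correct appends a suffix to read names is not confirmed in this thread, so the `suffix_sep` parameter is a hypothetical knob for stripping one if it does. It also flags whether the input contains a record with an empty name, since the log messages above print no read name.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fastq_ids(lines):
    """Yield read IDs from 4-line FASTQ records (assumes unwrapped records)."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def fasta_ids(lines, suffix_sep=None):
    """Yield FASTA header names, optionally stripping a trailing suffix."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Hypothetical: drop everything after the last separator, in case
            # the output names carry an extra suffix after the read ID.
            if suffix_sep and suffix_sep in name:
                name = name.rsplit(suffix_sep, 1)[0]
            yield name


def compare(fastq_lines, fasta_lines, suffix_sep=None):
    """Summarize which input read IDs have no counterpart in the output."""
    in_ids = set(fastq_ids(fastq_lines))
    out_ids = set(fasta_ids(fasta_lines, suffix_sep))
    return {
        "input": len(in_ids),
        "output": len(out_ids),
        "missing": sorted(in_ids - out_ids),
        "empty_name_in_input": "" in in_ids,
    }
```

For example, `compare(open_text("reads.fastq.gz"), open_text("corrected.fasta"))` would report the counts for a pair of files (both file names are placeholders). Reads can be absent from the output for reasons unrelated to these errors, so a non-empty `missing` list is a starting point for investigation rather than proof of a problem.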