lbcb-sci / herro

HERRO is a highly-accurate, haplotype-aware, deep-learning tool for error correction of Nanopore R10.4.1 or R9.4.1 reads (read length of >= 10 kbps is recommended).

RUST_BACKTRACE=1 #47

Open fangdongming opened 3 months ago

fangdongming commented 3 months ago

Hi, I get the error below when running HERRO. I checked the FASTQ file, and it does not appear to contain any non-canonical bases (anything other than A, T, C, G). Could you help me troubleshoot the error? Thank you.

singularity run --nv herro.sif inference --read-alns batches_of_alignments -t 5 -d 0,1 -m model_R9_v0.1.pt -b 32 cyc.cor.fastq.gz cyc.cor.herro.fasta

[W graph_fuser.cpp:108] Warning: operator() profile_node %1243 : int[] = prim::profile_ivalue(%1241)
 does not have profile information (function operator())
thread '<unnamed>' panicked at /herro/src/haec_io.rs:144:9:
Out of bounds for 2-bit sequence decoding.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fangdongming commented 3 months ago

Additionally, the result file "cyc.cor.herro.fasta" already contained a few partial results, but once the core.* file was generated, no further output was written.

dominikstanojevic commented 3 months ago

Hi,

Are you sure that the batches passed to --read-alns were generated from the same (filtered) reads? Getting this error at this stage suggests that the tlen/qlen stored in *.oec.zst differs from the real read length.

I will try to rewrite the code to show this error while parsing.

Best regards, Dominik Stanojevic
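
To illustrate why a length mismatch can surface as "Out of bounds for 2-bit sequence decoding", here is a minimal Python sketch of generic 2-bit base packing. It is not HERRO's actual code from haec_io.rs; the function names `encode_2bit` and `decode_2bit` and the bit layout are assumptions made for illustration only.

```python
# Illustrative 2-bit packing of DNA bases; NOT HERRO's implementation.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_2bit(seq: str) -> bytes:
    """Pack four bases per byte, first base in the lowest two bits."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(packed)


def decode_2bit(packed: bytes, length: int) -> str:
    """Unpack `length` bases; fail if the buffer cannot hold that many."""
    capacity = 4 * len(packed)
    if length > capacity:
        raise IndexError(
            f"out of bounds: asked for {length} bases, buffer holds at most {capacity}"
        )
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


packed = encode_2bit("ACGTAC")      # 6 bases fit in 2 bytes
print(decode_2bit(packed, 6))       # correct length round-trips
try:
    decode_2bit(packed, 20)         # declared length longer than the data
except IndexError as err:
    print(err)
```

In this sketch, the decoder trusts the declared length, so a length taken from a different set of reads either runs past the buffer or, if it still fits within the padding, silently yields wrong bases. No non-canonical base is needed to trigger the failure.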

fangdongming commented 3 months ago

Hi, yes, I confirmed that it is the same input file. The commands were as follows:

seqkit seq -n -i cyc.cor.fastq.gz > cyc.cor.fastq.ids
sh create_batched_alignments.sh cyc.cor.fastq.gz cyc.cor.fastq.ids 12 batches_of_alignments

Could the issue arise from insufficient memory for the task or an excessive number of threads?

thank you.
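
Both points raised so far (only canonical bases, and read lengths matching what the alignments declare) can be double-checked independently of HERRO. The Python sketch below is one hedged way to do that. It assumes four-line FASTQ records, and the helper names `fastq_lengths` and `length_mismatches` are hypothetical. The `declared` mapping stands in for the tlen/qlen values from the alignment batches; how to extract those from *.oec.zst is not shown, because the file layout is not described in this thread.

```python
import gzip


def fastq_lengths(path):
    """Return ({read_id: length}, [ids containing non-ACGT bases]).

    Assumes four-line FASTQ records (no wrapped sequences).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths, non_canonical = {}, []
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break                      # end of file
            if not header.strip():
                continue                   # skip stray blank lines
            seq = handle.readline().strip()
            handle.readline()              # '+' separator line
            handle.readline()              # quality string
            read_id = header[1:].split()[0]
            lengths[read_id] = len(seq)
            if set(seq.upper()) - set("ACGT"):
                non_canonical.append(read_id)
    return lengths, non_canonical


def length_mismatches(actual, declared):
    """List (read_id, declared_len, actual_len) for every disagreement.

    actual_len is None when the read is missing from the FASTQ.
    """
    return [
        (rid, dlen, actual.get(rid))
        for rid, dlen in declared.items()
        if actual.get(rid) != dlen
    ]
```

A possible use would be `lengths, bad = fastq_lengths("cyc.cor.fastq.gz")`, followed by `length_mismatches(lengths, declared)`. An empty `bad` list and an empty mismatch list would suggest the FASTQ and the batches were built from the same reads.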

dominikstanojevic commented 3 months ago

Hi,

can you please build the tool from this branch (https://github.com/lbcb-sci/herro/tree/decoding-debug) and run it on your data? I added some checks for the length consistency so we should get a more descriptive error message now.

Your feedback would help us to improve this tool. If we get the output we expect, we will push the changes into the main branch.

Best, Dominik