JeremyQuo closed this issue 5 months ago
Hello
This seems to be a bug. Are you seeing this on multiple input files, or only on this particular set of files? Would you be able to share the input files (are they very large?) so that I can try to reproduce the bug on my side?
Thanks.
I am using a single input BLOW5 file. It is more than 120 GB, which makes it hard to upload, but I am trying to find the read that triggers the bug.
Interestingly, when I extract the read that I think has the problem (the one after the last line of the PAF) into a new FASTQ file and BAM file on its own, eventalign processes it successfully. I am still checking.
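A rough sketch of how the "read after the last line of the PAF" could be looked up. It assumes that column 1 of the PAF is the read ID and that f5c writes its output in BAM order, neither of which is confirmed in this thread. `last_paf_read` and `reads_after` are hypothetical helper names, not part of f5c or samtools.

```shell
# Read ID on the last line of a PAF file (column 1 is assumed to be the query name).
last_paf_read() {
    tail -n 1 "$1" | cut -f 1
}

# From SAM text on stdin, print the distinct read IDs that appear after the
# last record of read ID $1, in file order. If $1 never appears, every read
# ID is printed.
reads_after() {
    awk -F '\t' -v id="$1" '
        /^@/ { next }                      # skip SAM header lines
        { n++; name[n] = $1; if ($1 == id) last = n }
        END {
            for (i = last + 1; i <= n; i++)
                if (!seen[name[i]]++) print name[i]
        }
    '
}
```

For example, `samtools view wt2.bam | reads_after "$(last_paf_read wt2.paf)" | head` would list the first candidates. With many threads the failing read may not be the very next one, so it is safer to test several.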
You can use syntax like samtools view file.bam chr:1-1000 to extract the region of the genome where the problem may have occurred. Then you can try running f5c on that small BAM file to see whether the error can be reproduced. Once you locate the region in the BAM file that causes the error, you can use samtools faidx and slow5tools get to extract the necessary reads for this region.
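The steps above might look roughly like the sketch below. It assumes the BAM is coordinate-sorted and indexed, and it uses `samtools fqidx` (the FASTQ counterpart of `samtools faidx`). The function names, file names and region are placeholders for illustration only.

```shell
# Print the unique read IDs (column 1) from SAM text on stdin, skipping header lines.
read_ids() {
    awk -F '\t' '!/^@/ { print $1 }' | sort -u
}

# Hypothetical helper: subset the BAM, FASTQ and BLOW5 files to one genomic region.
# Usage: extract_region in.bam chr:1-1000 reads.fastq reads.blow5 out_prefix
# Assumes in.bam is coordinate-sorted and indexed (samtools index in.bam).
extract_region() {
    bam=$1; region=$2; fastq=$3; blow5=$4; prefix=$5
    # Alignments overlapping the region, written as a small BAM.
    samtools view -b -o "$prefix.bam" "$bam" "$region" &&
    samtools index "$prefix.bam" &&
    # IDs of the reads in that region.
    samtools view "$prefix.bam" | read_ids > "$prefix.ids.txt" &&
    # Matching FASTQ records and raw signal records.
    samtools fqidx -r "$prefix.ids.txt" "$fastq" > "$prefix.fastq" &&
    slow5tools get "$blow5" --list "$prefix.ids.txt" -o "$prefix.blow5"
}
```

Running `extract_region file.bam chr:1-1000 reads.fastq reads.blow5 subset` would then produce subset.bam, subset.fastq and subset.blow5, which can be passed to f5c eventalign in place of the full files.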
Thanks for your suggestion. I have now subsampled the problem reads (BLOW5, FASTQ, and BAM files), and the failure still occurs when I run this command:
f5c eventalign -r subset.fastq -g GCF_000005845.2_ASM584v2_genomic.fna --slow5 subset.blow5 --pore rna004 -b subset.bam -c --min-mapq 0 -t 64 --rna -o subset.paf
However, the subset (120 MB) is larger than GitHub's file size limit (25 MB).
Here is the dropbox link: https://www.dropbox.com/scl/fi/lmm52pgyjurma2h9115vv/subset.tar.gz?rlkey=lafiyapczhm4u30sunms4b20v&st=uuf0s6kx&dl=0
You should be able to view and download it.
Thank you very much. I could reproduce the issue. The cause seems to be the following read, which looks very unusual. Do you think it is a real read? 69107f00-0265-4b7c-9f57-35e8112eb17d.fastq.txt
Nevertheless, I have fixed the potential bug in the latest dev branch. Could you check whether it works for you now?
So, is this an issue with ultra-long reads? And will f5c work if I remove these reads?
No, it is not an issue with ultra-long reads. It is a very unusual repetitive read that broke my assumption about the maximum memory needed for the banded alignment. This is the first such case I have seen out of billions of reads processed over the years. I have now updated the code to handle these cases (hopefully).
OK. I will try it, many thanks.
Thanks for your response. Now the dev branch works!
Thanks for finding this rare bug 👍
When I run the f5c command below,
f5c eventalign -r wt2.fastq -g GCF_000005845.2_ASM584v2_genomic.fna --slow5 wt1.blow5 --pore rna004 -b wt2.bam -c --min-mapq 0 -t 64 --rna -o wt2.paf
My server returns an error
And the log from f5c
It seems to be a memory problem. I reinstalled f5c and switched to another server, but I still cannot avoid the problem. Is there a solution?