Open kmshort opened 1 year ago
@kmshort I'm not able to reproduce the problem using the part of the R2 file you pasted above. It's possible someone else might be able to figure out the problem from the info you've already provided, but I don't think I can help without more. One thing I would suggest: try to cut the file in half (`wc -l` to find the number of lines, then use `zcat` and `head`/`tail` to get halves of the file, making sure to keep the number of lines a multiple of 4). Then repeat this a few times to narrow down a small example that causes the error. If you can do this, and you don't mind sending me the data, I can try to work with it.
Also, I should mention that I tested a fresh clone, so when I can I'll go back and try with the v1.2.1 tarball to see if it produces the error on this small fragment of the data.
thanks @andrewdavidsmith I started doing what you suggested, but at the same time I ran it on the entire dataset again after a system restart. It worked without segfaulting! That's good, but it raises questions in and of itself. For now, though, nothing to see here.
If anyone else has similar problems, I guess the first thing to do is "turn it off and on again". You'll be sure to hear if it starts happening again.
Oh dear, it's segfaulting again. I'll see if I can divide and conquer to find the offending sequence(s), or whatever else is going on.
I had the same problem. The fastq file I used is SRR14562354
@yangli04 I need version info, your environment, and preferably a link to part of the fastq file. You gave the SRA run accession, but that's not always enough (e.g., which version of fastq-dump; whether you used wget first; fasterq-dump; etc.). If you can use `head` from the terminal to get a small file that reproduces the problem, I can work directly with that. If you can't give me a small test file, give me your command line and a hash (md5) of the input files so we can try to reproduce.
First, if I take the head of the file, I cannot reproduce the problem. Only with the whole .fastq.gz file created directly by fastq-dump can I reproduce it.
I used falco 1.2.1 and fastq-dump 3.1.0.
Second, I think it might be a problem with the compression. When I gunzip the .fastq.gz file, running falco on the decompressed .fastq file does not produce the problem. Even if I use gzip to recreate a .fastq.gz file from the decompressed .fastq file, it does not cause the problem either.
Then I got the md5sums of the two compressed files:
It seems the files are different.
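For what it's worth, differing md5 sums on compressed files don't necessarily mean the underlying data differs: gzip output depends on the compression level and header metadata, so comparing checksums of the decompressed streams is the more telling test. A quick illustration with synthetic files (not the reporter's data):

```shell
#!/bin/sh
set -e
# Same content compressed at two different levels produces different
# compressed bytes (the gzip header records the level, among other things)...
printf 'hello fastq\n' | gzip -1 > a.fastq.gz
printf 'hello fastq\n' | gzip -9 > b.fastq.gz

md5sum a.fastq.gz b.fastq.gz   # compressed files: different sums

# ...but the decompressed streams are byte-identical.
zcat a.fastq.gz | md5sum
zcat b.fastq.gz | md5sum
```

So a useful follow-up here would be comparing `zcat file.fastq.gz | md5sum` for the fastq-dump output versus the re-gzipped file, to confirm whether only the compression differs.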
The md5 of my .sra file is ddd71a585d80515e4766f676dc7c0be1 (SRR14562354.sra).
@yangli04 I will be able to test this with your info. It might take some time. There's a chance the issue has been fixed in v1.2.2, because we made some updates to the compression library for faster processing of BAM format, and I think those were incorporated between v1.2.1 and v1.2.2. So if you can tell me whether the problem is still present in v1.2.2, it might make things happen faster. That will be my first step in debugging when I have time for it.
@andrewdavidsmith Thank you. This problem was magically solved in v1.2.2
Hi, I've compiled Falco: configure:
make:
install:
and run falco:
falco sequencing.fq.gz
and get output:
I have paired-end sequences that have gone through Trim Galore!
I've tested on the R1, and falco runs fine (it's sooooo much faster than FastQC, it's amazing).
But falco crashes with a segfault on the R2 sequence. The file is a 15,302,780,411-byte (~15.3 GB) gzipped fastq file.
The head of the original file started something like this (I passed a modified version of it that had gone through Trim Galore).
It has come from an MGI instrument, but it's nothing special. Falco is happy when I pass my R1 to it.
Any ideas why this would seg fault? All three outputs (summary, txt and html) are empty files when it faults with R2. When falco processes the R1 sequence, it's fine and the output looks good.
I'm running ubuntu 20.04 if that matters.
many thanks, Kieran