Closed by elenu 2 years ago
Hi
The error message relates to the merged.fq FASTQ file. In a FASTQ file, each entry consists of 4 lines. The second line contains the actual sequence, while the fourth line contains the quality score symbols. These two lines have to be exactly the same length, since each quality score symbol corresponds to one nucleotide symbol. Here, it seems that the number of symbols on line 28205492 (quality score symbols) differs from the number of symbols on line 28205490 (nucleotide symbols). This may be due to a truncated file or some other error.
I am not sure I understand your description, but if the merged.fq file is only 10290675 lines long, there is something more seriously wrong. Could you run wc -l merged.fq to confirm this?
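To illustrate the four-line layout described above, here is a minimal Python sketch that reports the first record whose sequence and quality lines differ in length. It is not part of vsearch or usearch. The function name first_bad_record is made up for this example, and it assumes strict four-line records with no wrapped sequences.

```python
from itertools import zip_longest


def first_bad_record(lines):
    """Return (line_number, reason) for the first malformed record, or None.

    `lines` is any iterable of text lines, for example an open file handle.
    Assumption: strict 4-line FASTQ records (sequences are not wrapped).
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    # Pull four lines at a time from the same iterator; a short final
    # chunk is padded with None, which signals an incomplete record.
    records = zip_longest(*[stripped] * 4)
    for index, (header, seq, plus, qual) in enumerate(records):
        first_line = 4 * index + 1  # 1-based line number of the header
        if seq is None or plus is None or qual is None:
            return first_line, "record is incomplete (file may be truncated)"
        if not header.startswith("@"):
            return first_line, "header does not start with '@'"
        if not plus.startswith("+"):
            return first_line + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return (
                first_line + 3,
                f"quality length {len(qual)} != sequence length {len(seq)}",
            )
    return None
```

It could be called as first_bad_record(open("merged.fq")). The returned line number points at the quality line when the lengths differ, which is how the line numbers in the error message above are read.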
The message "Lengths min 43, lo_quartile 251, median 251, hi_quartile 251, max 251" relates to the overall length distribution of the sequences. It indicates that almost all of them are 251 nucleotides long, but that the shortest one is just 43 nucleotides long. This is not related to the error message you got.
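As a rough illustration of how such a summary can show a minimum of 43 while every quartile is 251, here is a small Python sketch. The function name length_summary is hypothetical, and the quartile definition used here (the "inclusive" method of the standard statistics module) is an assumption; the tool that printed the message may compute quartiles differently.

```python
import statistics


def length_summary(lengths):
    """Summarise read lengths as min, quartiles, and max.

    Assumption: quartiles use the 'inclusive' method of the statistics
    module, which may not match the tool that printed the original message.
    Needs at least two lengths.
    """
    ordered = sorted(lengths)
    lo, median, hi = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "lo_quartile": lo,
        "median": median,
        "hi_quartile": hi,
        "max": ordered[-1],
    }
```

With one read of length 43 among 99 reads of length 251, only the minimum is affected; the quartiles and the maximum all stay at 251.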
Hi
Thank you very much for answering all the points.
I have run the wc -l merged.fq command, and the result is:
28205491 merged.fq
I understand this means there are only 28205491 lines, and that the error occurs because the merged.fq file has no proper final line ending. Could this be caused by the previous usearch step? Do you think it is possible to add a proper line ending to the merged.fq file manually? Thank you.
Sounds like that's the reason, yes.
This command should fix it:
echo >> merged.fq
Be careful to include the double > characters (>>), otherwise the contents of the file will be overwritten.
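A slightly more cautious variant, sketched in Python, appends a newline only when the last byte of the file is not already one, so running it twice does not add a blank line. The function name ensure_trailing_newline is made up for this example. It only fixes a missing line ending; it cannot repair a record whose quality line has been cut short.

```python
import os


def ensure_trailing_newline(path):
    """Append a newline to `path` only if its last byte is not one.

    Returns True if a newline was added, False otherwise.
    """
    with open(path, "rb+") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # empty file: nothing to terminate
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False  # already ends with a newline
        handle.seek(0, os.SEEK_END)  # reposition explicitly before writing
        handle.write(b"\n")
        return True
```

Because the file is opened in read/write mode rather than write mode, its existing contents are never overwritten.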
Now it returns another error message, saying that the file is too long.
I'm now trying sed -i '' merged.fq.
You could run tail merged.fq to see what the end of the file looks like.
Good point. It returned this output:
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@113.7051372;ee=0.029;
GTGTCAGCCGCCGCGGTAATACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCGAGTTAAGTCAGCGGTAAAAGCCCGGGGCTCAACCCCGGCCCGCCGTTGAAACTGGCTGGCTTGAGTTGGGGAAAGGCAGGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACCCCGATTGCGAAGGCAGCCTGCCGGCCCCACACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACAGGATTAGAAACCCTAGTAGTCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@113.7051373;ee=0.032;
GTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGATTTACTGGGTGTAAAGGGAGCGTAGACGGTGAAGTAAGTCTGGAGTGAAAGGCGGGGGCCCAACCCCCGGACTGCTCTGGAAACTATTTGACTGGAGTGCAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGACTGTAACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCGTGTAGTCC
+
FFFFFFFFFFFFFFFFFFFFFFFFF
It seems the quality score symbols suddenly stop in the last record.
Yes, seems like the file is truncated.
Perhaps you need to rerun the previous step.
I totally agree.
I see that the @113 label corresponds to the last sample that was processed (15 samples out of 160 in total).
I'll therefore focus on the previous step.
Thank you for all the help!
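For anyone checking the same thing, here is a minimal Python sketch that counts reads per sample label in the merged file, which makes it easy to see how many of the expected samples are present. The function name reads_per_sample is hypothetical. It assumes four-line records and headers shaped like the ones in the tail output above (for example @113.7051372;ee=0.029;), with the sample label being the text between @ and the first dot.

```python
from collections import Counter


def reads_per_sample(lines):
    """Count reads per sample label in a 4-line FASTQ stream.

    Assumption: headers look like '@113.7051372;ee=0.029;', so the sample
    label is the text between '@' and the first '.'.
    """
    counts = Counter()
    for number, line in enumerate(lines):
        # Only every fourth line (0, 4, 8, ...) is a header line.
        if number % 4 == 0 and line.startswith("@"):
            label = line[1:].split(".", 1)[0]
            counts[label] += 1
    return counts
```

Calling len(reads_per_sample(open("merged.fq"))) gives the number of distinct sample labels, which can be compared with the number of samples that were expected.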
Regression tests based on these comments were added to the vsearch test suite: https://github.com/frederic-mahe/vsearch-tests/commit/e1a4de16cd5d185db0beebb3f750ffb99de6bd45
Hello everybody,
Hope you are doing well. I was wondering if you could help me with a "fatal error" message that I get after running the vsearch command on amplicon data:
vsearch --fastq_chars merged.fq
The message that it returns is:
The merged.fq file was obtained by running the following usearch command:
usearch -fastq_mergepairs *R1_001.fastq -reverse *R2_001.fastq -fastq_eeout -fastq_maxdiffs 10 -fastq_maxmergelen 300 -fastqout merged.fq -relabel @ -report merged.txt
I guess the issue might be with the vsearch step, because I checked the file and it appeared to be empty from line 10290675 onward. When I saved a copy of the file containing only the content up to that line, I got another error message mentioning an unexpected end of file.
We had a bioinformatician who used this exact code a while ago, and it worked for him. I tried the option to ignore the message, but at the end of the process I only obtained data corresponding to 15 samples. So I decided to run the code line by line instead of running the sh file, and that is how I found the error messages.
I also observed that the txt file of terminal messages the bioinformatician shared says "Lengths min 43, lo_quartile 251, median 251, hi_quartile 251, max 251", and I got the same message. I would understand if he had had problems with the minimum length (this would fit with the error message), but the same code worked for him.
Any help is welcome.
Thank you in advance!