linsalrob / fastq-pair

Match up paired end fastq files quickly and efficiently.
https://edwards.flinders.edu.au/sorting-and-paring-fastq-files/
MIT License

pairing bug #14

Closed: rohitjainnference closed this issue 3 years ago

rohitjainnference commented 4 years ago

original left read

@SRR8996821.1 1/1
CTCCGTTTCCGACCTGGGCCGGTTCNCCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNCNTNGNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAANCGATAG
+
AAFFFKAKA,<AKKKKKKF7AFFKK#7K##A###########################K###########################K#K#K#F###############################################A<,#AAFFKK
@SRR8996821.2 2/1
CTGGAGTGCAGTGGCTATACACAGGNGCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNTNCNCNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGANGCCGAA
+
A<FFFFKKKK<AKAAAFA,,FKKKK#KF##K###########################7###########################F#,#A#F###############################################F7<#,,<FKF
@SRR8996821.3 3/1
AGATACCATGATCACGAAGGTGGTTNTCNNANNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNTNTNGNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGGNTGAACT

original right read

@SRR8996821.1 1/1
CTCCGTTTCCGACCTGGGCCGGTTCNCCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNCNTNGNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAANCGATAG
+
AAFFFKAKA,<AKKKKKKF7AFFKK#7K##A###########################K###########################K#K#K#F###############################################A<,#AAFFKK
@SRR8996821.2 2/1
CTGGAGTGCAGTGGCTATACACAGGNGCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNTNCNCNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGANGCCGAA
+
A<FFFFKKKK<AKAAAFA,,FKKKK#KF##K###########################7###########################F#,#A#F###############################################F7<#,,<FKF
@SRR8996821.3 3/1
AGATACCATGATCACGAAGGTGGTTNTCNNANNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNTNTNGNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGGNTGAACT

output left read

@SRR8996821.1 1/1
CTCCGTTTCCGACCTGGGCCGGTTCNCCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNCNTNGNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAANCGATAG
+
AAFFFKAKA,<AKKKKKKF7AFFKK#7K##A###########################K###########################K#K#K#F###############################################A<,#AAFFKK
@SRR8996821.1 1/1
CTCCGTTTCCGACCTGGGCCGGTTCNCCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNCNTNGNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAANCGATAG
+
AAFFFKAKA,<AKKKKKKF7AFFKK#7K##A###########################K###########################K#K#K#F###############################################A<,#AAFFKK
@SRR8996821.3 3/1
AGATACCATGATCACGAAGGTGGTTNTCNNANNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNTNTNGNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGGNTGAACT
AGATACCATGATCACGAAGGTGGTTNTCNNANNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNTNTNGNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGGNTGAACT
AGATACCATGATCACGAAGGTGGTTNTCNNANNNNNNNNNNNNNNNNNNNNNNNNNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNTNTNGNANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGGNTGAACT

output right read

@SRR8996821.1 1/2
ATCGCTTGAGTACAGGNGTTCTGGGNTGNAGTNNNNNNTNNCNANCNGGTNTNCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNGGANNGGGGNACCACCANGNTGCCTACNGNGGNNNGANCCNGCNAAGGTCGNGANNNGNGAG
+
AAFFFKKKKKK,7FKK#KKKKKKKK#KK#KKK######K##K#K#K#KAF#F#7K###############################K#KFK##,FFA#,FKFKK7#F#7F,AFA7#7#,F###FF#KF#A7#,7<A,<,#,,###<#A7A
@SRR8996821.2 2/2
CCGCACTAAGTTCGGCNTCAATATGNTGNCCTNNNNNNANNGNGNGNCCANCNGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNAAANNGAGCNGGTCAAAACNCCCGTGCNGNTCNNNAGNGGNATNGCGCCTGNGAANNGNCAC
+
,A<FFFKKKKKKFKKA#KFFKKKKK#AF#KFK######7##7#F#K#7,A#K#7F###############################<#FKK##KFFF#KKF,AFFAA#FKAFKKK#,#,,###F,#<K#KF#,,AA<,7#,AA##,#,AF
@SRR8996821.3 3/2
CCCCCACTACCACAAANTATGCAGTNGANTTTNNNNCNTNNGNGNANATCNCNGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNCGCNNTGGGNAAAGCACCTNCGTGATCNTNCTNNNTANATNGGNAGAGCGTNGTGTNGNGAA
CCCCCACTACCACAAANTATGCAGTNGANTTTNNNNCNTNNGNGNANATCNCNGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNCGCNNTGGGNAAAGCACCTNCGTGATCNTNCTNNNTANATNGGNAGAGCGTNGTGTNGNGAA
CCCCCACTACCACAAANTATGCAGTNGANTTTNNNNCNTNNGNGNANATCNCNGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNCGCNNTGGGNAAAGCACCTNCGTGATCNTNCTNNNTANATNGGNAGAGCGTNGTGTNGNGAA
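
The symptoms visible in the output excerpts above (a repeated header, and a record whose sequence line appears three times with no `+` separator) can be caught mechanically. The following is a minimal sketch, not part of fastq-pair, assuming plain four-line FASTQ records with no wrapped sequences; `check_fastq` is a hypothetical helper name.

```python
from collections import Counter


def check_fastq(lines):
    """Return a list of problems found in an iterable of FASTQ lines.

    Assumes the simple four-line record layout: header, sequence,
    '+' separator, quality string.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    problems = []
    headers = Counter()
    # Walk the file by position, because a quality line may itself start with '@'.
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            problems.append(f"line {start + 1}: truncated record")
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"line {start + 1}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"line {start + 3}: expected '+' separator")
        if len(seq) != len(qual):
            problems.append(f"line {start + 4}: sequence/quality length mismatch")
        headers[header] += 1
    problems.extend(f"duplicate header: {h}" for h, n in headers.items() if n > 1)
    return problems
```

Running it on both paired output files (for example `check_fastq(open(path))`, with `path` standing in for your own file) before handing them to a downstream tool should flag records like the ones shown here.
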

weinmaier commented 4 years ago

I am having the same issue with fastq files from another SRA accession, which causes the bwa mem step in SPAdes to crash. It seems to affect only the first two reads: when I drop them, bwa mem runs without problems.
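
The workaround of dropping the first two reads could be sketched as follows. This is an illustration only, assuming four-line FASTQ records; `skip_records`, `drop_first_records`, and the file names in the usage comment are hypothetical. The same number of records has to be dropped from both files, otherwise the remaining pairs fall out of sync.

```python
from itertools import islice


def skip_records(lines, n_records):
    """Return an iterator over FASTQ lines with the first n_records removed.

    Assumes each record occupies exactly four lines.
    """
    return islice(lines, 4 * n_records, None)


def drop_first_records(src_path, dst_path, n_records=2):
    """Copy src_path to dst_path without its first n_records records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(skip_records(src, n_records))


# Example usage (hypothetical file names), applied to both mates:
# drop_first_records("reads_1.fastq", "reads_1.trimmed.fastq")
# drop_first_records("reads_2.fastq", "reads_2.trimmed.fastq")
```
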

rohitjainnference commented 4 years ago

@weinmaier that is correct. I have a fix that worked for me; you can find it here: https://github.com/rohitjainnference/fastq-pair. However, it has not been validated by @linsalrob. Thanks!

weinmaier commented 4 years ago

Thanks @rohitjainnference! I'll take a look at your version.

linsalrob commented 4 years ago

@rohitjainnference can you open a PR with the code so I can review it and add it? What was the issue?

joreynajr commented 3 years ago

I'm about to use this software. Is this still a problem?

linsalrob commented 3 years ago

@rohitjainnference and @weinmaier can you please provide an example where you are having this error? I tried SRR8996821 and got these sequence identifiers, which should work with the default code:

@SRR8996821.1.1 1 length=150

and

@SRR8996821.1.2 1 length=150

linsalrob commented 3 years ago

I have resolved this by adding a new option, -s, that disables splitting the IDs on spaces. If you experience this issue, please consider adding the -s option to your fastq_pair command.
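
To make the effect of splitting IDs on spaces concrete, here is a minimal sketch of one way a truncated ID could collide with another read's ID. It is not fastq-pair's actual code: `pairing_key` is a hypothetical function, and the rule that a trailing `1` or `2` after `/`, `.` or `_` is dropped as a mate marker is an assumption made for illustration. Under that assumption, the headers reported in this issue collide for reads 1 and 2 only, while the `length=150` style headers pair correctly.

```python
def pairing_key(header, split_on_space=True):
    """Hypothetical key used to match mates; NOT fastq-pair's real implementation."""
    # Optionally keep only the text before the first space.
    key = header.split(" ", 1)[0] if split_on_space else header
    # Assumed mate-marker rule: drop a final 1 or 2 that follows '/', '.' or '_'.
    if len(key) >= 2 and key[-1] in "12" and key[-2] in "/._":
        key = key[:-1]
    return key


# Headers from this issue: after splitting, reads 1 and 2 share one key.
print(pairing_key("@SRR8996821.1 1/1"))  # @SRR8996821.
print(pairing_key("@SRR8996821.2 2/1"))  # @SRR8996821.
print(pairing_key("@SRR8996821.3 3/1"))  # @SRR8996821.3

# Without splitting, the keys stay distinct and the two mates still agree.
print(pairing_key("@SRR8996821.1 1/1", split_on_space=False))  # @SRR8996821.1 1/
print(pairing_key("@SRR8996821.1 1/2", split_on_space=False))  # @SRR8996821.1 1/

# The "length=150" style headers pair correctly with splitting enabled.
print(pairing_key("@SRR8996821.1.1 1 length=150"))  # @SRR8996821.1.
print(pairing_key("@SRR8996821.1.2 1 length=150"))  # @SRR8996821.1.
```

Under the same assumed rule, those `length=150` headers would not pair with splitting disabled, because the text after the space is kept and the mate digit is no longer at the end of the key. That suggests choosing the option based on the header style of your files.
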

@joreynajr please let me know if you have an issue and, if so, what the library ID is.

joreynajr commented 3 years ago

Hi, I was just concerned about using this software because I am working with SRA data, but for now I don't have an issue. Thank you for the quick reply.

Joaquin