AroneyS closed this 10 months ago.
Only if you are skipping QC with fastp AND running coassembly does the concatenation happen. I also can't find anything in the SPAdes docs saying this would cause issues. It should be fine as long as it can find each read's mate somewhere in the file, but I am likely mistaken.
Doesn't SPAdes just assume the reads are interleaved and go with it?
What about MEGAHIT?
If it is a concern, it would be better to just change it to produce interleaved reads. IMO, the safest option would be to not allow skipping QC at all.
Yep, definitely a problem. SPAdes just assumes the reads are interleaved (since that is what the argument says).
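If interleaving is the route taken, the change is essentially a record-by-record merge of the two files. A minimal sketch, assuming plain (uncompressed) 4-line FASTQ records and using hypothetical helper names `read_fastq_records` and `interleave_fastq` (not Aviary's actual code):

```python
from __future__ import annotations

import io
from itertools import islice, zip_longest
from typing import Iterator, TextIO


def read_fastq_records(handle: TextIO) -> Iterator[list[str]]:
    """Yield 4-line FASTQ records (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # Normalise line endings so a final record without a trailing
        # newline does not run into the next record in the output.
        yield [line.rstrip("\n") + "\n" for line in record]


def interleave_fastq(forward: TextIO, reverse: TextIO, out: TextIO) -> int:
    """Write R1, R2, R1, R2, ... to `out` and return the number of pairs."""
    pairs = 0
    for fwd, rev in zip_longest(read_fastq_records(forward), read_fastq_records(reverse)):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse files have different read counts")
        out.writelines(fwd)
        out.writelines(rev)
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Toy reads (hypothetical) to show the output order.
    r1 = io.StringIO("@r.1 1/1\nACGT\n+\nIIII\n@r.2 2/1\nGGCC\n+\nIIII\n")
    r2 = io.StringIO("@r.1 1/2\nTTTT\n+\nIIII\n@r.2 2/2\nAAAA\n+\nIIII\n")
    buf = io.StringIO()
    interleave_fastq(r1, r2, buf)
    print(buf.getvalue(), end="")
```

It fails loudly on mismatched read counts rather than silently writing a file whose pairs are misaligned.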
E.g. (the reverse reads are named `@SRR8943084.1 1/2`), SPAdes' split input has forward reads (names ending in `/1`) in both files, with reads 1, 3, 5 in the first file and reads 2, 4, 6 in the second:
```
# assemble/data/short_read_assembly/split_input/short_reads_1.fastq
@SRR8799000.1 1/1
TTCGCGAATATGTCTAAACGCATGGGAGAGATGGTTAGGGAAGAATTAGAATTACTGGGTCCTAAGCCATTGGCTGAAGTAGAGACAGCACAGAAAGAAATAGTTGATAGTCTTGTCAAACTGGAGGCTCAAGGAGAAACAATAAGGGGA
+
DDDDDIIIIIIIIIIHIIIIIIHGIHIIIIHIHIHHIIIIIIIHHHIHHIIIIIIIIIIIIIIIIIIIHIIIIIIHIIIIIIIIIIIIIHIIGHHHIIIIIHHHHHFIHIIIIIIIIIIIIIIGHIIIHIIIIHIIIIIIIIIIIIIHHH
@SRR8799000.3 3/1
CTAAACATGGGTGGTATAATGGAATCAAACACATTTACAAAGATATACCTCGCCATTTTTGGGCAATTTGATTGGCAGGGGATACCGGCACTACCAATAGAAGTAATTCTTCTGCCTAATTCGTTTTACTTTAACATTTATGAGTTTTCT
+
DDDDDIIIIHHHHIHHIIIHHHIIIIIIIIEHHHHHHIIIIGIIIHIIEHH=DHIIHHCH/CCHIIIHGIHIIG@EHHFDHDEHIHHDHHHHIHHHIIIIIGHHHHIHHEEHHIGGHHHGIH?HEHHHIIHIIHIHHHFHICHHFH@GHE
@SRR8799000.5 5/1
GTACTCCTGCAGCAGCTGCGCGAGGTGGGCCTGCCGCTCCTGGGAGGTCTTCCAGCGGCCCAGGGCGGGAGTGAGCTGGCGGATCTTCTCGATGTCCATGCGAGCCTGGCCGAGCTGGGCGGCGAAGCCCTTCACCGACAGCTGCGCCTC
# assemble/data/short_read_assembly/split_input/short_reads_2.fastq
@SRR8799000.2 2/1
TCCACTTGATTTTTTCCATCGCATTATCTAAAAATTTTTGTTCAATGTCACGGAGTACAACATTGTATCCTGCCATGGCAGAAACCTGAGCAATACCATGTCCCATAATACCAGAACCTAAAACTA
+
DDDDDHIHIIIIIGHIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIGHIIIIHGHHIIHIIEHHIHIIIIIIIIIIHHIIIIIIHIIGHGIHEHHHHHHIHIIIIIIGHIGHHIHIHH?FFCH
@SRR8799000.4 4/1
AAAGATACAGGTATCTCACTGAAAGTATTGAGAGTAGATGGAGGGGCAACTGCAAACAATTTCTTATGCCAATTCCAATCCGACATTCTTAACCTGGCTGTAAGCCGTCCAAAAANT
+
DDBDDIHIIIIEHHIIIIIFIFHIHGHHHHIHEHFHHHCHHIIIIHIGIGFHHGHHHGIIIIGHIIIIIIIIIIIIIIIFHIIHIIFEEHIIFHHHF@EHHGHHIIIIIIIIHGH#<
@SRR8799000.6 6/1
TATTGACAAACCAGAATTTTGTTTTGGTGCCCATTTGTTCAATCTGCAACTTCTCGTTTACGTTAAGGTTATATACTGGGTAAGATTGACACATTATGGACAAGCTTTCTCGATTCGGCTAATATTCATCTTATATATTACGATAAATGG
```
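The split in that output is what you would expect if the assembler pairs records purely by position (record 1 with 2, 3 with 4, and so on). A minimal sketch of the effect, assuming position-based pairing and names ending in `/1` and `/2` as in this dataset; it works on header lines only, and the read names and helpers (`split_alternating`, `mismatched_pairs`) are hypothetical:

```python
from __future__ import annotations


def split_alternating(headers: list[str]) -> tuple[list[str], list[str]]:
    """Treat headers as interleaved: even positions -> R1, odd positions -> R2."""
    return headers[0::2], headers[1::2]


def mismatched_pairs(headers: list[str]) -> list[int]:
    """Return indices of position-based pairs that are not ('.../1', '.../2').

    Assumes names end in /1 (forward) and /2 (reverse), as in the example.
    """
    r1, r2 = split_alternating(headers)
    return [
        i
        for i, (a, b) in enumerate(zip(r1, r2))
        if not (a.endswith("/1") and b.endswith("/2"))
    ]


# Toy headers (hypothetical), following the naming pattern in the example.
forward = ["@read.1 1/1", "@read.2 2/1", "@read.3 3/1", "@read.4 4/1"]
reverse = ["@read.1 1/2", "@read.2 2/2", "@read.3 3/2", "@read.4 4/2"]

concatenated = forward + reverse  # what the skip-QC path produces
interleaved = [h for pair in zip(forward, reverse) for h in pair]  # what --12 expects

if __name__ == "__main__":
    print("R1 from concatenated:", split_alternating(concatenated)[0])
    print("bad pairs (concatenated):", mismatched_pairs(concatenated))
    print("bad pairs (interleaved):", mismatched_pairs(interleaved))
```

With the concatenated order, the "R1" half gets reads 1 and 3 (then reverse reads), mirroring the 1/1, 3/1, 5/1 pattern above. An empty result from `mismatched_pairs` is a cheap sanity check to run on an interleaved file before it reaches the assembler.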
I need to do QC outside of Aviary so that I can also do unmapping. Why does skipping QC change the files at all? Can't you just pass them through?
> Only if you are skipping QC with fastp AND running coassembly does the concatenation happen.
Skipping QC alone is sufficient. If you skip QC, the reads are concatenated sequentially in lines 89/90; if you are coassembling, in lines 64/75. Neither step interleaves the reads.
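The coassembly path needs the same treatment. One way to keep pairs adjacent across samples is to interleave each sample first and only then concatenate: every interleaved block has an even number of records, so the R1/R2 alternation survives the sample boundaries. A minimal in-memory sketch under that assumption (hypothetical names `interleave_sample` and `coassembly_input`, not Aviary's code):

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, "+", quality)


def interleave_sample(forward: Sequence[Record], reverse: Sequence[Record]) -> List[Record]:
    """Return R1, R2, R1, R2, ... for one sample; refuse mismatched read counts."""
    if len(forward) != len(reverse):
        raise ValueError(f"{len(forward)} forward vs {len(reverse)} reverse reads")
    out: List[Record] = []
    for fwd, rev in zip(forward, reverse):
        out.extend((fwd, rev))
    return out


def coassembly_input(
    samples: Sequence[Tuple[Sequence[Record], Sequence[Record]]],
) -> List[Record]:
    """Interleave each sample first, then concatenate the interleaved blocks.

    Each block has an even number of records, so mates stay adjacent
    even where one sample ends and the next begins.
    """
    combined: List[Record] = []
    for forward, reverse in samples:
        combined.extend(interleave_sample(forward, reverse))
    return combined
```

Concatenating the raw forward files and raw reverse files across samples, by contrast, would reproduce the same misalignment as the single-sample case.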
Dang, I was hoping you were right. That's 6 weeks down the drain.
Bruh, I'm sorry. Hang on, I've got a fix that I'll push to another branch.
Short forward and reverse reads are concatenated sequentially instead of being interleaved (as expected by metaSPAdes and MEGAHIT): https://github.com/rhysnewell/aviary/blob/5915015b114ef7284a3e50d559e204b4b6c5428d/aviary/modules/quality_control/scripts/qc_short_reads.py#L89
Please tell me I'm wrong...