jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

sga-align: prepareReads: Cannot parse record #121

Open sjackman opened 8 years ago

sjackman commented 8 years ago
sga-align -t 64 --name pe400 hsapiens-contigs.fa pe400.fa.gz
…
Completed Task = 'indexContigs' 
Task enters queue = 'prepareReads' 
Cannot parse record >HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 at /gsc/btl/linuxbrew/bin/sga-deinterleave.pl line 63, <IN> line 2.

The file pe400.fa.gz is interleaved paired-end reads. The first 8 lines are:

>HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 ec:Z:0_0:1_0_1:0_0
TTATACAAAGAATTAAGAACAAAAGTGAAATTGAATATTTTTTAATTGCTCTAAAAGTTAATGGACTATTTAAAACAAAAATTATAAAAATATGTTTATACCATTAATAGAAGTAAAATATATAAAACCATGGAATAACACACAGACTAGGAGGACTTGGGAATATGCTGTTACATTGCATATTAAGTGGTATTATATTATTTGAAGTTAGATTTATTAACAATTACAGAGCTAATTTTTTTTTTAAAAA
>HISEQ1:93:H2YHMBCXX:1:1101:1165:2015 ec:Z:0_0:1_0_1:0_0
CTGACATCTTTCTGGCATCCTTAAAAGCCCTGGCTTTTAAGCATAACTTCTTGACCTACTTGTTCCCTTCCTGAGCATGAGAGCAGTGGTGACTCAGGAACAGGAAAGGCAGACCACAGTGGTGACAGTGTTTTCCTCAAAGAGGATTTATACCTGTTTTTTTAAAAAAAAAATTAGCTCTGTAATTGTTAATAAATCTAACTTCAAATAATATAATACCACTTAATATGCAATGTAACAGCATATTCCC
>HISEQ1:93:H2YHMBCXX:1:1101:1157:2041 ec:Z:0_0:3_0_3:0_0
GACCCGGTCCTGCGATTTGTCCCGTTGTAGACCTGGGAACAGGCAGGCGGGAACTGGGGGCTTTACTGGGGGATTTGAGGCTGGGGAGGGGGAGGGAGCAAATGTCATGGCTGGCTCGCTCAAGCATCCAGGGAACCGAAGCTAAGCGCATCCTGACGGGCTTTTAAAATGACATTGATTAGGACAAGCTGTTCCCAACCCCAGTAAGAGTTAATCTGCCTGTTAATCAAGGCACTAAGGGGCTCAATGC
>HISEQ1:93:H2YHMBCXX:1:1101:1157:2041 ec:Z:0_0:29_0_28:2_0
CCCCGGGCAGCGGTTTTCCCCGCTAGCCAGGTTTGGAAGTCACCCTCTGTGAGACTGGGTTAGGAAGTGACGAAAAGCGCCGAATTGTTTTCAAATTGAAAATACTTTTTTTTTTTTTTTTGGAGATAGCGCTGACAAATATATGGGATCCCGGCTTTTGATCCCTGGCTGCCGCCTCTGTTCTCCTGTCGCTAATAAAACTCGCATTGAGCCCCTTAGTGCCTTGATTAACAGGCAGATTAACTCTTAC
jts commented 8 years ago

What variant of FASTA is that? I don't recognise the SAM-like key/value pair in the header.

sjackman commented 8 years ago

It's produced by BFC (https://github.com/lh3/bfc).

jts commented 8 years ago

Is it safe to assume that the first record is always the first end of the pair? Alternatively, you could use the uncorrected reads for scaffolding (which I typically recommend anyway).

sjackman commented 8 years ago

Yes, the first record is always the first read of the pair or mate-pair. Orientation is FR for paired-end (PE) reads and RF for mate-pair (MP) reads. Good suggestion. If there's no easy workaround for using the corrected reads, I'll use the uncorrected reads.
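
For anyone who wants to keep the corrected reads, one possible workaround is to rename the records before running sga-align. The sketch below is untested against SGA and rests on an assumption this thread does not confirm: that sga-deinterleave.pl tells mates apart by a /1 or /2 suffix on the read name. It relies on the first record of each pair being the first read, as stated above, and it drops the BFC header comment (the `ec:Z:...` field). The function names `read_fasta`, `add_mate_suffixes` and `rename_file` are hypothetical helpers, not part of SGA or BFC.

```python
import gzip
import sys


def read_fasta(lines):
    """Yield (header, sequence) tuples from FASTA lines; header excludes '>'."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def add_mate_suffixes(lines):
    """Rename interleaved records to name/1 and name/2, dropping header comments.

    Assumes records alternate first mate, second mate, and that both mates
    share the same read name (true for the records shown in this issue).
    """
    records = read_fasta(lines)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError(f"unpaired record: {first[0]}")
        name1 = first[0].split()[0]  # keep the name, drop the comment
        name2 = second[0].split()[0]
        if name1 != name2:
            raise ValueError(f"mate names differ: {name1} vs {name2}")
        yield f">{name1}/1\n{first[1]}\n"
        yield f">{name2}/2\n{second[1]}\n"


def rename_file(path, out=sys.stdout):
    """Read a gzipped interleaved FASTA file and write the renamed records."""
    with gzip.open(path, "rt") as handle:
        out.writelines(add_mate_suffixes(handle))
```

Calling `rename_file("pe400.fa.gz")` writes the renamed records to standard output, which can then be redirected to a new file and passed to sga-align in place of the original.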

sjackman commented 8 years ago

I instead aligned the reads using bwa mem:

bwa mem -t32 -p contigs.fa reads.fa.gz | samtools view -F2304 -b -o reads.bam -
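
For readers unfamiliar with the options in that command: `-p` tells bwa mem that the single input file holds interleaved paired-end reads, and `-F2304` tells samtools view to drop any alignment that has one of the listed FLAG bits set. The short sketch below only spells out the arithmetic behind 2304 using the bit values from the SAM specification. The helper name `kept_by_F2304` is made up for illustration.

```python
# SAM FLAG bits, as defined in the SAM specification.
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment

# -F takes a bit mask; a record is excluded if any masked bit is set.
EXCLUDE = SECONDARY | SUPPLEMENTARY  # 256 + 2048 = 2304


def kept_by_F2304(flag):
    """Return True if a record with this FLAG survives `samtools view -F2304`."""
    return (flag & EXCLUDE) == 0
```

So the resulting reads.bam keeps only primary alignment records (plus any unmapped reads), at most one record per read.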