Closed kdeangelis closed 11 years ago
I appear to have duplicated this error, also with MiSeq data. I got the following output:
INFO VER pandaseq 2.0 andre@masella.name
ERR BADID @MISEQ:9:000000000-A1BL8:1:1101:17377:1320 1:N:0:
STAT TIME Tue Sep 18 11:30:14 2012
STAT ELAPSED 0
STAT READS 0
STAT NOALGN 0
STAT LOWQ 0
STAT OK 0
INFO API 1
Program works fine with the sample data. It seems there's something about MiSeq fastq files it somehow doesn't like?
I solved the problem by using flash instead: http://bioinformatics.oxfordjournals.org/content/early/2011/09/07/bioinformatics.btr507.full http://genomics.jhu.edu/software/FLASH/index.shtml
I know this does not help with pandaseq, but it might help you with your assembly.
The sequence IDs shown do not contain Illumina index reads. Have they been through the barcoding process?
I've updated PANDAseq to include a -B switch that ignores the barcode. Please try it out and let me know if it works.
@kdeangelis thanks for the suggestion!
@apmasell I'm not able to test it out right now, but hopefully I can let you know in a few days whether it works.
@apmasell I have also encountered the same issue with MiSeq data and the -B flag did not fix the issue (again there was a separate index read that is not incorporated into forward and reverse reads I am trying to align with PANDAseq):
0x11e3850 ERR BADID HWI-M00181:9:000000000-A1H1G:1:1101:13186:1697: @HWI-M00181:9:000000000-A1H1G:1:1101:13186:1697 1:N:0:
@HWI-M00181:9:000000000-A1H1G:1:1101:13186:1697 1:N:0: TTCGGACTACCAGGGTATCTAATCCACAGCGTATCTCGTATGCCGTCTTCTGCTTGCACGTCAGAACTCCAGTCAATAATCCACAGATCTCGTTTTTCGTCTTCTTATTTCTACTTTTTTTTTCTTTTTTTTTTTTTTTTTTCTCTTCTTCTTTACTCTCTCTTCTCATTGTTCTATATGTATTCTCTTTCATGTGTGTCGTCGATGTACAGAGTTATTGCTAATGTATTAGGTCTCTTCATGATGTTTT + 5==9,++5-5--@@@@ECEE>.AEAB8-6++@CEEFGDEEEDGFDEDEEDFFFFFF;F################################################################################################################################################################################################
I've made another change that should make that work properly. Also, in the AXIOME repository, there is a program, aq-marry-illumina-index, to combine the 3-part forward, index, and reverse reads into forward and reverse reads with barcodes in the sequence names. Why some sequencing centres fail to do this is beyond my comprehension.
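To illustrate the idea of carrying the barcode in the sequence name, here is a minimal Python sketch. It is not the AXIOME program and makes no claim about how aq-marry-illumina-index works internally; the function name `marry_index` and the index sequence `TAGACA` are hypothetical, and it assumes CASAVA 1.8-style headers whose last field (the tag) is empty.

```python
def marry_index(read_header: str, index_header: str, index_seq: str) -> str:
    """Return read_header with the index sequence placed in its tag field."""
    read_name, _, read_comment = read_header.partition(" ")
    index_name = index_header.partition(" ")[0]
    # The read and its index read must describe the same cluster.
    if read_name != index_name:
        raise ValueError(f"read {read_name} and index {index_name} are out of sync")
    fields = read_comment.split(":")
    # Assumption: the comment is direction:filtered:flags:tag (four fields).
    if len(fields) != 4:
        raise ValueError(f"unexpected header comment: {read_comment!r}")
    fields[3] = index_seq
    return f"{read_name} {':'.join(fields)}"


forward = "@HWI-M00181:9:000000000-A1H1G:1:1101:13186:1697 1:N:0:"
index = "@HWI-M00181:9:000000000-A1H1G:1:1101:13186:1697 2:N:0:"
print(marry_index(forward, index, "TAGACA"))
```

The same call would be made for the reverse read, so that both mates end up with the barcode in their names.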
Thanks, now works with the -B flag.
Greetings,
I'm having the same issue with BADID, even with running -B. Output below. I need the index reads separate from the forward and reverse reads so I can run my data through QIIME. Any suggestions on merging PANDAseq and QIIME? I found the following QIIME forum discussing how to do this, but again, the index reads are an issue.
https://groups.google.com/forum/#!msg/qiime-forum/CO9EmR4FH58/vNuaaOyAv-cJ
pandaseq -f cat_R1.fastq -r cat_R2.fastq -B -F -o 75 > pandassembled.fastq
INFO VER pandaseq 2.4 andre@masella.name
INFO ARG[0] pandaseq
INFO ARG[1] -f
INFO ARG[2] cat_R1.fastq
INFO ARG[3] -r
INFO ARG[4] cat_R2.fastq
INFO ARG[5] -B
INFO ARG[6] -F
INFO ARG[7] -o
INFO ARG[8] 75
0x694890 ERR BADID HWI-ST1360:0::38:0:0:0: @HWI-ST1360:38:C17HNACXX:5:1101:5055:3999#0/1
STAT ELAPSED 0
STAT READS 0
STAT NOALGN 0
STAT LOWQ 0
STAT BADR 0
STAT OK 0
I have never seen a sequence identifier like this. Do you know what version of the Illumina CASAVA pipeline produced it?
As for formatting data, we have AXIOME that runs PANDAseq, imports other FASTA files and runs various QIIME analyses on them, automatically.
It's 1.8.2, but my sequencing center modified the run so it matches the requirements for QIIME's split_libraries_fastq.py (link below).
http://qiime.org/tutorials/processing_illumina_data.html
I want to use PANDAseq because I sequenced the V4 region of 16S. With our HiSeq 2500, the forward and reverse reads overlap completely, so I want to error-correct the paired-end reads.
So I can use AXIOME to add the index reads to the headers to run PANDAseq, then reformat the output to remove the headers and use with QIIME? Is it actually the missing index read that is causing the error? I used -B.
Can you get the unaltered sequence? The missing index is not causing the error; it is the unusual ordering of the data. Normally, headers for CASAVA 1.8 look like HWI-ST822:85:C05C3ACXX:1:1101:1171:2104 3:N:0:TAGACA. Note the lack of # and /.
AXIOME will modify the sequence headers for QIIME compatibility.
I used sed to fix the headers and it appears to be working. Thank you!
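The exact sed command was not posted. As a rough equivalent, the following Python sketch rewrites old-style `name#tag/direction` headers into the CASAVA 1.8 layout described above. The function names are hypothetical, and filling the filtered and flags fields with `N` and `0` is an assumption, since the old-style header does not carry them.

```python
import re

# Old-style header: @<name>#<tag>/<direction>
OLD_STYLE = re.compile(r"^@(?P<name>[^#\s]+)#(?P<tag>[^/\s]*)/(?P<direction>[12])$")


def to_casava18(header: str) -> str:
    """Rewrite an old-style header; return anything else unchanged."""
    m = OLD_STYLE.match(header)
    if m is None:
        return header
    # Assumption: filtered = N and flags = 0, which the old format cannot tell us.
    return f"@{m['name']} {m['direction']}:N:0:{m['tag']}"


def fix_fastq_headers(lines):
    """Rewrite only header lines (every 4th line); quality lines may start with @."""
    for number, line in enumerate(lines):
        if number % 4 == 0:
            yield to_casava18(line.rstrip("\n")) + "\n"
        else:
            yield line


print(to_casava18("@HWI-ST1360:38:C17HNACXX:5:1101:5055:3999#0/1"))
```

Touching only every fourth line matters because a FASTQ quality string can itself begin with `@`.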
With MiSeq data, I get the following error. Primers and barcodes have been previously trimmed. -B does not help. Kindly help. Thanks
P.S. In header of sequence, I inserted spaces before and after the 100 (run number) to avoid auto-formatting on github.
$ pandaseq −f fwd.fastq −r rev.fastq -B
Ignoring extra arguments passed.
You must supply both forward and reverse reads.
Too confused to continue.
Try -h for help.
$ pandaseq −f fwd.fastq −r rev.fastq -B -w out.fasta
Ignoring extra arguments passed.
You must supply both forward and reverse reads.
Too confused to continue.
Try -h for help.
$ head -10 fwd.fastq rev.fastq
==> fwd.fastq <==
@M00720: 100 :000000000-A7YE1:1:1101:14230:2979 1:N:0:49
GGCACAAACGAGAGCTCGATGGCACTCTTCAAAAATCCATATCCACCTTGTGTGCAATGTTTGTTGGGAAAGTCTTTTCTTTCCCTTCATAAATATCAACCTATATCTTTAACAACATTCGTCTGATAACATATTATGAATATACTTAATTCAAAATATAACTTTCAACAACGGATCTCTTGGCTCTC
+
1>>1B111>111A1AF0E0B0AA1A0BD1D1BFG01BBEGFDDFGGFGH1FBFHFHFGG2GFBFFECGFAGDFGGFHHHHHHHHHHHFBHHHGHHHHFHHHHHHHEHHHHFHHGHEBFHHGGHEHGHGGHHHFGHHHHHHHHHHHHHHHHHFGHHEHHHHHHHHGGFHEGGGGGGGHHHHGGHHHEH
@M00720: 100 :000000000-A7YE1:1:1101:14946:3013 1:N:0:49
GGCAAAATAAGAGTCTCATGGCACGTCTTAAACCCATATCCACCTTGTGTGCAATGTCAGTCGATCTTCTTCATGGAGATCGACCAAACATCAACCTTTATTTTTTAACTCTTTGTCTGAAAAATATTATGAATAAACAATTCAAAATACGACTTTCAACAACGGATCTCTTGGCTCTC
+
11>>1B1BB1B11BB1A3311BBBABBBF3A11BEABBGHH2BBBBEFGFFFHHHFGHFFFGEFFHGHHHHHGHGFHHHGHHFGGGHGHFHHFHHHHHHHHHGHHGHHHHHHHHHHHHGGFHFGHFGFEFHHGFGGGHHEGGHFFHFHFHHGGGFGHGFHEHHGGGC@FDFGGFFGGCG
@M00720: 100 :000000000-A7YE1:1:1101:19136:3176 1:N:0:49
TATTTAACTGGCGGCGATTGCGTACCCGTCGACCAAAATTAGGGTCAACGCTACCTGTAGGAAGTGTCCGCATAAAGTGCACCGCATGGAAATGAAGACGGCCATTAGCTGTACCATACTCAGGCACACAAAAATACTGATAGCAGTCGGCGTGTGAATCATTAGCCTTGCGACCCTCGGCAGCAAGAACCATACGACCAATATCACGAAAATAGTCACGCAAAGCATTGGGATTATCATAAAACGCCTC

==> rev.fastq <==
@M00720: 100 :000000000-A7YE1:1:1101:14230:2979 2:N:0:49
GAGAGCCAAGAGATCCGTTGTTGAAAGTTATATTTTGAATTAAGTATATTCATAATATGTTATCAGACGAATGTTGTTAAAGATATAGGTTGATATTTATGAAGGGAAAGAAAAGACTTTCCCAACAAACATTGCACACAAGGTGGATATGGATTTTTAAAGAGTGCCATAAAACACTCGTTTGTGAATGATCCTTCCGCTGGTTCACAATTACCATAGTGTAGATCTCGGTGGTCGCCGTATCATTAAA
+
A@@AAAF44ACFB5AFAAB2AAADDBD555ABDFHHFAGCDGHGDFHHHHHHGGHHHHHHGHHHHHGHGGGHGAFHHHHHHHHGFGHHGFGGHHHHHGGHHHHHHGHGHHHFFHHGFHHGHHHFHGGHFHFGGGFDGHHGGHHDCFHGHHFFHDFBFHCGFHGHFGGHFFHHGHHHHHGF?FF<?FGGHHHFGGBDHBCEGCDCDG0DDGFGHGGDDGH=FDCCFC00CGA:A?;;CDD@:D.:B;0;BF
@M00720: 100 :000000000-A7YE1:1:1101:14946:3013 2:N:0:49
GAGAGCCAAGAGATCCGTTGTTGAAAGTCGTATTTTGAATTGTTTATTCATAATATTTTTCAGACAAAGAGTTAAAAAATAAAGGTTGATGTTTGGTCGATCTCCATGAAGAAGATCGACTGACATTGCACACAAGGTGGATATGGATTTAAAAAGTGCCATAAAACACTTATTATGAATGATCCCTCCGCTGGTTCACAATTACCATAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAA
+
A???AAC44A?CB5BBB2B2AAABBDD5DAABBFGHFAHHFGDFGEDGHHHHGHGHGHHHHGHEFGFHHHFFFFFGHHGHHHHGHGGGHGFDEGEE?F>@FFGHHDFGFHHGHHG3EGEF?EFFDFBBB?FGGFGHG?FFEGDFDFHHHH3DBGFG@@@FFGCGHEGE0GGDF@GFDFF?FH1FFF1GGCDDCFCD0<DFB0D0=D0DGG=<;;GH0;CCCE.:A;0:9A9;A.CBC0;;BBFADCAAD
@M00720: 100 :000000000-A7YE1:1:1101:19136:3176 2:N:0:49
CACGCGCACACGCTCCGCTATTCAGCGTTTGATGATTGCAATGCGACAGGCTCATGCTGATGGATGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGCTTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTA
Whatever program used to trim the barcodes and primers has modified the headers. The flow cell identifier has been removed.
I do see the flowcell 000000000-A7YE1. Anyways, shouldn't the -B ignore this?
Given e.g.: HWI-ST822 :85: C05C3ACXX: 1: 1101: 1171: 2104 3: N: 0: TAGACA
Explanation: instrument :run: flowcell: lane: tile: x: y direction: filtered: flags: tag
My data: @M00720 : 100 : 000000000-A7YE1: 1: 1101: 14230: 2979 1: N: 0: 49
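The field breakdown above can be checked mechanically. This is a minimal Python sketch that splits a CASAVA 1.8-style header into the eleven named fields; it is not PANDAseq's own parser, the names `IlluminaId` and `parse_header` are hypothetical, and the example header has the spaces that were inserted around the run number removed.

```python
from typing import NamedTuple


class IlluminaId(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: str
    tile: str
    x: str
    y: str
    direction: str
    filtered: str
    flags: str
    tag: str


def parse_header(header: str) -> IlluminaId:
    """Split '@a:b:c:d:e:f:g h:i:j:k' into named fields, or raise ValueError."""
    name, _, comment = header.lstrip("@").partition(" ")
    left = name.split(":")
    right = comment.split(":")
    # Seven colon-separated fields before the space, four after it.
    if len(left) != 7 or len(right) != 4:
        raise ValueError(f"not a CASAVA 1.8-style header: {header!r}")
    return IlluminaId(*left, *right)


parsed = parse_header("@M00720:100:000000000-A7YE1:1:1101:14230:2979 1:N:0:49")
print(parsed.flowcell, parsed.tag)
```

Under this layout, the header does carry a flowcell field and a numeric tag, while shortened IDs such as `@VA_1101_19100_2205` fail the field count.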
No, -B only ignores missing barcodes. These have barcodes, though they are numeric. Also, the error is totally unrelated to barcodes. The error is probably due to pasting an em-dash (–) from the man page instead of typing a hyphen (-).
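A quick way to catch this kind of paste error is to scan the arguments for options that start with a dash look-alike rather than the ASCII hyphen. The sketch below is a hypothetical helper, not part of PANDAseq, and the set of look-alike characters is an assumption that may not be exhaustive.

```python
import unicodedata

# Characters that render like "-" but are not the ASCII hyphen-minus.
LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def suspicious_options(args):
    """Return (argument, Unicode name) for each argument starting with a look-alike."""
    return [
        (arg, unicodedata.name(arg[0]))
        for arg in args
        if arg and arg[0] in LOOKALIKES
    ]


# Example: options pasted with U+2212 (minus sign) instead of "-".
pasted = ["pandaseq", "\u2212f", "fwd.fastq", "\u2212r", "rev.fastq", "-B"]
for arg, name in suspicious_options(pasted):
    print(f"{arg!r} starts with {name}, not a hyphen")
```

Only the first character of each argument is checked, so file names that merely contain a dash elsewhere are not flagged.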
Ah! You, Sir, are absolutely right! I had copied the options, and an em-dash got copied instead of a hyphen. Retyping them explicitly fixed it. Thanks for the very quick response. Saved me a lot of anxiety and time. A quick question: what are the defaults for the algorithm and threshold?
Thanks.
The default algorithm is the one in the paper (simple Bayesian) and the threshold is 0.6. This is in the manual page.
Pandaseq seems to not like the IDs in my fastq files. It will not assemble my PEs, and my run output looks like this:
ubuntu@ip-10-29-191-5:~$ pandaseq -f R1-20.pandID.fastq -r R3-20.pandID.fastq -F > panda_test.fastq
INFO VER pandaseq 2.0 andre@masella.name
ERR BADID @VA_1101_19100_2205
STAT TIME Wed Sep 5 15:47:30 2012
STAT ELAPSED 0
STAT READS 0
STAT NOALGN 0
STAT LOWQ 0
STAT OK 0
INFO API 1
These are MiSeq runs, but I can't figure out what it doesn't like. The originals looked like this:
@VARITEK:9:000000000-A1VLD:1:1101:19100:2205 1:N:0:
TNCGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGCGCACGTAGGCGGCTTTTTAAGTCAGAGGTGAAATCCTGGAGCTCAACTCCAGAACTGCCTTTGATACTGAGAAGCTCGAGTACGGGAGAGGTGAGTGGAACTGCGAG
+
?#55<??@DDDDDBDDFEEDEFHHHEFFHHHHHHHHHHHHHHHHHHHHHHFEHHHHHEFHHFFFFFFHDFFFFFFBFFDBDDEEFFFEEFAABCB4=CEE=A,=BEEFEEE=,,5:AE*_88??1).4A?)08::A:_?#########
I also tried these IDs: @VARITEK_1101_19100_2205 and @VARITEK-1101-19100-2205
Am I interpreting this error correctly?