neufeld / pandaseq

PAired-eND Assembler for DNA sequences
GNU General Public License v3.0

BADID error for old sequence headers #57

Closed. RvV1979 closed this issue 7 years ago

RvV1979 commented 7 years ago

Dear all,

A colleague of mine downloaded old MiSeq data sets from a published paper and asked me to analyze them. However, pandaseq throws a BADID error, which suggests a problem with the FASTQ sequence headers.

Looking at the headers, I found that they are probably in the old (pre-CASAVA 1.8) format. Here is an example:

@MISEQ03:18:000000000-A1REG:1:1101:14774:1712#GATAGTGCCAC/1

I think the formatting is correct but VERY old.

As a workaround, I can convert the headers to the current format, but I figured you may want to support the old sequence header format in pandaseq.
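For anyone who needs the same workaround, here is a minimal sketch of the header conversion in Python. It is not part of pandaseq. The names `convert_header` and `convert_fastq` are hypothetical, and it assumes that every header looks like the example above (seven colon-separated fields, then `#index/read`) and that the FASTQ file uses strict four-line records. The old header does not carry the filter flag or control number, so the sketch fills in `N` and `0` as placeholders; that is an assumption, not information recovered from the data.

```python
import re

# Old-style header as in the example above: seven colon-separated fields,
# followed by '#<index>' and '/<read number>'.
OLD_HEADER = re.compile(
    r"^@(?P<coords>[^:\s#]+(?::[^:\s#]+){6})#(?P<index>[^/\s]+)/(?P<read>[12])$"
)


def convert_header(line, filter_flag="N", control="0"):
    """Rewrite one old-style header into the CASAVA 1.8+ layout.

    filter_flag and control are placeholders (assumed values), because
    the old header does not contain them. Lines that do not match the
    old pattern are returned unchanged.
    """
    line = line.rstrip("\n")
    match = OLD_HEADER.match(line)
    if match is None:
        return line
    return "@{} {}:{}:{}:{}".format(
        match.group("coords"),
        match.group("read"),
        filter_flag,
        control,
        match.group("index"),
    )


def convert_fastq(lines):
    """Convert only the header line (every fourth line) of each record.

    Assumes strict four-line FASTQ records, so quality lines that happen
    to start with '@' are never touched.
    """
    converted = []
    for number, line in enumerate(lines):
        line = line.rstrip("\n")
        converted.append(convert_header(line) if number % 4 == 0 else line)
    return converted
```

Applied to the example header, `convert_header` moves the read number and index into a space-separated second block and leaves the coordinate fields as they are.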

Thanks

apmasell commented 7 years ago

This is different from the other CASAVA 1.4-1.6 headers that I have seen (which are supported). There's an extra block with the flow cell ID, which is unusual. I will modify the parser.

apmasell commented 7 years ago

I've modified the parser. Let me know if it works.

mdehollander commented 7 years ago

I can confirm for our data that this works. Can you make a release so the bioconda package can be updated?

apmasell commented 7 years ago

Great. I'm releasing packages now.

mdehollander commented 7 years ago

I have created a conda package for pandaseq that works on OS X and Linux: https://github.com/bioconda/bioconda-recipes/tree/master/recipes/pandaseq and https://anaconda.org/bioconda/pandaseq