gt1 / biobambam2

Tools for early stage alignment file processing

disablevalidation option for bammerge possible? #51

Closed: af8 closed this issue 6 years ago

af8 commented 6 years ago

Hi,

I'm using biobambam2, and some of my FASTQ files contain '@' characters in the read names.

I have run bamsort, etc., with disablevalidation=1 so that I would not have to edit the raw FASTQ files by hand before launching the pipeline.

But when I then need to merge the BAMs into a single file, bammerge refuses to proceed because of these invalid read names. Would it be possible to add a disablevalidation option to this tool?

Best wishes, Anthony

af8 commented 6 years ago

As a temporary workaround, I modified the ASCII table in libmaus2/bambam/BamAlignmentDecoderBase.cpp and recompiled libmaus2.

This works, but please let me know whether adding this option is possible.

Thank you.
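For context on what "the ASCII table" workaround amounts to: the SAMv1 specification defines QNAME by the regular expression `[!-?A-~]{1,254}`, i.e. printable ASCII from `!` to `~` with `@` (0x40) left out. Below is a minimal Python sketch of a byte-lookup validator of that kind, plus a relaxed variant that also accepts `@` in the spirit of the patch described above. It is an illustration only, not libmaus2's actual table or code; `build_qname_table` and `qname_is_valid` are hypothetical names.

```python
# Illustrative sketch only (NOT libmaus2's code): a 256-entry lookup table
# marking which byte values SAMv1 allows in a read name (QNAME).
# SAMv1 defines QNAME as [!-?A-~]{1,254}: printable ASCII except '@'.

def build_qname_table(allow_at: bool = False) -> list[bool]:
    """Return a table where table[b] is True if byte b may appear in a QNAME."""
    table = [False] * 256
    for code in range(ord("!"), ord("~") + 1):  # 0x21..0x7E inclusive
        table[code] = True
    # '@' (0x40) lies between '?' and 'A' and is excluded by the spec;
    # the hypothetical relaxed table re-enables it, like the patch above.
    table[ord("@")] = allow_at
    return table


STRICT = build_qname_table()
RELAXED = build_qname_table(allow_at=True)


def qname_is_valid(name: bytes, table: list[bool] = STRICT) -> bool:
    """Check length (1..254) and that every byte is allowed by `table`."""
    return 1 <= len(name) <= 254 and all(table[b] for b in name)


print(qname_is_valid(b"read1/1"))          # True
print(qname_is_valid(b"read@1"))           # False
print(qname_is_valid(b"read@1", RELAXED))  # True
```

A lookup table makes the per-byte check a single index operation, which is why this style of validation is cheap; the catch, as the thread goes on to discuss, is that relaxing it locally does not make the files acceptable to other spec-compliant tools.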

gt1 commented 6 years ago

Hi,

it would be possible to disable validation in bammerge, but I think this is really the wrong way to handle the problem. Read/query names containing @ symbols are not valid (see the specification at https://samtools.github.io/hts-specs/SAMv1.pdf), so the fault really lies with whatever program creates such SAM/BAM/CRAM files in the first place. Even if biobambam2 let the file pass, any other (spec-compliant) downstream tool should reject it as well, so there would be little to gain.

Best, German
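Following the point that the input should be fixed at its source, one option is a small preprocessing step that rewrites the offending read names before alignment instead of editing FASTQs by hand. The following is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines) and an arbitrarily chosen replacement character `_`; `sanitize_fastq` is a hypothetical helper, not part of biobambam2 or libmaus2.

```python
import re
from typing import Iterable, Iterator

# Leading run of non-whitespace characters: the read name proper.
_NAME = re.compile(r"\S*")


def sanitize_fastq(lines: Iterable[str], replacement: str = "_") -> Iterator[str]:
    """Yield FASTQ lines with '@' replaced inside read names.

    Assumes 4-line records. Only the name (text after the leading '@',
    up to the first whitespace) is changed; any comment is kept. The
    optional name repeat on the '+' line is dropped, since FASTQ allows
    that line to be a bare '+'.
    """
    for i, line in enumerate(lines):
        kind = i % 4
        if kind == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected header starting with '@'")
            body = line[1:]
            m = _NAME.match(body)  # always matches, possibly the empty string
            yield "@" + m.group(0).replace("@", replacement) + body[m.end():]
        elif kind == 2:
            if not line.startswith("+"):
                raise ValueError(f"line {i + 1}: expected separator starting with '+'")
            yield "+\n" if line.endswith("\n") else "+"
        else:
            # Sequence and quality lines pass through untouched.
            yield line


records = ["@read@1 extra\n", "ACGT\n", "+read@1\n", "IIII\n"]
print("".join(sanitize_fastq(records)), end="")
```

Only header lines are rewritten because `@` is a legitimate quality character (Q31 under Phred+33 encoding), so a blanket search-and-replace would corrupt quality strings. From a script, it could be driven with `sys.stdout.writelines(sanitize_fastq(sys.stdin))`. For paired-end data, the same replacement should be applied to both mate files so the names still match, and it is worth checking that the replacement cannot create duplicate names (e.g. `read@1` and `read_1` in the same file).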

af8 commented 6 years ago

Hi German,

Yes, fair enough.

But then, following this line of reasoning, why is this option available in other tools such as bamsort or bammarkduplicates2, rather than requiring valid read names from the start?

Best, Anthony

gt1 commented 6 years ago

Hi Anthony,

the setting was introduced for benchmarking how much additional time the validation requires in these tools. Processing invalid files was never the objective.

Best, German

af8 commented 6 years ago

OK, understood. Thank you, German.