gt1 / biobambam2

Tools for early stage alignment file processing

bam2fastq failed #56

Open lmanchon opened 6 years ago

lmanchon commented 6 years ago

Hi,

strange error using bam2fastq:

BIOBAMBAM/bin/bamtofastq filename=D1_sorted.bam inputformat=bam gz=1 F=D1_R1.fastq.gz F2=D1_R2.fastq.gz O=orphan_D1_1.fastq.gz O2=orphan_D1_2.fastq.gz

BAM header is not consistent (binary and text do not match) for @SQ SN:8 LN:11660

LIBMAUS2/lib/libmaus2.so.2(libmaus2::util::StackTrace::StackTrace()+0x54)[0x7f71ec496414]
BIOBAMBAM/bin/bamtofastq(libmaus2::exception::LibMausException::LibMausException()+0x20)[0x437f00]
/BIOBAMBAM/bin/bamtofastq(libmaus2::bambam::BamHeader::initSetup()+0xbbb)[0x49a55b]
BIOBAMBAM/bin/bamtofastq(void libmaus2::bambam::BamHeader::init(libmaus2::lz::BgzfInflateStream&)+0x299)[0x49ec19]
BIOBAMBAM/bin/bamtofastq(libmaus2::bambam::BamDecoderWrapper::BamDecoderWrapper(std::unique_ptr<libmaus2::aio::InputStream, std::default_delete >&, bool)+0x341)[0x49f901]
BIOBAMBAM/bin/bamtofastq(libmaus2::bambam::BamAlignmentDecoderFactory::construct(std::istream&, std::string const&, std::string const&, unsigned long, std::string const&, bool, std::ostream, std::string const&)+0xd2d)[0x4a07bd]
BIOBAMBAM/bin/bamtofastq(libmaus2::bambam::BamMultiAlignmentDecoderFactory::construct(libmaus2::util::ArgInfo const&, bool, std::ostream, std::istream&, bool, bool)+0x381)[0x4a1fa1]
BIOBAMBAM/bin/bamtofastq(bamtofastqCollating(libmaus2::util::ArgInfo const&)+0x5de)[0x43344e]
BIOBAMBAM/bin/bamtofastq(bamtofastq(libmaus2::util::ArgInfo const&)+0x3c1)[0x4344c1]
BIOBAMBAM/bin/bamtofastq(main+0x1a02)[0x42cbc2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f71ea906b45]
BIOBAMBAM/bin/bamtofastq()[0x42e79f]

Do I need to reformat my BAM input file, and if so, how?

Thank you

keiranmraine commented 6 years ago

Hi @lmanchon. Do you really have lowercase @sq? They should be @SQ.

gt1 commented 6 years ago

Hi,

the lower case sq is most likely an artefact introduced by github. The error line reported here is only triggered if the BAM header has different length values in the text header (as seen in SAM) and the (redundant) binary version stored in the BAM file. How was the file created? It's broken on a very basic level. You may be able to convert it to SAM using some program which does not check for such anomalies. I'd try samtools view.

Best, German
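For readers who want to see what "binary and text do not match" refers to: a BAM file stores the reference sequences twice, once as @SQ lines in the text header and once as a binary list of names and lengths. The following is a minimal Python sketch, assuming the header layout given in the SAM/BAM specification, that reads both copies and reports positions where they differ. It is not the libmaus2 check itself; the function names are made up for illustration, and the positional comparison is an assumption about what "consistent" means (a text header with no @SQ lines would be reported as a mismatch here).

```python
import gzip
import io
import struct
from itertools import zip_longest


def read_bam_header(stream: io.BufferedIOBase):
    """Return (text_header, [(name, length), ...]) from a binary BAM stream."""
    # BGZF blocks are valid gzip members, so the gzip module can decompress them.
    with gzip.GzipFile(fileobj=stream) as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("missing BAM magic number")
        (l_text,) = struct.unpack("<I", bam.read(4))
        text = bam.read(l_text).rstrip(b"\x00").decode("utf-8")
        (n_ref,) = struct.unpack("<I", bam.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", bam.read(4))
            # The stored name is NUL-terminated; strip the terminator.
            name = bam.read(l_name).rstrip(b"\x00").decode("utf-8")
            (l_ref,) = struct.unpack("<I", bam.read(4))
            refs.append((name, l_ref))
    return text, refs


def text_sq_entries(text):
    """Extract (SN, LN) pairs from the @SQ lines of a SAM text header."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("@SQ\t"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            tags[tag] = value
        entries.append((tags["SN"], int(tags["LN"])))
    return entries


def header_mismatches(stream: io.BufferedIOBase):
    """List (text_entry, binary_entry) pairs that differ, compared by position."""
    text, binary_refs = read_bam_header(stream)
    pairs = zip_longest(text_sq_entries(text), binary_refs)
    return [(t, b) for t, b in pairs if t != b]
```

Calling `header_mismatches(open("D1_sorted.bam", "rb"))` on a file should return an empty list when the two copies agree. An entry of `None` on either side means one copy lists more sequences than the other.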

lmanchon commented 6 years ago

Hi,

Same error on the SAM file:

Program: samtools (Tools for alignments in the SAM format)
Version: 1.3.1 (using htslib 1.3.1)
samtools view -h -o D6_DMSO_sorted.sam D6_DMSO_sorted.bam

BIOBAMBAM/bin/bamtofastq filename=D6_DMSO_sorted.sam inputformat=sam gz=1 F=D6_DMSO_R1.fastq.gz F2=D6_DMSO_R2.fastq.gz O=orphan_D6_DMSO_1.fastq.gz O2=orphan_D6_DMSO_2.fastq.gz

BAM header is not consistent (binary and text do not match) for @SQ SN:8 LN:11660

LIBMAUS2/lib/libmaus2.so.2(libmaus2::util::StackTrace::StackTrace()+0x54)[0x7f3fa478a414]
BIOBAMBAM/bin/bamtofastq(libmaus2::exception::LibMausException::LibMausException[0x437f00]

gt1 commented 6 years ago

Hi,

could you post the complete SAM header?

Thanks

lmanchon commented 6 years ago

sam_header.zip

gt1 commented 6 years ago

Hi,

you have sequence 8 appearing twice in your SAM header. biobambam2 should provide a more sensible error message for such cases, but it's definitely a broken input file.

> diff -c <(awk < sam_header.txt '/^@SQ/{print $2}' | sort) <(awk < sam_header.txt '/^@SQ/{print $2}' | sort -u)
*** /dev/fd/63  2017-12-18 11:37:07.377791198 +0100
--- /dev/fd/62  2017-12-18 11:37:07.377791198 +0100
***************
*** 3,9 ****
  SN:15
  SN:20
  SN:8
- SN:8
  SN:A01Shi21
  SN:A09Calca
  SN:A0DBoliv
--- 3,8 ----
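
The same duplicate check can be done without awk and diff. Below is a minimal Python sketch, assuming the header has been saved as plain text (for example with samtools view -H). The function name `duplicated_sequence_names` is hypothetical, and unlike the awk command above it looks up the SN tag by name rather than assuming it is the second field.

```python
from collections import Counter


def duplicated_sequence_names(header_text):
    """Return the SN values that occur on more than one @SQ line, sorted."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ\t"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Illustrative header only: the lengths other than SN:8 LN:11660 are made up.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:15\tLN:100\n"
    "@SQ\tSN:8\tLN:11660\n"
    "@SQ\tSN:8\tLN:200\n"
    "@SQ\tSN:20\tLN:300\n"
)
print(duplicated_sequence_names(example_header))
```

Any name this returns would need to be resolved in the file that produced the header before tools that validate the header will accept it.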

lmanchon commented 6 years ago

Yes, but why are other tools such as bedtools bam2fastq, bam2FastQ (BamUtil) or bam2fastq (https://github.com/jts/bam2fastq) able to process this broken file?

mmokrejs commented 6 years ago

@lmanchon Please report this duplicated line to the author of the tool which produced the broken BAM file.