walaj / bxtools

Tools for analyzing 10X Genomics data
MIT License

Runtime error of ChrID in bxtools convert #14

Closed bma-genetics closed 7 years ago

bma-genetics commented 7 years ago

Hello,

I encountered an error when using bxtools convert on a Lariat-generated BAM file.

    [bma@node63 Test]$ /usr/bin/time bxtools convert $bam > test.bam
    terminate called after throwing an instance of 'std::invalid_argument'
      what(): BamHeader::IDtoName - ID must be >= 0

I found this code in bxconvert.cpp

      std::string chr = hdr.IDtoName(r.ChrID());  // <---- exception is thrown here
      r.SetChrID(bxtags[bx]);
      r.AddZTag("CR", chr); 

and this code in BamHeader.cpp

  if (id < 0)
    throw std::invalid_argument("BamHeader::IDtoName - ID must be >= 0");

which produce this error message.

I tried printing r.ChrID() and the read name whenever the ID is negative:

       if (r.ChrID() < 0) {
         std::cerr << r.ChrID() << std::endl;
         std::cerr << r.Qname().c_str() << std::endl;
         continue;
       }

and got this result:

      -1
      ST-E00126:314:HFL3FALXX:6:2202:30776:15953

Then I grepped for the read name and found a paired-end read with both mates unmapped:

ST-E00126:314:HFL3FALXX:6:2202:30776:15953      173     *       0       0       47S20M83S       *       0       10495934        ACATATATATATGTAACATAAGGTTCCATTAAACCTGTCGTTCGTCCAACCATTTTATAAAATATATATGTTTTCCTTTATTTTTTGTTTTCATTAATCCTATATCTGAATTTTCTTCCTCTTTCTTTTTCGATGTAAACTGAGTTTTCT   AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJJJJJJJJJJJJFJJJFJJJJJFJJFFJJFFFJJFFFJ7JJJJJFFFFAJ<<-<<FF-<-A777FJJJFF   XM:A:0  QX:Z:AAFFFJJJJJJJJJJJ   AM:A:0  RX:Z:AACCATGGTCGACTAT    AS:f:-141.5     RG:Z:FtTest01:LibraryNotSpecified:1:HFL3FALXX:6 XS:f:-141.5     BX:Z:AACCATGGTCGACTAT-1 XT:i:1  OM:i:0
ST-E00126:314:HFL3FALXX:6:2202:30776:15953      93      *       0       0       43S22M62S       Ft8     5680442 10495934        CCTAAAAAAATAATACCCCACGTCCTATTAACTCATCAAATTAAAATGATATTTTATTTCATAAATTGAAAGTTCTTACAAAATGATAATAATAATTGTTTATATATAACTTGGCAAGTTAACTCCT  --JJJJJJJJJJJJJJFJJJFJFJJFJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJ  XM:A:0  QX:Z:AAFFFJJJJJJJJJJJ   AM:A:0  TR:Z:CGTAACT    TQ:Z:JJJJJJJ    AS:f:-141.5     RG:Z:FtTest01:LibraryNotSpecified:1:HFL3FALXX:6  XS:f:-141.5     BX:Z:AACCATGGTCGACTAT-1 XT:i:1  RX:Z:AACCATGGTCGACTAT   OM:i:0
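The FLAG values of the two records above (173 and 93) can be decoded to confirm that both the read and its mate are flagged as unmapped. The following is a minimal sketch using the bit values from the SAM specification; the helper names `bothUnmapped` and `describeFlag` are made up for illustration and are not part of bxtools or SeqLib.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FLAG bits as defined in the SAM specification.
constexpr std::uint16_t kPaired       = 0x1;
constexpr std::uint16_t kProperPair   = 0x2;
constexpr std::uint16_t kUnmapped     = 0x4;
constexpr std::uint16_t kMateUnmapped = 0x8;
constexpr std::uint16_t kReverse      = 0x10;
constexpr std::uint16_t kMateReverse  = 0x20;
constexpr std::uint16_t kFirstInPair  = 0x40;
constexpr std::uint16_t kLastInPair   = 0x80;

// Hypothetical helper: true when the read is paired and both the read
// and its mate carry the "unmapped" bit.
bool bothUnmapped(std::uint16_t flag) {
    return (flag & kPaired) != 0 && (flag & kUnmapped) != 0 &&
           (flag & kMateUnmapped) != 0;
}

// Hypothetical helper: list the names of the bits set in a FLAG value
// (only the eight bits declared above are considered).
std::vector<std::string> describeFlag(std::uint16_t flag) {
    static const std::vector<std::pair<std::uint16_t, std::string>> bits = {
        {kPaired, "paired"},             {kProperPair, "proper pair"},
        {kUnmapped, "read unmapped"},    {kMateUnmapped, "mate unmapped"},
        {kReverse, "read reverse"},      {kMateReverse, "mate reverse"},
        {kFirstInPair, "first in pair"}, {kLastInPair, "last in pair"}};
    std::vector<std::string> names;
    for (const auto& bit : bits) {
        if (flag & bit.first) names.push_back(bit.second);
    }
    return names;
}
```

Since 173 = 128 + 32 + 8 + 4 + 1 and 93 = 64 + 16 + 8 + 4 + 1, `bothUnmapped` is true for both records, which matches the reference name `*` in the third column.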

Is this hdr.IDtoName line necessary? As far as I can tell, it is only used to generate the CR tag in the r.AddZTag line. I just commented out these two lines to work around the error.

      // std::string chr = hdr.IDtoName(r.ChrID());
      r.SetChrID(bxtags[bx]);
      // r.AddZTag("CR", chr); 

walaj commented 7 years ago

This is a good catch. The problem is that a read whose mate is also unmapped has no chromosome name (its ChrID is -1), which triggers this error. I'll keep this issue open as a reminder to fix it (should be a quick few lines).
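One possible shape for such a fix, as a rough sketch rather than the change that was actually made: check the ID before the header lookup and fall back to a placeholder name. Everything here is a stand-in. `MockHeader` only models the negative-ID check quoted in this issue (the upper-bound check is an added assumption), `chrNameOrStar` is a hypothetical helper, and `"*"` is chosen as the placeholder only because SAM uses it for a missing reference name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the real header class; NOT the SeqLib implementation.
struct MockHeader {
    std::vector<std::string> names;  // chromosome names, indexed by ID

    std::string IDtoName(int id) const {
        // Same check and message as the BamHeader.cpp snippet in this issue.
        if (id < 0)
            throw std::invalid_argument("BamHeader::IDtoName - ID must be >= 0");
        // Assumption for this sketch: IDs past the end are also rejected.
        if (static_cast<std::size_t>(id) >= names.size())
            throw std::out_of_range("MockHeader::IDtoName - ID not in header");
        return names[static_cast<std::size_t>(id)];
    }
};

// Hypothetical helper: return "*" for reads without a chromosome
// (ChrID < 0) instead of letting IDtoName throw.
std::string chrNameOrStar(const MockHeader& hdr, int chr_id) {
    return chr_id < 0 ? std::string("*") : hdr.IDtoName(chr_id);
}
```

In the convert loop, the lookup would then read something like `std::string chr = chrNameOrStar(hdr, r.ChrID());`, so unmapped pairs would still get a CR tag (with value `*`) instead of aborting the run. Skipping the CR tag entirely for such reads would be an equally reasonable choice.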

I'll also add a flag so that all of the extra tags can be stripped from the BAM during convert. Presumably the original BAM is still there, and the extra tags (like CR) may not be useful anyway.