stjude / XenoCP

A cloud-based tool for mouse read cleansing in xenograft samples
Apache License 2.0

`MC` record data field not removed from unmapped records in final merged BAM #25

Closed zaeleus closed 3 years ago

zaeleus commented 3 years ago

In XenoCP 3.1.2, the final merged BAM output (*xenocp.bam when running either workflow) does not pass picard ValidateSamFile. Validation fails with the error types MISMATCH_MATE_CIGAR_STRING and MATE_CIGAR_STRING_INVALID_PRESENCE.

ERROR::MATE_CIGAR_STRING_INVALID_PRESENCE:Record 116804, Read name <redacted>, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR::MATE_CIGAR_STRING_INVALID_PRESENCE:Record 117418, Read name <redacted>, Mate CIGAR String (MC Attribute) present for a read whose mate is unmapped
ERROR::MISMATCH_MATE_CIGAR_STRING:Record 116804, Read name <redacted>, Mate CIGAR string does not match CIGAR string of mate
ERROR::MISMATCH_MATE_CIGAR_STRING:Record 117418, Read name <redacted>, Mate CIGAR string does not match CIGAR string of mate

XenoCP is not removing the mate CIGAR (MC) field from the auxiliary record data after unmapping a record.
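
To illustrate the condition picard is complaining about, here is a minimal sketch that drops the MC tag from a SAM text record whose mate is flagged as unmapped (flag bit 0x8). The function name `strip_stale_mate_cigar` and the sample record are hypothetical, made up for this illustration. This is not how TweakSam or XenoCP handles the field; it only shows the rule the validator enforces.

```python
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate (next segment) is unmapped


def strip_stale_mate_cigar(sam_line: str) -> str:
    """Drop the MC tag from an alignment line whose mate is flagged unmapped.

    Hypothetical helper for illustration only; not part of XenoCP or TweakSam.
    """
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through untouched
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & MATE_UNMAPPED:
        # The first 11 fields are mandatory; optional TAG:TYPE:VALUE fields follow.
        optional = [f for f in fields[11:] if not f.startswith("MC:")]
        fields = fields[:11] + optional
    return "\t".join(fields)


# Made-up record: flag 73 = paired (1) + mate unmapped (8) + first in pair (64).
record = "\t".join([
    "read1", "73", "chr1", "100", "60", "50M", "=", "100", "0",
    "A" * 50, "I" * 50, "NM:i:0", "MC:Z:50M",
])
cleaned = strip_stale_mate_cigar(record)
print(cleaned.split("\t")[11:])  # ['NM:i:0']
```

A record whose mate is still mapped is returned with its MC tag intact, so this only addresses MATE_CIGAR_STRING_INVALID_PRESENCE. MISMATCH_MATE_CIGAR_STRING would additionally require comparing MC against the mate's actual CIGAR.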

mcrusch commented 3 years ago

Problem is in TweakSam, one of XenoCP's dependencies. This has been fixed in cluster_code trunk in r21704. So, fixing XenoCP should just be a matter of pushing dependencies to GH.

adthrasher commented 3 years ago

I ran XenoCP on SJACT030812_X1 with the updated TweakSam. Picard ValidateSamFile no longer produces the errors for the mate CIGAR value.

Error Type      Count
ERROR:INVALID_MAPPING_QUALITY   931
WARNING:MISSING_TAG_NM  55501102

adthrasher commented 3 years ago

@zaeleus - Please see the 3.1.3 release and confirm that this resolves your issue.

zaeleus commented 3 years ago

It seems to be correct on modified sample data. I see the MC field is moved into the serialized XU field when the record is unmapped.