samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

Add option to hard clip read in CigarUtils (replacement for #1453) #1461

Closed kachulis closed 4 years ago

kachulis commented 4 years ago

@lbergelson here is the replacement PR for #1453 we discussed earlier today.

@fleharty, @lbergelson and I talked about how to get hard clipping merged ASAP, given that I do not have permissions to push directly to htsjdk, and so cannot make modifications directly to your branch. He suggested opening up a new PR to replace the old one, which is what this PR is.

codecov-io commented 4 years ago

Codecov Report

Merging #1461 into master will increase coverage by 0.787%. The diff coverage is 73.585%.

@@              Coverage Diff               @@
##             master     #1461       +/-   ##
==============================================
+ Coverage     68.41%   69.197%   +0.787%     
- Complexity     8499      8700      +201     
==============================================
  Files           583       587        +4     
  Lines         34413     34571      +158     
  Branches       5729      5776       +47     
==============================================
+ Hits          23542     23922      +380     
+ Misses         8650      8370      -280     
- Partials       2221      2279       +58
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/util/CigarUtil.java 53.571% <73.585%> (+23.268%) 4 <2> (-10) :arrow_down:
...n/java/htsjdk/samtools/RandomAccessFileBuffer.java 0% <0%> (-71.667%) 0% <0%> (-10%)
src/main/java/htsjdk/samtools/CRAMBAIIndexer.java 71.711% <0%> (-9.727%) 12% <0%> (-5%)
...m/encoding/core/SubexponentialIntegerEncoding.java 85.714% <0%> (-8.73%) 5% <0%> (+1%)
...c/main/java/htsjdk/samtools/cram/build/CramIO.java 67.347% <0%> (-7.996%) 8% <0%> (-12%)
src/main/java/htsjdk/samtools/CRAMCRAIIndexer.java 74.51% <0%> (-7.308%) 10% <0%> (ø)
...tools/cram/encoding/core/GammaIntegerEncoding.java 93.333% <0%> (-6.667%) 5% <0%> (+1%)
...ava/htsjdk/samtools/cram/structure/EncodingID.java 93.75% <0%> (-6.25%) 6% <0%> (+3%)
src/main/java/htsjdk/samtools/Bin.java 77.551% <0%> (-6.122%) 17% <0%> (-1%)
.../samtools/cram/encoding/readfeatures/SoftClip.java 50% <0%> (-4.545%) 6% <0%> (-1%)
... and 162 more
kachulis commented 4 years ago

Thanks for the quick review @lbergelson. I have added code to invalidate NM, MD, and UQ tags when clipping changes the length of the read. Note this will change behavior (to be more correct) for soft-clipping as well. Also added more validation of the things that are changes to clip3PrimeEndOfRead. Back to you!

kachulis commented 4 years ago

@lbergelson I added documentation and tests for the tag invalidations. Also adjusted the logic to only invalidate in the number of reference bases the read aligns to changes (no need to invalidate these tags if we are harclipping bases that were already softclipped).

I don't think I have merge permissions, so I'm happy with you merging this at this point if you approve.