samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
283 stars 242 forks source link

ignore TC, TN on CRAM read #1578

Closed yash-puligundla closed 2 years ago

yash-puligundla commented 2 years ago

Description

As per CRAMv3,

TC and TN are legacy data series from CRAM 1.0. They have no function in CRAM 3.0 and should not be present. However, some implementations do output them and decoders must silently skip these fields. It is illegal for TC and TN to contain any data values, although there may be empty blocks associated with them. TC, TN data series should not be written by cram writer. However, if they are present, they should be ignored

More details here

Fix

codecov-commenter commented 2 years ago

Codecov Report

Merging #1578 (d56e3d8) into master (a4f1f04) will decrease coverage by 0.053%. The diff coverage is 100.000%.

:exclamation: Current head d56e3d8 differs from pull request most recent head 09619d1. Consider uploading reports for the commit 09619d1 to get more accurate results

@@               Coverage Diff               @@
##              master     #1578       +/-   ##
===============================================
- Coverage     69.839%   69.786%   -0.053%     
  Complexity      9638      9638               
===============================================
  Files            702       702               
  Lines          37618     37516      -102     
  Branches        6111      6074       -37     
===============================================
- Hits           26272     26181       -91     
+ Misses          8900      8881       -19     
- Partials        2446      2454        +8     
Impacted Files Coverage Δ
...ava/htsjdk/samtools/cram/structure/DataSeries.java 100.000% <ø> (ø)
...s/cram/structure/CompressionHeaderEncodingMap.java 85.621% <100.000%> (+0.486%) :arrow_up:
...htsjdk/samtools/util/nio/DeleteOnExitPathHook.java 80.952% <0.000%> (-9.524%) :arrow_down:
...sjdk/samtools/util/htsget/HtsgetErrorResponse.java 73.333% <0.000%> (-6.667%) :arrow_down:
...tools/cram/encoding/core/GammaIntegerEncoding.java 86.667% <0.000%> (-6.667%) :arrow_down:
.../samtools/cram/compression/ExternalCompressor.java 69.565% <0.000%> (-4.509%) :arrow_down:
...mtools/cram/encoding/core/BetaIntegerEncoding.java 91.304% <0.000%> (-4.348%) :arrow_down:
...am/encoding/core/CanonicalHuffmanByteEncoding.java 52.941% <0.000%> (-2.941%) :arrow_down:
...ing/core/huffmanUtils/HuffmanParamsCalculator.java 80.000% <0.000%> (-1.818%) :arrow_down:
...va/htsjdk/samtools/util/htsget/HtsgetResponse.java 89.394% <0.000%> (-1.783%) :arrow_down:
... and 41 more
yash-puligundla commented 2 years ago
cmnbroad commented 2 years ago

@yash-puligundla Also, you might try wrapping both of the two data series sets (DATASERIES_NOT_WRITTEN_BY_HTSJDK and DATASERIES_NOT_READ_BY_HTSJDK) with Collections.unmodifiableSet - I think that will fix the spotBugs issue. Not sure why it only complains about one of them.

yash-puligundla commented 2 years ago

@yash-puligundla Also, you might try wrapping both of the two data series sets (DATASERIES_NOT_WRITTEN_BY_HTSJDK and DATASERIES_NOT_READ_BY_HTSJDK) with Collections.unmodifiableSet - I think that will fix the spotBugs issue. Not sure why it only complains about one of them.

I will do that. Thanks!

yash-puligundla commented 2 years ago

NOTE: We can't truly roundtrip and test for the absence of TC, TN as we have added code to ignore TC, TN on CRAM read. However, following @cmnbroad 's suggestion, I ran the tests -

cmnbroad commented 2 years ago

@lbergelson Can you merge this one ? thanks.

lbergelson commented 2 years ago

@yash-puligundla @cmnbroad Thank you!