samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/
276 stars 244 forks source link

Remove remaining incorrect zero-length-B-array checks and exception throwing #1669

Closed jmarshall closed 11 months ago

jmarshall commented 1 year ago

PR #1194 removed a number of Array.getLength(value) == 0 checks when setting B-array tagged fields, as such empty tagged field values had recently been made explicitly valid in SAM/BAM/CRAM files.

Unfortunately the existing such check in setUnsignedArrayAttribute() was omitted from that PR and has now caused the issue raised in samtools/hts-specs#728. Whatever additional problems might (but probably don't) exist elsewhere in the HTSJDK code for zero-length ML tags, this check for a generic zero-length array was incorrect and should be removed. A test case for setUnsignedArrayAttribute() has also been added.

In grepping for other candidates, I noticed one other potentially incorrect looking Array.getLength invocation.

This was in Slice::setAttribute(), which appears to set an optional tag in a Slice header. There are currently no such tags defined, so the question is somewhat moot but — as a separate commit for your consideration — I have also removed that check. (As there are no such tags currently defined, there are no test cases for these fields to which a test exercising this change could be added.)

jmarshall commented 1 year ago

I am not a Java programmer and I am not familiar with your test suite or how to set up a CRAM unit test in it.

I provided this pull request to get the ball rolling, as there had been no response to my comment on #1194 notifying the HTSJDK maintainers of this problem. If the maintainers think that applying this change requires additional unit tests, they should add such tests to the PR themselves or use this PR as inspiration for their own fix for this problem.

droazen commented 1 year ago

No problem @jmarshall, @gileshall has kindly volunteered to complete this PR. Thanks for getting the ball rolling

cmnbroad commented 12 months ago

@gileshall Sorry - I'm a bit late in looking at this. Since this method is unused by the CRAM code, I'd propose that we just remove it rather than adding a test to cover it.

cmnbroad commented 12 months ago

Also, FWIW, the Slice class has pretty extensive unit test coverage, distributed across a few test files, with the primary ones being in the SliceTests class.

lbergelson commented 11 months ago

This was finished in #1674. Thank you everyone who contributed.

jmarshall commented 11 months ago

Thanks, I'm glad this made it into the release. 🎉

I wouldn't mind if you wanted to put “by @gileshall and @jmarshall” in the release notes. (And this PR's title “Remove remaining incorrect zero-length-B-array checks” is perhaps a better summary one-liner.)

lbergelson commented 11 months ago

@jmarshall Ah, sorry about that. I made sure it credited you in the actual commit but I just hit the button for "generate some lazy release notes" and it apparently doesn't include co-authors. I've updated it. My recent commit message work has been pretty bad. Zero-length array is better than the accidental string of 6 commits that all say "fix" that I merged before that because I hadn't noticed it was set to rebase and not squash...

jmarshall commented 11 months ago

NBD but thanks — it's good to have things to point to if people ask me what I spend my time on…

Bad luck about your string of “fixes” :smile: