Closed mjhipp closed 8 months ago
Merging #1531 (569ccf1) into master (2e3da45) will increase coverage by
0.000%
. The diff coverage is57.895%
.
@@ Coverage Diff @@
## master #1531 +/- ##
============================================
Coverage 69.407% 69.407%
- Complexity 8924 8945 +21
============================================
Files 602 604 +2
Lines 35521 35649 +128
Branches 5904 5924 +20
============================================
+ Hits 24654 24743 +89
- Misses 8533 8555 +22
- Partials 2334 2351 +17
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/main/java/htsjdk/variant/vcf/VCFHeader.java | 86.842% <57.895%> (-3.274%) |
82.000 <6.000> (+6.000) |
:arrow_down: |
.../java/htsjdk/samtools/util/BufferedLineReader.java | 48.387% <0.000%> (-20.034%) |
7.000% <0.000%> (+1.000%) |
:arrow_down: |
src/main/java/htsjdk/samtools/CRAMFileReader.java | 74.762% <0.000%> (-0.238%) |
47.000% <0.000%> (+2.000%) |
:arrow_down: |
...jdk/samtools/util/BlockCompressedOutputStream.java | 79.710% <0.000%> (ø) |
34.000% <0.000%> (+2.000%) |
|
src/main/java/htsjdk/io/Writer.java | 0.000% <0.000%> (ø) |
0.000% <0.000%> (?%) |
|
src/main/java/htsjdk/io/AsyncWriterPool.java | 72.222% <0.000%> (ø) |
5.000% <0.000%> (?%) |
|
src/main/java/htsjdk/io/HtsPath.java | 62.195% <0.000%> (+10.879%) |
20.000% <0.000%> (+5.000%) |
Hey @lbergelson is this a PR your team would be willing to review? Many thanks if so!
@clintval Yeah, we're definitely going to get to it. Reviews have been really slow lately because working from home in the pandemic is killing my productivity.
@mjhipp Thanks for taking a stab at this. While it does fix the duplicate line problem, it changes current behavior in a way that we should at least think about. Currently, any ID header line other than INFO/FORMAT (i.e., ALT, META, SAMPLE, PEDIGREE, or any other custom line) can be retrieved via getOtherHeaderLine(String key)
. I think with this change that will no longer be the case.
The real issue is that otherMeta
probably shouldn't have any ID lines in it. Making the change in this PR and adding the new getOtherHeaderLine(String key, String id)
kind of codifies that, though, and it feels like that's going in the wrong direction. An alternative fix would be to stop putting the ID lines in there at all, and store them elsewhere, and expose them through the new method. That would have the same backward compatibility issue as this fix, but it would also eliminate the need for the synthetic key names, which might be confusing since they look a lot like the ALT SV ID type hierarchy.
If we're going to break compatibility to fix this, I'd be inclined to favor doing the more complete fix though it's likely to be a bit more work and might impact other methods. Or maybe we should just resurrect the old PR and wait for that. @lbergelson any thoughts ? Having said all that, we should fix this somehow...
An alternative fix would be to stop putting the ID lines in there at all, and store them elsewhere, and expose them through the new method.
A nested map is one possibility for this. Where the outer map is from line type ("ALT", "PEDIGREE", etc) to an inner map, and the inner map is ID to metadata line. This would scale for any reasonable amount of line types, without having to hard code them (defining mAltMetaData
map).
This would require a similar change to methods, but without synthetic key names.
Would add a new map at the top:
private final Map<String, Map<String, VCFSimpleHeaderLine>> mIdMetaData = new LinkedHashMap<String, Map<String, VCFSimpleHeaderLine>>();
Adding a new line would involve checking for the key
, (making a new inner map if necessary), and adding to the inner map at the id
hash.
which would change the get
method I added to:
/**
* @param key the header key or field type
* @param id the header id
* @return the meta data line, or null if there is none
*/
public VCFSimpleHeaderLine getIdHeaderLine(final String key, final String id) {
return mIdMetaData.get(key).get(id);
}
(probably on multiple lines with a null check in there)
Currently, any ID header line other than INFO/FORMAT (i.e., ALT, META, SAMPLE, PEDIGREE, or any other custom line) can be retrieved via getOtherHeaderLine(String key). I think with this change that will no longer be the case.
With the above changes, could possibly also change the existing getOtherHeaderLine
to add some backwards compatibility. This could involve first checking mOtherMetaData
for the key
, and if not found, check mIdMetaData
for that key
. If the key
is found in the outer map of mIdMetaData
, take the first (or last) value in the inner map and cast to VCFHeaderLine
.
I made the changes above. I think now the only backwards compatibility issue is that getOtherHeaderLines()
will not return VCFSimpleHeaderLines
, where it had before. To get those, you use getIdHeaderLines()
After further discussing this internally and thinking more about it, I'm inclined to wait for a more complete fix to VCFHeader and address the numerous issues all at once rather than continue trying to make incremental fixes. I've resurrected and rebased my old #835 branch and will resubmit a new draft PR so we can try to find a path forward and get that merged.
Description
Closes #277 Closes #500
Currently, it is not possible to add multiple
VCFSimpleHeaderLine
s to a VCFHeader if they are of the same type (see issues linked above). A specific example, is addingALT
lines, which are included in 4.2 spec (that shows a header with multiple ALT lines on page 11).To fix this, I added a new map and methods for working with
SimpleVCFHeaderLine
s that are not of the default types.I have seen another fix to this (#835), but it was closed due to large scope. I am hoping this change is small and useful enough to be considered. Thanks!
Things to think about before submitting: