seqan / iGenVar

The official repository for the iGenVar project.
BSD 3-Clause "New" or "Revised" License

[FIX] Add duplicated sequences to output and fix vcf output #208

Closed Irallia closed 2 years ago

Irallia commented 2 years ago

This PR adds the duplicated sequence to the output. To discuss: what is the length of a tandem duplication? In this PR I decided to use the length of the duplicated sequence. For example:

ref  NNNNN AAACCCGGG AAACCCGGG NNNNNN
read NNNNN AAACCCGGG AAACCCGGG AAACCCGGG AAACCCGGG AAACCCGGG NNNNNN

Then the duplication spans the interval (5, 23], its duplicated sequence is AAACCCGGG, and the SVLEN is 9.
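To make the two candidate lengths in this example concrete, here is a minimal C++ sketch. The `TandemDuplication` struct and the helper names are hypothetical and chosen only for illustration; they are not iGenVar's actual data structures. It assumes the (begin, end] interval convention used above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record for illustration only; not iGenVar's real type.
struct TandemDuplication
{
    std::size_t begin; // exclusive start, the interval is (begin, end]
    std::size_t end;   // inclusive end
    std::string unit;  // the duplicated sequence
};

// Length of the duplicated sequence (the convention chosen in this PR).
inline std::size_t unit_length(TandemDuplication const & dup)
{
    return dup.unit.size();
}

// Length of the reference interval covered by the duplication (end - begin).
inline std::size_t interval_length(TandemDuplication const & dup)
{
    return dup.end - dup.begin;
}

// Builds the example above: 5 flanking N's followed by
// two reference copies of AAACCCGGG.
inline TandemDuplication make_example()
{
    std::string const unit{"AAACCCGGG"};
    std::size_t const flank{5};
    std::size_t const ref_copies{2};
    return {flank, flank + ref_copies * unit.size(), unit};
}

// Small self-check mirroring the numbers in the PR description.
inline void check_example()
{
    TandemDuplication const dup = make_example();
    assert(dup.begin == 5 && dup.end == 23); // interval (5, 23]
    assert(unit_length(dup) == 9);           // SVLEN as proposed here
}
```

With this convention the reported length does not depend on how many copies the read contains, only on the repeated unit itself.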

codecov[bot] commented 2 years ago

Codecov Report

Merging #208 (91ec8dc) into master (71ad200) will decrease coverage by 0.00%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
- Coverage   98.35%   98.34%   -0.01%     
==========================================
  Files          18       18              
  Lines         850      848       -2     
==========================================
- Hits          836      834       -2     
  Misses         14       14              
Impacted Files                                          Coverage Δ
src/iGenVar.cpp                                         100.00% <ø> (ø)
...sv_detection_methods/analyze_split_read_method.cpp   96.84% <100.00%> (+0.03%)
src/variant_detection/variant_output.cpp                100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 71ad200...91ec8dc.

Irallia commented 2 years ago

To discuss: what is the length of a tandem duplication?

I had a quick look at the specification (source) and found that SVLEN = "Difference in length between REF and ALT alleles". Therefore, I would change the length to 27 in your example.
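The value 27 follows from that definition if the REF allele is taken to be the two reference copies and the ALT allele the five copies seen in the read. A minimal C++ sketch of the arithmetic, with hypothetical helper names (`repeat`, `svlen_from_alleles`) that are not part of iGenVar:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Concatenates `copies` repetitions of `unit`.
inline std::string repeat(std::string const & unit, std::size_t copies)
{
    std::string result;
    result.reserve(unit.size() * copies);
    for (std::size_t i = 0; i < copies; ++i)
        result += unit;
    return result;
}

// SVLEN read as "difference in length between REF and ALT alleles":
// positive when ALT is longer than REF, negative when it is shorter.
inline std::int64_t svlen_from_alleles(std::string const & ref, std::string const & alt)
{
    return static_cast<std::int64_t>(alt.size()) - static_cast<std::int64_t>(ref.size());
}

// Self-check for the example in this thread:
// REF holds 2 copies of AAACCCGGG, ALT holds 5 copies.
inline void check_svlen_example()
{
    std::string const unit{"AAACCCGGG"};
    assert(svlen_from_alleles(repeat(unit, 2), repeat(unit, 5)) == 27);
}
```

Under this reading the SVLEN grows with the number of additional copies (3 × 9 = 27), whereas the length of the duplicated sequence stays at 9.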

However, it feels a bit redundant to me that SVLEN is always END-POS, because it doesn't add information.

Yes, but the SVLEN is not used everywhere; I found a discussion about INVs here. Maybe we can just introduce a second INFO value alongside the SVLEN. I will adjust the PR.