jzook / testHG008curation

This repository is primarily for the Genome in a Bottle Consortium to curate structural variants in HG008
https://www.nist.gov/programs-projects/cancer-genome-bottle

[chr10:25515229-25515743][INS][SVLEN=514bp] #220

Open jzook opened 1 month ago

jzook commented 1 month ago

chr10 25515229 Minda_8 N <INS> . PASS SVLEN=514;SVTYPE=INS;SUPP_VEC=ONT_severus_INS10477,ONT_Sniffles2.INS.12EM9,ONT_i_32,ONT_ID_48081_1,PB_severus_INS9071,PB_Sniffles2.INS.130M9,PB_ID_18566_1,PB_i_88
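The record above can be unpacked programmatically. The following is a minimal sketch, not part of the original curation workflow. It assumes the columns are whitespace-separated as displayed here (a real VCF is tab-delimited) and that the prefix before the first underscore in each SUPP_VEC entry names the sequencing technology (ONT or PB). The helper name `parse_info` is hypothetical.

```python
# VCF record copied from this issue; columns are whitespace-separated as displayed.
record = (
    "chr10 25515229 Minda_8 N <INS> . PASS "
    "SVLEN=514;SVTYPE=INS;SUPP_VEC=ONT_severus_INS10477,ONT_Sniffles2.INS.12EM9,"
    "ONT_i_32,ONT_ID_48081_1,PB_severus_INS9071,PB_Sniffles2.INS.130M9,"
    "PB_ID_18566_1,PB_i_88"
)


def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


fields = record.split()
chrom, pos = fields[0], int(fields[1])
info = parse_info(fields[7])
svlen = int(info["SVLEN"])
calls = info["SUPP_VEC"].split(",")

# Assumption: the text before the first underscore is the technology label.
by_tech = {}
for call in calls:
    tech = call.split("_", 1)[0]
    by_tech.setdefault(tech, []).append(call)

# The issue title spans POS to POS + SVLEN.
print(f"{chrom}:{pos}-{pos + svlen} {info['SVTYPE']} SVLEN={svlen}")
print({tech: len(ids) for tech, ids in by_tech.items()})
```

Under these assumptions the interval reproduces the one in the issue title, and the supporting calls split evenly between the two technology prefixes.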

https://v2.genomeribbon.com/?session=https://42basepairs.com/download/s3/giab-data/ribbon-json/ribbon-hg008-cov-bedpe.json&locus=chr10:25515229#ribbon

jo-mc commented 1 month ago

The insertion falls in a location annotated as L1PB1. It is a mix of 160 bp of L1HS, 220 bp of unrelated sequence, and poly-A regions. Interestingly, the insert in #228 comes from about 6000 bp upstream (see Unresolved Questions).

Insertion Analysis:

  1. Initial Characteristics:
    • Insert length ~512 bp
    • Resembles a LINE-1-type insertion
    • Features a poly-A tail (and a poly-A tract mid-insertion)
    • Possible target site duplication: "AAAAGCCTTGTC"
  2. Sequence Match:
    • The beginning portion of the insert (~160 bp at the 5' end) matches the known active repeat L1HS and is followed by a poly-A region of ~30-40 bp.
    • The end portion, approximately 330 bp, is potentially a transduction and has a poly-A tail of ~25 bp.
    • However, no LINE-1 followed by similar sequence was found.
  3. Contextual Information:
    • The first 160 bp matches multiple L1HS copies throughout the genome.
    • The last ~330 bp maps to chr14:52960785-52961100 (chm13v2), in intron 3 of 13 of LINC01500-205, a long non-coding RNA.
    • No target site deletion.
  4. Genomic Location:
    • Insertion into a region annotated as LINE-1 L1PB1.
  5. Sequencing Data Observations:
    • Appears clonal (SNPs indicate uneven haplotype counts).
    • Total coverage at the insertion site is approximately 70 reads (PacBio); Illumina depth is 120.
    • About 21 of the 70 reads contain the insertion.
  6. Unresolved Questions:
    • The origin of this DNA fragment is unclear, in particular how a piece of intron became attached to a LINE-1 insertion.
    • Insertion fragment #228 also comes from intron 3 of 13 of LINC01500-205, about 6000 bp upstream.
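The read counts in item 5 can be turned into a supporting-read fraction. This is a minimal sketch that takes the approximate counts quoted above (21 of 70 PacBio reads) at face value. The Wilson score interval is added only as a generic way to express sampling uncertainty; it is an assumption of this example, not something used in the analysis above, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives roughly 95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Approximate counts quoted in the analysis (PacBio reads at the insertion site).
ins_reads, total_reads = 21, 70

fraction = ins_reads / total_reads
low, high = wilson_interval(ins_reads, total_reads)
print(f"fraction of reads with insertion: {fraction:.2f} (interval {low:.2f}-{high:.2f})")
```

Because both counts are approximate, the resulting fraction of about 0.3 should be read as a rough figure rather than a measured allele fraction.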

image

splitthreader (2)

image

Aligning the insert to chm13 with minimap2 in splice mode:

The output shows the end part aligning to the intron described above, and the front part aligning to L1HS at multiple locations (secondary alignments), plus a few less closely matching positions.

@PG ID:minimap2 PN:minimap2 VN:2.26-r1175 CL:minimap2 -ax splice ../chm13v2.fa insert220.fa
m84039_240114_012401_s1/262346381/ccs 0 chr14 52960780 60 180S156M3D115M1D52M19S 0 0 CAAAAGCCTTGTCATCTAG
m84039_240114_012401_s1/262346381/ccs 2064 chr8 137538082 1 200H26M1I3M1D76M12I2M2I1M1I7M3I16M6D14M15937N114M1I10M1I5M1D11M16H
m84039_240114_012401_s1/262346381/ccs 256 chr4 83046133 0 16S7M1I9M1D10M1I115M2D15M348S 0 0
m84039_240114_012401_s1/262346381/ccs 256 chr5 34275675 0 16S7M1I9M1D10M1I119M1D19M340S 0 0
m84039_240114_012401_s1/262346381/ccs 272 chr14 98509867 0 337S10M1D15M2D116M1I10M1I5M1D11M16S 0 0
m84039_240114_012401_s1/262346381/ccs 256 chr4 16930563 0 16S7M1I9M1D10M1I124M354S 0 0
m84039_240114_012401_s1/262346381/ccs 272 chr16 35817437 0 363S115M1I10M1I5M1D11M16S 0 0 *
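The CIGAR strings above can be summarized to check how much of the query aligns and where each alignment ends on the reference. This is a minimal sketch using only the standard CIGAR operation semantics from the SAM format (M, I, S, =, X consume query bases; M, D, N, =, X consume reference bases; H marks hard-clipped bases absent from the record). The function name `cigar_summary` is hypothetical, and the CIGAR strings and positions are copied from the records above.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar, pos):
    """Summarize a CIGAR string for an alignment starting at 1-based POS."""
    aligned_query = 0  # query bases inside the alignment (M, I, =, X)
    read_len = 0       # original query length, counting soft and hard clips
    ref_span = 0       # reference bases covered (M, D, N, =, X)
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op in "MI=X":
            aligned_query += n
        if op in "MIS=XH":
            read_len += n
        if op in "MDN=X":
            ref_span += n
    return {
        "aligned_query": aligned_query,
        "read_len": read_len,
        "ref_span": ref_span,
        "ref_end": pos + ref_span - 1,
    }


# Primary alignment of the end part of the insert to chr14.
primary = cigar_summary("180S156M3D115M1D52M19S", 52960780)
# One of the secondary alignments of the front part (chr16).
secondary = cigar_summary("363S115M1I10M1I5M1D11M16S", 35817437)

print("primary:", primary)
print("secondary:", secondary)
```

For the primary chr14 record this gives roughly 320 aligned query bases after a 180-base soft clip, which is in line with the ~330 bp end portion described in the analysis. The `N` operation in the chr8 record counts toward the reference span, so its span is dominated by the 15937-base skip.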