neverending callVariant runs

lydiayliu commented 2 years ago

Currently known:

CPCG0241 CPCG0262

GVFs in here: /hot/project/algorithm/moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/input_csvs/CPCG.csv

If it's really just a run time issue we can always exclude RNA-editing?

zhuchcn commented 2 years ago

CPCG0241:

moPepGen callVariant \
     --input-path \
        /hot/user/yiyangliu/MoPepGen/Parser/REDItools/CPCG0241_candidates.rmsk.GRCh38_annotated.txt.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/CIRCexplorer2/CPCG0241_circularRNA_known.txt.1.s.gvf \        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/CPCG0241.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/CPCG0241.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/CPCG0241.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/CPCG0241.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/RMATS/CPCG0241_ijc5_sjc5.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/Fusion/star-fusion-1.9.1/CPCG0241.s.gvf \
    --output-path CPCG0241_variant_peptides.fasta \
    --index-dir GRCh38-EBI-GENCODE34 \
    --min-length 7 \
    --cleavage-rule trypsin \
    --max-variants-per-node 7 \
    --min-mw 500.0 \
    --miscleavage 2 \
    --max-length 25 \
    --threads 16

One of the transcripts below is causing the problem.

[ 2022-06-06 20:10:35 ] ['ENST00000580133.2', 'ENST00000390284.2', 'ENST00000442890.1', 'ENST00000390285.4', 'ENST00000520107.1', 'ENST00000432623.1', 'ENST00000302273.2', 'ENST00000403807.4', 'ENST00000624350.1', 'ENST00000610778.1', 'ENST00000412149.1', 'ENST00000652112.1', 'ENST00000413293.1', 'ENST00000390290.3', 'ENST00000436538.1', 'ENST00000390294.2']

CPCG0262:

moPepGen callVariant \
    --input-path \
        /hot/user/yiyangliu/MoPepGen/Parser/REDItools/CPCG0262_candidates.rmsk.GRCh38_annotated.txt.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/CIRCexplorer2/CPCG0262_circularRNA_known.txt.1.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/CPCG0262.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/gsnp/CPCG0262.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/pindel/CPCG0262.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/somaticsniper/CPCG0262.gencode.tsv.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/RMATS/CPCG0262_ijc5_sjc5.s.gvf \
        /hot/user/yiyangliu/MoPepGen/Parser/Fusion/star-fusion-1.9.1/CPCG0262.s.gvf \
    --output-path CPCG0262_variant_peptides.fasta \
    --index-dir GRCh38-EBI-GENCODE34  \
    --min-length 7 \
    --cleavage-rule trypsin \
    --max-variants-per-node 7 \
    --min-mw 500.0 \
    --miscleavage 2 \
    --max-length 25 \
    --threads 16

[ 2022-06-06 20:10:35 ] ['ENST00000580133.2', 'ENST00000390284.2', 'ENST00000442890.1', 'ENST00000390285.4', 'ENST00000520107.1', 'ENST00000432623.1', 'ENST00000302273.2', 'ENST00000403807.4', 'ENST00000624350.1', 'ENST00000610778.1', 'ENST00000412149.1', 'ENST00000652112.1', 'ENST00000413293.1', 'ENST00000390290.3', 'ENST00000436538.1', 'ENST00000390294.2']

lydiayliu commented 2 years ago

On the third node: ip-0A125225

There's no log for CPCG0250: /scratch/aa/d8d9642363adeb9ed5ba7f56f57dbf/.command.log

Also no callVariant log for COCG0248: /scratch/34/e4b340f7f4fd910c037307b625b05b/.command.log

zhuchcn commented 2 years ago

The cause for CPCG0241 is that the transcript isn't large (117 amino acids), but it has a lot of RNA editing sites (38). The --max-variants-per-node limits the number of variants per node. But if three consecutive cleavage regions are all hypermutated, the number of combination is still huge. For example, if a node has 7 variants, its downstream cleavage region has 7, and the downstream of the downstream also has 7 variants, the total combinations of variants would be 2 ^ 21 = 2,097,152. I guess we could add a variable like --additional-variants-per-misc to specify the number of additional variants to consider per miscleavage. For example, if --max-variants-per-node = 7 and --additional-variants-per-misc = 2, then the the combinations we will considier would go down to 352,716. I'm trying to see how reasonable time it takes.

Btw, are the reditools result filtered?

zhuchcn commented 2 years ago

I tried to fix this in #477. By setting --additional-variants-per-misc to 2, CPCG0241 ENST00000390308.2 finished in 7 minutes, and CPCG0262 ENST00000390285.4 finished in 3 minutes. It's not perfect, but the only workaround that I can think of to let the program finish.

zhuchcn commented 2 years ago

Just found that location is missing in our RES labels. They currently like 'RES-C-G' but they should be 'RES-100-C-G'. Will have to fix it..

lydiayliu commented 2 years ago

sorry what does RES stand for?

zhuchcn commented 2 years ago

RES -> RNA editing sites

lydiayliu commented 2 years ago

Quoting Julie here for the REDITOOLS filtering:

If I remember correctly the sites have been filtered so that the RNA frequency > 0.1 (so 10% of the reads have the event), the event had 0 frequency in the DNA and the position has at least 10 reads coverage in RNA and DNA. The sites have also been annotated with repeatMasker and RefSeq gene ids.

We had some extensive discussions with filtering in https://github.com/uclahs-cds/private-moPepGen/issues/288 and https://github.com/uclahs-cds/private-moPepGen/issues/294

I used all defult params in moPepGen parseREDItools, which are these

optional arguments:
  --min-coverage-alt <number>
                        Minimal read coverage of alterations to be parsed. (default: 3)
  --min-frequency-alt <value>
                        Minimal frequency of alteration to be parsed. (default: 0.1)
  --min-coverage-rna <value>
                        Minimal read coverage at the alteration site of RNAseq data of reference and all alterations. (default: 10)
  --min-coverage-dna <number>
                        Minimal read coverage at the alteration site of WGS. Set it to -1 to skip checking this. (default: 10)

which gets a ~70k line GVF from a ~290k line input TXT file.

Though it seems like I ran the test with only REDItools using --max-variants-per-node 5. But if this is too much for mpg we can always change the min coverage to 30 for rna and dna, that should bring it down substantially?

zhuchcn commented 2 years ago

Yeah right, our default parameters are set based on the recommendation of their paper. Would it be useful to take a look at the distribution of RES number per transcript? Maybe some transcripts indeed have a lot of RES?

/hot/user/yiyangliu/MoPepGen/Parser/VEP/gencode/gindel/CPCG0262.gencode.tsv.s.gvf

You mean a previous test run? That explains why we didn't see this before.

lydiayliu commented 2 years ago

Hmm that GVF you are quoting is from VEP... I don't think I ever ran all of the CPCG REDItools through callVariant (maybe I should), but all the GVFs are here: /hot/users/yiyangliu/MoPepGen/Parser/REDItools/ They are in the 20k-100k lines range.

I ran a single filtered sample was CPCG0196. That does explain why we havn't seen this before.

I'll try to pull some stats in a bit

zhuchcn commented 2 years ago

Oh sorry that's funny. I was trying to quote your last sentence but maybe I got confused between ctrl+shift+c and ctrl+c.

Though it seems like I ran the test with only REDItools using --max-variants-per-node 5.

I ran a single filtered sample was CPCG0196. That does explain why we havn't seen this before.

That makes sense!

lydiayliu commented 2 years ago

I tallied the number of GVF entries per transcript and printed the top 5 from each sample in this file /hot/user/yiyangliu/MoPepGen/Parser/REDItools/gvf_entries_max_5_transcripts.txt

You can see that the usual culprits are pretty clear (ENST00000356079.9 I'm talking about you!!), going to 300+ variants

OMG it's actually the protein coding transcript of CYP20A1, is it known that it is so heavily RNA-edited???

Try this sample for an upper bound

CPCG0342_candidates.rmsk.GRCh38_annotated.txt.s.gvf
    114 ENST00000416600.6
    114 ENST00000428216.4
    116 ENST00000360845.3
    203 ENST00000233190.11
    362 ENST00000356079.9

Just found that location is missing in our RES labels. They currently like 'RES-C-G' but they should be 'RES-100-C-G'. Will have to fix it..

LMAO I also just noticed. Well it seems like we need to reparse XD I can increase the stringency of the filtering at the same time, do you think we should?

The last thing is that we can just do --max-variants-per-node 5 for all of CPCG!

zhuchcn commented 2 years ago

Oh I guess what is more useful is the distribution of the coverage of each rna editing site. But it does seem like there are transcripts been highly RNA edited. Might be good to look at the number of records when setting min coverage to 30?

The last thing is that we can just do --max-variants-per-node 5 for all of CPCG!

Sure!

lydiayliu commented 2 years ago

I'll try 30 along with fixing the IDs!

lydiayliu commented 2 years ago

sooooooo the interesting thing is, after going for --min-coverage-rna 30 and --min-coverage-dna 30, (--min-coverage-dna 30 alone didn't do too much), the top transcript still has 300+ variants in some samples

    316 ENST00000356079.9
    318 ENST00000356079.9
    328 ENST00000356079.9
    332 ENST00000356079.9
    335 ENST00000356079.9

There are fewer GVF entries now of course, but I've decided to run things through with just REDItools to see if everybody can finish in a reasonable time (also since i cant get any F72 node to run for me so the metapipeline is shot)

lydiayliu commented 2 years ago

update: it seems like the transcript with 300+ variants is not the one causing trouble

All samples up to CPCG0248 has run in like 2-3 minutes (wallclock at the end of callVariant), for CPCG0248 these transcripts have been running for 20+ minutes. So a crube tally of variants per transcript is not reflective of run time XD

[ 2022-06-09 15:41:05 ] ['ENST00000432883.5', 'ENST00000447208.6', 'ENST00000420290.6', 'ENST00000434570.6', 'ENST00000398042.6', 'ENST00000399807.7', 'ENST00000327374.9', 'ENST00000215730.12', 'ENST00000610778.1', 'ENST00000652112.1', 'ENST00000390290.3', 'ENST00000390299.2', 'ENST00000390306.2', 'ENST00000390308.2', 'ENST00000390312.2', 'ENST00000620395.2']

It probably has to do with how many variants are super close together, like all of these are within 300 nucleotides

The culprit is probably one of the below (I see only 1 CPU running)

yiyangliu@ip-0A125212:/hot/users/yiyangliu/MoPepGen/Parser/REDItools$ grep ENST00000390299.2 CPCG0248_candidates.rmsk.GRCh38_annotated.txt.mincd30.gvf
ENSG00000211653.2       321     RES-321-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410086;STRAND=1
ENSG00000211653.2       323     RES-323-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410088;STRAND=1
ENSG00000211653.2       372     RES-372-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410137;STRAND=1
ENSG00000211653.2       373     RES-373-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410138;STRAND=1
ENSG00000211653.2       376     RES-376-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410141;STRAND=1
ENSG00000211653.2       378     RES-378-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410143;STRAND=1
ENSG00000211653.2       386     RES-386-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410151;STRAND=1
ENSG00000211653.2       395     RES-395-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410160;STRAND=1
ENSG00000211653.2       415     RES-415-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410180;STRAND=1
ENSG00000211653.2       431     RES-431-C-G     C       G       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410196;STRAND=1
ENSG00000211653.2       450     RES-450-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410215;STRAND=1
ENSG00000211653.2       461     RES-461-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410226;STRAND=1
ENSG00000211653.2       479     RES-479-T-A     T       A       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410244;STRAND=1
ENSG00000211653.2       484     RES-484-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390299.2;GENOMIC_POSITION=chr22:22410249;STRAND=1

yiyangliu@ip-0A125212:/hot/users/yiyangliu/MoPepGen/Parser/REDItools$ grep ENST00000390306.2 CPCG0248_candidates.rmsk.GRCh38_annotated.txt.mincd30.gvf
ENSG00000211660.3       339     RES-339-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698127;STRAND=1
ENSG00000211660.3       421     RES-421-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698209;STRAND=1
ENSG00000211660.3       427     RES-427-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698215;STRAND=1
ENSG00000211660.3       428     RES-428-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698216;STRAND=1
ENSG00000211660.3       435     RES-435-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698223;STRAND=1
ENSG00000211660.3       453     RES-453-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698241;STRAND=1
ENSG00000211660.3       456     RES-456-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698244;STRAND=1
ENSG00000211660.3       458     RES-458-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698246;STRAND=1
ENSG00000211660.3       478     RES-478-C-G     C       G       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698266;STRAND=1
ENSG00000211660.3       483     RES-483-G-T     G       T       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698271;STRAND=1
ENSG00000211660.3       488     RES-488-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698276;STRAND=1
ENSG00000211660.3       493     RES-493-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698281;STRAND=1
ENSG00000211660.3       498     RES-498-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698286;STRAND=1
ENSG00000211660.3       501     RES-501-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698289;STRAND=1
ENSG00000211660.3       519     RES-519-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698307;STRAND=1
ENSG00000211660.3       520     RES-520-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698308;STRAND=1
ENSG00000211660.3       531     RES-531-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698319;STRAND=1
ENSG00000211660.3       539     RES-539-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698327;STRAND=1
ENSG00000211660.3       549     RES-549-C-A     C       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698337;STRAND=1
ENSG00000211660.3       567     RES-567-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698355;STRAND=1
ENSG00000211660.3       571     RES-571-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698359;STRAND=1
ENSG00000211660.3       572     RES-572-G-T     G       T       .       .       TRANSCRIPT_ID=ENST00000390306.2;GENOMIC_POSITION=chr22:22698360;STRAND=1

yiyangliu@ip-0A125212:/hot/users/yiyangliu/MoPepGen/Parser/REDItools$ grep ENST00000390312.2 CPCG0248_candidates.rmsk.GRCh38_annotated.txt.mincd30.gvf
ENSG00000211666.2       254     RES-254-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758953;STRAND=1
ENSG00000211666.2       267     RES-267-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758966;STRAND=1
ENSG00000211666.2       275     RES-275-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758974;STRAND=1
ENSG00000211666.2       289     RES-289-C-G     C       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758988;STRAND=1
ENSG00000211666.2       296     RES-296-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758995;STRAND=1
ENSG00000211666.2       300     RES-300-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22758999;STRAND=1
ENSG00000211666.2       305     RES-305-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759004;STRAND=1
ENSG00000211666.2       306     RES-306-G-A     G       A       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759005;STRAND=1
ENSG00000211666.2       307     RES-307-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759006;STRAND=1
ENSG00000211666.2       309     RES-309-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759008;STRAND=1
ENSG00000211666.2       316     RES-316-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759015;STRAND=1
ENSG00000211666.2       337     RES-337-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759036;STRAND=1
ENSG00000211666.2       345     RES-345-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759044;STRAND=1
ENSG00000211666.2       347     RES-347-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759046;STRAND=1
ENSG00000211666.2       355     RES-355-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759054;STRAND=1
ENSG00000211666.2       357     RES-357-C-G     C       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759056;STRAND=1
ENSG00000211666.2       360     RES-360-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759059;STRAND=1
ENSG00000211666.2       367     RES-367-A-T     A       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759066;STRAND=1
ENSG00000211666.2       369     RES-369-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759068;STRAND=1
ENSG00000211666.2       370     RES-370-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759069;STRAND=1
ENSG00000211666.2       375     RES-375-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759074;STRAND=1
ENSG00000211666.2       395     RES-395-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759094;STRAND=1
ENSG00000211666.2       399     RES-399-A-G     A       G       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759098;STRAND=1
ENSG00000211666.2       405     RES-405-T-C     T       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759104;STRAND=1
ENSG00000211666.2       459     RES-459-G-C     G       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759158;STRAND=1
ENSG00000211666.2       482     RES-482-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759181;STRAND=1
ENSG00000211666.2       488     RES-488-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759187;STRAND=1
ENSG00000211666.2       500     RES-500-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759199;STRAND=1
ENSG00000211666.2       503     RES-503-C-T     C       T       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759202;STRAND=1
ENSG00000211666.2       504     RES-504-A-C     A       C       .       .       TRANSCRIPT_ID=ENST00000390312.2;GENOMIC_POSITION=chr22:22759203;STRAND=1

uclahs-cds / package-moPepGen

neverending callVariant runs #476