uclahs-cds / package-moPepGen

Multi-Omics Peptide Generator
https://uclahs-cds.github.io/package-moPepGen/
GNU General Public License v2.0
5 stars 1 forks source link

ValueError: No reference edge was found. #639

Closed lydiayliu closed 1 year ago

lydiayliu commented 1 year ago

Not sure if this is new after #632, but this sample was not run before

      [ 2022-12-31 21:15:08 ] Using GVF file: CPCG0462.gencode.tsv.s_7.gvf
      [ 2022-12-31 21:15:08 ] Using GVF file: CPCG0462.gencode.tsv.s_5.gvf
      [ 2022-12-31 21:15:09 ] Variants sorted
      [ 2022-12-31 21:15:59 ] [ !!! moPepGen WARNING !!! ] Hypermutated region detected from graph: 'ENST00000377263.6'. The argument max_variants_per_node = 7 and additional_variants_per_misc = 2 was used to reduce complexity.
      [ 2022-12-31 21:16:13 ] 1000 transcripts processed.
      [ 2022-12-31 21:16:44 ] Exception raised from CIRC-ENST00000430580.6-E25-E26-E27-E28
      Traceback (most recent call last):
        File "/usr/local/bin/moPepGen", line 8, in <module>
          sys.exit(main())
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 89, in main
          args.func(args)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 353, in call_variant_peptide
          results = [wrapper_local(dispatches[0])]
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 252, in wrapper_local
          return call_variant_peptides_wrapper(*dispatch)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 232, in call_variant_peptides_wrapper
          _peptides = call_peptide_circ_rna(
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 485, in call_peptide_circ_rna
          cgraph.fit_into_codons()
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 1593, in fit_into_codons
          end_nodes = self.expand_alignments(cur)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 1523, in expand_alignments
          ref_node = start.get_reference_next()
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/TVGNode.py", line 309, in get_reference_next
          raise ValueError('No reference edge was found.')
      ValueError: No reference edge was found.

    Work dir:
      work/ae/f3467c5c9d7d4a905d2ed00f095066

    Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

  Work dir:
    /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/work_pipe/2f/24d997850502e9bf2bf94a7fed5ee2
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gsnp/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/pindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/somaticsniper/CPCG0462.gencode.tsv.s.gvf
zhuchcn commented 1 year ago

Was the reditools gvf file generated by a older version of mpg? The reference nucleotide of the variant records in the gvf file doesn't match with the reference.

lydiayliu commented 1 year ago

It's old but not sure how old XD I'll do it again

zhuchcn commented 1 year ago

The version in the GVF file is 0.2.0. I remember that we fixed some bugs in parseREDItools but can't be sure what that was exactly.

image

lydiayliu commented 1 year ago

Still getting the same error

Command wrapper:
      [ 2023-01-05 17:39:50 ] moPepGen callVariant started
      [ 2023-01-05 17:43:33 ] Reference indices loaded.
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462_ijc5_sjc5.s_4.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462.s_2.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462_candidates.rmsk.GRCh38_annotated.txt.mincd30_3.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462.gencode.tsv.s_5.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462.gencode.tsv.s_6.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462.gencode.tsv.s_7.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462_circularRNA_known.txt.1.s_1.gvf
      [ 2023-01-05 17:45:06 ] Using GVF file: CPCG0462.gencode.tsv.s_8.gvf
      [ 2023-01-05 17:45:06 ] Variants sorted
      [ 2023-01-05 17:46:01 ] [ !!! moPepGen WARNING !!! ] Hypermutated region detected from graph: 'ENST00000377263.6'. The argument max_variants_per_node = 7 and additional_variants_per_misc = 2 was used to reduce complexity.
      [ 2023-01-05 17:46:21 ] 1000 transcripts processed.
      [ 2023-01-05 17:46:53 ] Exception raised from CIRC-ENST00000430580.6-E25-E26-E27-E28
      Traceback (most recent call last):
        File "/usr/local/bin/moPepGen", line 8, in <module>
          sys.exit(main())
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/__main__.py", line 89, in main
          args.func(args)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 353, in call_variant_peptide
          results = [wrapper_local(dispatches[0])]
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 252, in wrapper_local
          return call_variant_peptides_wrapper(*dispatch)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 232, in call_variant_peptides_wrapper
          _peptides = call_peptide_circ_rna(
        File "/usr/local/lib/python3.8/site-packages/moPepGen/cli/call_variant_peptide.py", line 489, in call_peptide_circ_rna
          cgraph.fit_into_codons()
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 1593, in fit_into_codons
          end_nodes = self.expand_alignments(cur)
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/ThreeFrameTVG.py", line 1523, in expand_alignments
          ref_node = start.get_reference_next()
        File "/usr/local/lib/python3.8/site-packages/moPepGen/svgraph/TVGNode.py", line 309, in get_reference_next
          raise ValueError('No reference edge was found.')
      ValueError: No reference edge was found.

    Work dir:
      work/66/880e2118ba78cdb722b9154ee6cefa

    Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

  Work dir:
    /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/work_pipe/85/666ac6234ca909a3f1384d0d0ab406

  Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The REDITools GVF has been updated

yiyangliu@ip-0A125215:/hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/noncanonical-database/call-nonCanonicalPeptide/log$ head /hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/REDItools/CPCG0462_candidates.rmsk.GRCh38_annotated.txt.mincd30.gvf
##fileformat=VCFv4.2
##mopepgen_version=0.10.1
##parser=parseREDItools
##reference_index=/hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/ref/GRCh38-EBI-GENCODE34/index
##genome_fasta=
##annotation_gtf=
##source=reditools
##CHROM=<Description="Gene ID">
##INFO=<ID=TRANSCRIPT_ID,Number=1,Type=String,Description="Transcript ID">
##INFO=<ID=GENE_SYMBOL,Number=1,Type=String,Description="Gene Symbol">

Here are all the GVFs

/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/CIRCexplorer2/CPCG0462_circularRNA_known.txt.1.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/Fusion/star-fusion-1.9.1/CPCG0462.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/REDItools/CPCG0462_candidates.rmsk.GRCh38_annotated.txt.mincd30.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/RMATS/CPCG0462_ijc5_sjc5.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/gsnp/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/pindel/CPCG0462.gencode.tsv.s.gvf
/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/VEP/gencode/somaticsniper/CPCG0462.gencode.tsv.s.gvf

This is the old REDITools GVF on this transcript

yiyangliu@ip-0A125215:/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/REDItools/v0.2.0$ grep ENST00000430580.6 CPCG0462*.s.gvf
ENSG00000219481.10      27973   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585590;STRAND=-1
ENSG00000219481.10      27974   RES-A-G A       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585589;STRAND=-1
ENSG00000219481.10      28029   RES-C-G C       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585534;STRAND=-1
ENSG00000219481.10      30827   RES-A-T A       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16582736;STRAND=-1
ENSG00000219481.10      36176   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16577387;STRAND=-1
ENSG00000219481.10      46367   RES-C-A C       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567196;STRAND=-1
ENSG00000219481.10      46372   RES-G-C G       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567191;STRAND=-1
ENSG00000219481.10      46379   RES-A-G A       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567184;STRAND=-1
ENSG00000219481.10      47140   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16566423;STRAND=-1
ENSG00000219481.10      47782   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565781;STRAND=-1
ENSG00000219481.10      47783   RES-C-T C       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565780;STRAND=-1
ENSG00000219481.10      47784   RES-A-T A       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565779;STRAND=-1
ENSG00000219481.10      49384   RES-C-T C       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16564179;STRAND=-1
ENSG00000219481.10      50275   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563288;STRAND=-1
ENSG00000219481.10      50276   RES-T-A T       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563287;STRAND=-1
ENSG00000219481.10      50290   RES-G-A G       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563273;STRAND=-1
ENSG00000219481.10      50412   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563151;STRAND=-1
ENSG00000219481.10      50416   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563147;STRAND=-1
ENSG00000219481.10      50419   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563144;STRAND=-1
ENSG00000219481.10      50434   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563129;STRAND=-1ENSG00000219481.10      50462   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563101;STRAND=-1
ENSG00000219481.10      50492   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563071;STRAND=-1
ENSG00000219481.10      50545   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563018;STRAND=-1ENSG00000219481.10      50555   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563008;STRAND=-1ENSG00000219481.10      50556   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563007;STRAND=-1
ENSG00000219481.10      50592   RES-A-C A       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562971;STRAND=-1
ENSG00000219481.10      50644   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562919;STRAND=-1ENSG00000219481.10      50645   RES-T-C T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562918;STRAND=-1

This is the new GVF (I went for the more filtered version)

yiyangliu@ip-0A125215:/hot/project/disease/ProstateTumor/PRAD-000051-MIAPEP/Parser/REDItools$ grep ENST00000430580.6 CPCG0462*.gvf
ENSG00000219481.10      27973   RES-27973-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585590;STRAND=-1
ENSG00000219481.10      27974   RES-27974-A-G   A       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585589;STRAND=-1
ENSG00000219481.10      28029   RES-28029-C-G   C       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16585534;STRAND=-1
ENSG00000219481.10      30827   RES-30827-A-T   A       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16582736;STRAND=-1
ENSG00000219481.10      46367   RES-46367-C-A   C       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567196;STRAND=-1
ENSG00000219481.10      46372   RES-46372-G-C   G       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567191;STRAND=-1
ENSG00000219481.10      46379   RES-46379-A-G   A       G       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16567184;STRAND=-1
ENSG00000219481.10      47140   RES-47140-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16566423;STRAND=-1
ENSG00000219481.10      47782   RES-47782-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565781;STRAND=-1
ENSG00000219481.10      47783   RES-47783-C-T   C       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565780;STRAND=-1
ENSG00000219481.10      47784   RES-47784-A-T   A       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16565779;STRAND=-1
ENSG00000219481.10      49384   RES-49384-C-T   C       T       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16564179;STRAND=-1
ENSG00000219481.10      50275   RES-50275-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563288;STRAND=-1
ENSG00000219481.10      50276   RES-50276-T-A   T       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563287;STRAND=-1
ENSG00000219481.10      50290   RES-50290-G-A   G       A       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563273;STRAND=-1
ENSG00000219481.10      50412   RES-50412-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563151;STRAND=-1
ENSG00000219481.10      50416   RES-50416-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563147;STRAND=-1
ENSG00000219481.10      50419   RES-50419-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563144;STRAND=-1
ENSG00000219481.10      50434   RES-50434-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563129;STRAND=-1
ENSG00000219481.10      50462   RES-50462-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563101;STRAND=-1
ENSG00000219481.10      50492   RES-50492-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563071;STRAND=-1
ENSG00000219481.10      50545   RES-50545-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563018;STRAND=-1
ENSG00000219481.10      50555   RES-50555-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563008;STRAND=-1
ENSG00000219481.10      50556   RES-50556-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16563007;STRAND=-1
ENSG00000219481.10      50592   RES-50592-A-C   A       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562971;STRAND=-1
ENSG00000219481.10      50644   RES-50644-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562919;STRAND=-1
ENSG00000219481.10      50645   RES-50645-T-C   T       C       .       .       TRANSCRIPT_ID=ENST00000430580.6;GENOMIC_POSITION=chr1:16562918;STRAND=-1
zhuchcn commented 1 year ago

Can you show me the path to the original reditools output?

lydiayliu commented 1 year ago

It is in this folder: /hot/project/method/AlgorithmDevelopment/ALGO-000074-moPepGen/CPCGENE/processed/rnaedit/annotated

The whole situation is a bit complicated because the reditools calls were originally on GRCh37 and were lifted over, and then you had to run some annotations on them, but this is the "final" product that I use to make the GVFs

zhuchcn commented 1 year ago

Ah, I remember that now. Did I do this? It could be that the position of the rna editing somehow got mismatched. Will look into it.

lydiayliu commented 1 year ago

I thought I did it. I think the code is here: https://github.com/uclahs-cds/project-MissingPeptides-Method/pull/21#pullrequestreview-879625630. But then you did the double checking and annotations: https://github.com/uclahs-cds/project-MissingPeptides-Method/issues/12#issuecomment-1031846155

Btw these same 0.2.0 GVFs ran before with CPCG, so we are finding a previously un-addressed problem?

zhuchcn commented 1 year ago

The issue is because the coordinate of the RNA editings somehow got messed up, so for a variant for example, RES-27973-T-C, the reference nucleotide at position 27973 is in fact C, so both the reference node and the variant node has the same nucleotide (C). And in certain cases for example, if this is the stop codon, then the variant node should be kept. This then results none of the variant in the bubble is reference node. It's hard to tell way it did work before. Maybe it was because node collapsing was not introduced back then, but the RNA editing files are still problematic.

lydiayliu commented 1 year ago

Interesting, so can these entries be removed during parsing?

zhuchcn commented 1 year ago

Let me find out whether it was liftover or parseREDITools that causes the positions being mismatched.

zhuchcn commented 1 year ago

It turns out that the REDItools output is stranded. The reference nucleotide it has in its column No. 3 is the one in the gene. I thought it was the nucleotide on reference genome (unstranded). That's why I'm seeing half of the variant records parsed from the reditools output don't match.. 🤮