ylab-hi / ScanNeo

A pipeline for identifying indel derived neoantigens using RNA-Seq data
Other
29 stars 7 forks source link

anno phase not working #18

Closed sandain closed 2 years ago

sandain commented 2 years ago

The mm39 data set that I am working with uses the values used as keys in the chrms_dict hash as the chromosome identifiers rather than the values in the chrms array. I believe that chrms_dict is unused, and because of this the anno phase fails to produce an output vcf beyond the header. It appears to be a problem in the vcf_renewer function, as the pre_vcf file has the same issue as the output file.

I've looked at the code, and I'm not sure where it would be best to use chrms_dict to get the vcf_renewer function working (see line 300). I'm going to play around with some options and will submit a patch if I get something that works.

sandain commented 2 years ago

After looking over the code some more, it might just be as simple as this:

diff --git a/ScanNeo.py b/ScanNeo.py
index 3313916..f19f94f 100755
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -297,7 +297,7 @@ def vcf_renewer(in_vcf, out_vcf, ref="hg38", slippage=False, config=config):
         chrm = record.CHROM
         pos = record.POS
         end = record.INFO["END"]
-        if chrm in chrms:
+        if chrm in chrms or chrm in chrms_dict:
             if sv_type == "INS":
                 alt = str(record.ALT[0])
                 if repeat_checker(alt[1:]):
sandain commented 2 years ago

While my test run is still going, the pre_vcf file is no longer empty. I think that means this simple patch was enough to at least get me a little further down the line. I'll go ahead and submit this patch.

sandain commented 2 years ago

With this patch, I'm getting a new error:

Traceback (most recent call last):
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 1011, in <module>
    main()
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 930, in main
    cutoff=args.cutoff,
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 376, in vep_caller
    csq_allele = alleles_dict[alt]
TypeError: unhashable type: '_Substitution'

So either my above patch was not enough, or there is an additional issue..

sandain commented 2 years ago

I think the new error is a separate issue. Here is the patch I'm testing:

diff --git a/ScanNeo.py b/ScanNeo.py
index f19f94f..72f8df0 100644
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -372,7 +372,7 @@ def vep_caller(in_vcf, out_vcf, vep_folder, cutoff=0.01, ref="hg38"):
         vcf_writer = vcf.Writer(open(f"{out_vcf}", "w"), vcf_reader)
         for record in vcf_reader:
             alleles_dict = ScanNeo_utils.resolve_alleles(record)
-            alt = record.ALT[0]
+            alt = str(record.ALT[0])
             csq_allele = alleles_dict[alt]
             transcripts = ScanNeo_utils.parse_csq_entries_for_allele(
                 record.INFO["CSQ"], csq_format, csq_allele
sandain commented 2 years ago

That second patch got me further, but with a new error:

Traceback (most recent call last):
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 1011, in <module>
    main()
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 930, in main
    cutoff=args.cutoff,
  File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 380, in vep_caller
    transcript_one = transcripts[0]
IndexError: list index out of range
sandain commented 2 years ago

Here is a patch for that last error message. I'm now able to complete the anno phase without issue. You can find all three patches in PR #19.

diff --git a/ScanNeo.py b/ScanNeo.py
index 72f8df0..1a79834 100644
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -377,6 +377,8 @@ def vep_caller(in_vcf, out_vcf, vep_folder, cutoff=0.01, ref="hg38"):
             transcripts = ScanNeo_utils.parse_csq_entries_for_allele(
                 record.INFO["CSQ"], csq_format, csq_allele
             )
+            if not transcripts:
+                continue
             transcript_one = transcripts[0]
             AF = float(transcript_one["AF"]) if transcript_one["AF"] else 0.0
             try:
sandain commented 2 years ago

Fixed with PR #19.