Closed sandain closed 2 years ago
After looking over the code some more, it might just be as simple as this:
diff --git a/ScanNeo.py b/ScanNeo.py
index 3313916..f19f94f 100755
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -297,7 +297,7 @@ def vcf_renewer(in_vcf, out_vcf, ref="hg38", slippage=False, config=config):
chrm = record.CHROM
pos = record.POS
end = record.INFO["END"]
- if chrm in chrms:
+ if chrm in chrms or chrm in chrms_dict:
if sv_type == "INS":
alt = str(record.ALT[0])
if repeat_checker(alt[1:]):
While my test run is still going, the pre_vcf
file is no longer empty. I think that means this simple patch was enough to at least get me a little further down the line. I'll go ahead and submit this patch.
With this patch, I'm getting a new error:
Traceback (most recent call last):
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 1011, in <module>
main()
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 930, in main
cutoff=args.cutoff,
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 376, in vep_caller
csq_allele = alleles_dict[alt]
TypeError: unhashable type: '_Substitution'
So either my above patch was not enough, or there is an additional issue..
I think the new error is a separate issue. Here is the patch I'm testing:
diff --git a/ScanNeo.py b/ScanNeo.py
index f19f94f..72f8df0 100644
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -372,7 +372,7 @@ def vep_caller(in_vcf, out_vcf, vep_folder, cutoff=0.01, ref="hg38"):
vcf_writer = vcf.Writer(open(f"{out_vcf}", "w"), vcf_reader)
for record in vcf_reader:
alleles_dict = ScanNeo_utils.resolve_alleles(record)
- alt = record.ALT[0]
+ alt = str(record.ALT[0])
csq_allele = alleles_dict[alt]
transcripts = ScanNeo_utils.parse_csq_entries_for_allele(
record.INFO["CSQ"], csq_format, csq_allele
That second patch got me further, but with a new error:
Traceback (most recent call last):
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 1011, in <module>
main()
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 930, in main
cutoff=args.cutoff,
File "/mnt/store3/clustcrilab/tools/scanneo/bin/ScanNeo.py", line 380, in vep_caller
transcript_one = transcripts[0]
IndexError: list index out of range
Here is a patch for that last error message. I'm now able to complete the anno
phase without issue. You can find all three patches in PR #19.
diff --git a/ScanNeo.py b/ScanNeo.py
index 72f8df0..1a79834 100644
--- a/ScanNeo.py
+++ b/ScanNeo.py
@@ -377,6 +377,8 @@ def vep_caller(in_vcf, out_vcf, vep_folder, cutoff=0.01, ref="hg38"):
transcripts = ScanNeo_utils.parse_csq_entries_for_allele(
record.INFO["CSQ"], csq_format, csq_allele
)
+ if not transcripts:
+ continue
transcript_one = transcripts[0]
AF = float(transcript_one["AF"]) if transcript_one["AF"] else 0.0
try:
Fixed with PR #19.
The mm39 data set that I am working with uses the values used as keys in the
chrms_dict
hash as the chromosome identifiers rather than the values in thechrms
array. I believe thatchrms_dict
is unused, and because of this theanno
phase fails to produce an output vcf beyond the header. It appears to be a problem in thevcf_renewer
function, as thepre_vcf
file has the same issue as the output file.I've looked at the code, and I'm not sure where it would be best to use
chrms_dict
to get thevcf_renewer
function working (see line 300). I'm going to play around with some options and will submit a patch if I get something that works.