pdxgx / neoepiscope

predicts neoepitopes from phased somatic mutations detected using tumor/normal DNA-seq data

No neoepitopes found. #5

Closed tarini92 closed 5 years ago

tarini92 commented 5 years ago
root@a231da4504f2:~/notebooks/neoepiscope# neoepiscope call -x ~/neoepiscope.data/hg19 -d ~/neoepiscope.data/gencode_v29/ -c adjusted_hap_som_germ.out -a HLA-A -o tests/output_prediction_somatic_germline.out
No neoepitopes found

root@92c193bba70f:~/notebooks/neoepiscope# neoepiscope call -x ~/neoepiscope.data/hg19 -d ~/neoepiscope.data/gencode_v29/ -c tests/adjusted_ychrom_hap.out -a HLA-A -o tests/output_ychrome.out
No neoepitopes found

root@92c193bba70f:~/notebooks/neoepiscope# neoepiscope call -x tests/Ychrom.ref -d ~/neoepiscope.data/gencode_v29/ -c tests/adjusted_ychrom_hap.out -a HLA-A -o tests/output_ychrome.out
No neoepitopes found

root@92c193bba70f:~/notebooks/neoepiscope# neoepiscope call -x tests/Chr11.ref -d ~/neoepiscope.data/gencode_v29/ -c adjusted_chr11_hap.out -a HLA-A -o tests/output_ychrome.out
No neoepitopes found

Above are several trials with different haplotype assembly outputs, all of which produced the same result: no neoepitopes. It's possible I'm missing a step, so I'll run through what I did:

  1. First, I took a test VCF (test.vcf/Ychrome_varscan.vcf), either on its own or after merging it with germline.vcf (combining germline and somatic variants).
  2. Prepped the existing test HapCUT outputs to include unphased variants (test_hapcut.out/complete_hapcut.out).
  3. Ran neoepitope calling with the reference genome bowtie indices and dicts.

Note that in this case I am not running the haplotype phasing manually; I am using the sample outputs already provided. Where am I going wrong?

maryawood commented 5 years ago

From what you're describing, it sounds like you're running things correctly! Normally this message is returned when the variants input to neoepiscope don't lead to any amino acid changes, or lead to amino acid changes that generate peptide kmers identical to other portions of the normal protein. Is it possible that this is the case with your test VCFs? If you're sure that the variants should be producing amino acid changes and you're able/willing to share your VCFs with me, I'd be happy to run some tests and see if I can diagnose the problem.
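The kmer check described above can be illustrated with a minimal sketch. This is not neoepiscope's actual code: the function names are hypothetical, and the 8 to 11 residue size range is an assumption based on the peptide lengths that appear later in this thread.

```python
def kmers(seq, sizes=range(8, 12)):
    """Return the set of all substrings of seq whose length is in sizes."""
    return {
        seq[i:i + k]
        for k in sizes
        for i in range(len(seq) - k + 1)
    }


def novel_kmers(normal_protein, mutant_protein, sizes=range(8, 12)):
    """Return kmers found in the mutant protein but nowhere in the normal one.

    An empty result corresponds to the "No neoepitopes found" situation:
    either the variant changed no amino acids, or every changed kmer
    also occurs elsewhere in the normal protein.
    """
    return kmers(mutant_protein, sizes) - kmers(normal_protein, sizes)
```

With a silent variant the two protein sequences are identical, so `novel_kmers` returns an empty set; with a single amino acid substitution only the kmers overlapping the changed residue survive the subtraction.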

maryawood commented 5 years ago

After re-reviewing this issue, I've realized that you were using incompatible genome builds between the bowtie index and GTF-derived pickled dictionary files. The VCFs were generated using hg19 - could you try using ~/neoepiscope.data/hg19 for the -x argument and ~/neoepiscope.data/gencode_v19/ for the -d argument for the first example command you listed above and let me know if that changes anything? For me, using those two worked to generate neoepitope predictions.
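A quick sanity check for this kind of mismatch might look like the sketch below. It is only an illustration and is not part of neoepiscope: the function names are hypothetical, and it assumes the directory and index naming used in this thread (an index named hg19 or hg38, and a dictionary directory containing gencode_vNN). It relies on the fact that GENCODE human releases up to 19 are annotated on GRCh37 (hg19) and releases 20 and later on GRCh38 (hg38), ignoring the separate GRCh37-mapped ("lift37") releases.

```python
import os
import re


def gencode_build(release):
    """Map a GENCODE human release number to its primary genome build."""
    # Assumption: the GRCh37-mapped "lift37" releases are not being used.
    return "hg19" if release <= 19 else "hg38"


def builds_match(index_path, dict_dir):
    """Return True if the index name and the GENCODE release agree on the build."""
    index_build = os.path.basename(index_path.rstrip("/"))
    if index_build not in ("hg19", "hg38"):
        raise ValueError("cannot infer genome build from " + index_path)
    match = re.search(r"gencode_v(\d+)", dict_dir)
    if match is None:
        raise ValueError("cannot infer GENCODE release from " + dict_dir)
    return gencode_build(int(match.group(1))) == index_build
```

For the first command in this thread, `builds_match("~/neoepiscope.data/hg19", "~/neoepiscope.data/gencode_v29/")` returns `False`, while pairing the same index with `gencode_v19` returns `True`.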

tarini92 commented 5 years ago

Hi, as you correctly pointed out, the problem was the incompatibility between the reference genome dict and index files. It runs properly now. I found 3 bugs earlier when I was running the code, at lines 431, 459 and 466 in __init__.py.

Line 431: the variable intervals_dict is not defined yet. It is only defined in the if clause, which is never executed when we're in the else clause. Locally, I changed it to intervals_path. Line 459: join([args.bowtie_index, ".", str(x), ".ebwt"]) raises a TypeError if x is an integer and is not converted to a string.
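Both failure modes can be reproduced in isolation. The sketch below is a stand-alone illustration, not the code from the repository: `ebwt_name` and `broken_branch` are hypothetical names, and only the join expression is taken from the comment above.

```python
def ebwt_name(bowtie_index, x):
    """Build a Bowtie index file name such as 'hg19.1.ebwt'."""
    # str(x) is required: str.join only accepts string items.
    return "".join([bowtie_index, ".", str(x), ".ebwt"])


def broken_branch(use_if_branch):
    """Show a variable that is assigned in only one branch."""
    if use_if_branch:
        intervals_dict = {"chr11": []}
    # When the if branch was skipped, nothing was ever assigned to
    # intervals_dict, so this line raises UnboundLocalError.
    return intervals_dict
```

Calling `"".join(["hg19", ".", 1, ".ebwt"])` without the `str()` conversion raises a `TypeError` at run time, and `broken_branch(False)` raises `UnboundLocalError` (a subclass of `NameError`).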


tarini92 commented 5 years ago

The results report the paired normal epitope as NA for all of the listed neoepitopes. Could it be because the reference genome is different than expected? I am using the hg19 reference genome with the test VCFs and haplotype output.

Neoepitope      Chromsome       Pos     Ref     Alt     Mutation_type   VAF     Paired_normal_epitope   Warnings        Transcript_ID   mhcflurry1_HLA-C03:03_affinity  mhcflurry1_HLA-C03:03_rank
CGGSKGDCGSW     chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       34614.31465996793       62.87025000000001
CGSWGLQR        chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       34571.5601350282        59.95950000000001
CGSWGLQRG       chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       31282.89680636645       39.07300000000001
CGSWGLQRGL      chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       28758.67352940209       27.539125
CGSWGLQRGLW     chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       36642.92804815618       78.78237500000002
DCGSWGLQ        chr11   71276863        GT      *       D       NA      NA      NA      ENST00000398531.1       37518.02716488922       85.37950000000002

maryawood commented 5 years ago

Thanks for pointing out that error, I will put out a new minor release addressing this today! Regarding the paired normal epitopes being 'NA' for the epitopes you listed, we currently only support paired normal epitopes for neoepitopes derived from SNVs, so any neoepitope derived from an insertion or deletion will not have a paired normal epitope reported.

tarini92 commented 5 years ago

Ah, yes. That makes sense. Though, I did not specifically set any flags for indels when prepping the HAPCUT output; I thought the default was SNVs only. In any case, I will inspect further. Thanks.

maryawood commented 5 years ago

Ah, that's an interesting point that we hadn't considered! Currently the prep mode of the software adds all unphased variants in the VCF as their own haplotypes, but it would be nice to have a command line option to exclude indels during prep for users who only want to focus on SNVs. Something for us to add in an upcoming release, thanks for bringing this to our attention!
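Until such an option exists, one possible workaround is to drop indels from the VCF before running prep. The sketch below is an assumption about how that could be done and is not a neoepiscope feature; the function names are hypothetical. It treats a record as an SNV only when REF and every ALT allele are single nucleotides, using the standard VCF column order (CHROM, POS, ID, REF, ALT).

```python
BASES = set("ACGTN")


def is_snv_record(line):
    """Return True if a VCF data line describes only single-base changes."""
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3].upper(), fields[4].upper().split(",")
    # Symbolic alleles such as '*' or '.' are rejected along with indels.
    return ref in BASES and all(alt in BASES for alt in alts)


def snv_only(vcf_lines):
    """Yield header lines unchanged and keep only SNV data lines."""
    for line in vcf_lines:
        if line.startswith("#") or is_snv_record(line):
            yield line
```

For example, `list(snv_only(open("test.vcf")))` would keep the headers and SNV records of a VCF file, and the result could be written to a new file to use as input for prep.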