Closed ahwanpandey closed 5 years ago
All variants that are supported by pVACseq will be written to an intermediate
Thanks @susannasiebert !
I have a few more questions for you. I am new to the neoantigen prediction landscape so hopefully my questions don't sound too ignorant.
1) I am running HLA-VBSeq to HLA-type the patient data. It seems HLA-VBSeq only outputs HLA-A, HLA-B, HLA-C, DQA1, DQB1, and DRB1 types. Will I miss potential neoantigens if I don't include, say, HLA-E, HLA-F, DPA1, DPB1, etc.? I am also thinking of including HLAminer results, which seem to support more genes.
2) HLA-VBSeq can output HLA types at up to 8-digit resolution. So far, for my test, I am feeding pVACseq 4-digit HLA types. I noticed that if I go up to 6 digits, a lot of them are incompatible. Is it safe to just stick with 4 digits to get a meaningful result?
3) Does it suffice to annotate the variants with the VEP flags "--coding_only" and "--no_intergenic", and then feed pVACseq only the variants with a CSQ annotation? I ask because the RNA count addition step is faster if I can feed it this smaller list of variants that are supported by pVACseq rather than all the somatic variants found. Is there any harm in doing this?
1) I hope that one of our bioinformaticians can chime in, but I don't believe we run HLA-E, HLA-F, or DPA in our immunotherapy pipeline. Most of the algorithms don't support them anyway, so you're probably OK leaving them off.
2) Yes, 4 digits is sufficient since most, if not all, prediction algorithms in pVACseq only go to that resolution anyway.
3) We haven't tried running our annotation with those flags but that should work. Alternatively you can grep your VCF for entries with missense_variant, inframe_insertion, inframe_deletion, protein_altering_variant, or frameshift_variant. Those are all of the consequence types pVACseq supports.
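For anyone who would rather script the grep suggestion, here is a minimal Python sketch of the same idea. It is not part of pVACseq: the function names are made up for illustration, and it assumes a VEP-annotated VCF where the annotation lives in a CSQ entry of the INFO column, with transcripts separated by "," and fields by "|", and multiple consequences joined by "&". It keeps header lines plus any record carrying at least one of the five consequence types listed above.

```python
# Consequence types named in the reply above as supported by pVACseq.
SUPPORTED_CONSEQUENCES = {
    "missense_variant",
    "inframe_insertion",
    "inframe_deletion",
    "protein_altering_variant",
    "frameshift_variant",
}


def has_supported_consequence(vcf_line):
    """Return True if any CSQ transcript entry carries a supported consequence.

    Hypothetical helper: it scans every '|'-separated CSQ field instead of
    relying on the position of the Consequence field, which depends on how
    VEP was run.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        return False
    for info_entry in columns[7].split(";"):
        if not info_entry.startswith("CSQ="):
            continue
        for transcript in info_entry[len("CSQ="):].split(","):
            for field in transcript.split("|"):
                # VEP joins several consequences for one transcript with '&'.
                if SUPPORTED_CONSEQUENCES & set(field.split("&")):
                    return True
    return False


def filter_vcf_lines(lines):
    """Yield header lines and records with a supported consequence."""
    for line in lines:
        if line.startswith("#") or has_supported_consequence(line):
            yield line
```

Something like `out.writelines(filter_vcf_lines(open("annotated.vcf")))` would write the reduced VCF; the file name is only a placeholder. Compared with a plain grep, this only looks inside the CSQ entry, so a consequence term appearing elsewhere on the line does not cause a false match.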
Hey @susannasiebert
Thank you.
I’m not exactly sure what you’re asking. It doesn’t really matter which HLA types can be predicted by an HLA typing tool or to what resolution if those HLA types aren’t supported by the prediction algorithms pVACseq supports. That list is available using the pvacseq valid_alleles command.
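To make the point about resolution and support concrete, here is a small Python sketch. It is an illustration only, not something pVACseq provides: the helper names are hypothetical, and it assumes you have already loaded the supported allele names (for example, from saved output of pvacseq valid_alleles) into a set, and that your typed alleles use the same naming format as that list. It trims each typed allele to 4-digit (two-field) resolution and separates the ones found in the supported set from the rest.

```python
def to_four_digit(allele):
    """Trim an allele such as 'HLA-A*02:01:01:01' to 'HLA-A*02:01'."""
    gene, star, fields = allele.partition("*")
    if not star:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    return gene + "*" + ":".join(fields.split(":")[:2])


def split_by_support(typed_alleles, supported):
    """Return (usable, skipped) lists of 4-digit alleles, without duplicates.

    `supported` is a set of allele names supplied by the caller; this
    sketch does not query pVACseq itself.
    """
    usable, skipped = [], []
    for allele in typed_alleles:
        short = to_four_digit(allele)
        target = usable if short in supported else skipped
        if short not in target:
            target.append(short)
    return usable, skipped
```

The `usable` list is what you would pass on for prediction; the `skipped` list shows which typed alleles no supported algorithm can use, regardless of how finely the typing tool resolved them.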
I think I've answered your question so I will close this issue but feel free to reopen or make a new issue for any additional questions/problems you might have.
According to the following statement:
How do I find the actual number of variants that pVACseq is using downstream for the prediction?