griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

TSL level not reported (and used as a filter) in all_epitope.tsv/aggregated.tsv #1044

Closed PierreLaplante closed 7 months ago

PierreLaplante commented 8 months ago

Installation Type

Docker

pVACtools Version / Docker Image

4.0.6

Python Version

3.11.0

Operating System

CentOS 7 HPC

Describe the bug

The transcript support level (TSL) is neither reported nor used for filtering, even though the input VEP-annotated VCF contains the TSL field.

How to reproduce this bug

VEP command used:
vep \
    --species mus_musculus --assembly GRCm38 --no_stats --buffer_size 5000 --sift b \
    --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --terms SO \
    --canonical --protein --biotype --tsl --variant_class \
    --shift_hgvs 1 --check_existing --total_length --allele_number \
    --no_escape --xref_refseq --failed 1 --vcf --flag_pick_allele --transcript_version \
    --pick_order canonical,tsl,biotype,rank,ccds,length \
    --dir $HOME/.vep/ \
    --cache --fasta 102_GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa \
    --format vcf \
    --plugin Frameshift --plugin Wildtype \
    --dir_plugins $HOME/vep_plugins/ \
    --input_file input.vcf \
    --output_file input_vep.vcf \
    --offline --pubmed --fork 4 --regulatory --verbose

pvacseq command used:

singularity exec \
    --mount type=bind,src=SOURCE,dst=DEST \
    pvac406.sif pvacseq run \
    input.vcf.gz \
    id \
    H-2-Kd,H-2-Dd,H-2-Ld \
    NetMHCpan \
    output/dir \
    --iedb-install-directory /opt/iedb \
    --net-chop-method cterm \
    --netmhc-stab \
    --run-reference-proteome-similarity --blastp-path ncbi/dir --blastp-db refseq_select_prot \
    -a sample_name \
    --phased-proximal-variants-vcf phased.vcf.gz \
    --tdna-vaf 0.05 \
    --trna-vaf 0.05

Input files

Input VEP-annotated VCF (the TSL field is present and populated): input.vcf.gz
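
For readers who want to confirm the same thing on their own input, here is a minimal sketch of one way to check that a VEP-annotated VCF carries TSL values. It assumes the standard VEP `CSQ` INFO header, whose description ends with `Format: ` followed by a `|`-separated field list. The function names `csq_fields` and `count_tsl` are hypothetical helpers, not part of pVACtools or VEP.

```python
import re


def csq_fields(header_line):
    """Return the CSQ sub-field names declared in a VEP CSQ header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    return match.group(1).strip().split("|") if match else []


def count_tsl(vcf_lines):
    """Return (total CSQ entries, CSQ entries with a non-empty TSL value)."""
    tsl_pos = None
    total = with_tsl = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            fields = csq_fields(line)
            tsl_pos = fields.index("TSL") if "TSL" in fields else None
        elif line and not line.startswith("#") and tsl_pos is not None:
            info = line.split("\t")[7]  # INFO is the 8th VCF column
            for item in info.split(";"):
                if not item.startswith("CSQ="):
                    continue
                # One comma-separated entry per annotated transcript
                for entry in item[4:].split(","):
                    total += 1
                    values = entry.split("|")
                    if tsl_pos < len(values) and values[tsl_pos]:
                        with_tsl += 1
    return total, with_tsl
```

For a compressed file, the lines could be read with `gzip.open("input.vcf.gz", "rt")` and passed to `count_tsl`. A result of `(0, 0)` means that no `TSL` field was declared in the header or that no CSQ entries were found.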

Log output

No logs.

Output files

Output all_epitopes.tsv/aggregated.tsv, showing "Not Supported" for TSL: output.all_epitopes.aggregated.txt

(truncated for upload size) output.all_epitopes.txt

susannasiebert commented 8 months ago

@PierreLaplante thank you for bringing this to our attention. When we first implemented TSL, it was only supported for GRCh38. I haven't been able to find any documentation about which species are supported these days so I made a bugfix PR to add GRCm38 to the list. There might be other species that also support TSL but since I couldn't find any confirmation I'm leaving it at human and mouse for now. This fix should go out with the next bugfix release either this week or next.

PierreLaplante commented 7 months ago

Would it be possible to add a command that adds TSL to an existing all_epitopes.tsv by cross-referencing it with the corresponding VEP-annotated VCF? That way, long-running analyses that have already finished could be annotated without rerunning the whole prediction. Thank you.

susannasiebert commented 7 months ago

I don't really see a utility for such a command in the long run after this fix goes live. My suggestion would be to start a second run from scratch but abort it after the initial TSV gets created. That TSV will contain an index column that can be used to associate its entries with those from your all_epitopes.tsv (using the Index column in the all_epitopes.tsv file). You can then use that to fix the TSL entries using a custom script.
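
A minimal sketch of what such a custom script could look like, using only the Python standard library. The column names are assumptions that should be checked against the actual files: `index` and `transcript_support_level` for the initial TSV, and `Index` and `Transcript Support Level` for all_epitopes.tsv. The function names are hypothetical and not part of pVACtools.

```python
import csv
import io


def load_tsl_by_index(intermediate_tsv, index_col="index",
                      tsl_col="transcript_support_level"):
    """Map each index value in the initial TSV to its TSL value.

    Column names are assumptions; adjust them to match the real header.
    """
    reader = csv.DictReader(intermediate_tsv, delimiter="\t")
    return {row[index_col]: row[tsl_col] for row in reader}


def patch_tsl(epitopes_tsv, patched_tsv, tsl_by_index, index_col="Index",
              tsl_col="Transcript Support Level"):
    """Copy epitopes_tsv to patched_tsv, overwriting TSL where the index is known.

    Rows whose index is missing from tsl_by_index are written unchanged.
    Returns the number of rows that were patched.
    """
    reader = csv.DictReader(epitopes_tsv, delimiter="\t")
    writer = csv.DictWriter(patched_tsv, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    patched = 0
    for row in reader:
        tsl = tsl_by_index.get(row[index_col])
        if tsl is not None:
            row[tsl_col] = tsl
            patched += 1
        writer.writerow(row)
    return patched


# Example with real files (paths are placeholders):
# with open("new_run/id.tsv", newline="") as f:
#     mapping = load_tsl_by_index(f)
# with open("id.all_epitopes.tsv", newline="") as src, \
#         open("id.all_epitopes.tsl.tsv", "w", newline="") as dst:
#     print(patch_tsl(src, dst, mapping), "rows patched")
```

This only rewrites the TSL column. It does not re-apply any filtering, so files derived from all_epitopes.tsv would still reflect the old values.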

PierreLaplante commented 7 months ago

I see, that makes sense, thank you for the suggestion.

susannasiebert commented 7 months ago

This issue should be fixed in version 4.0.7. Please give it a try and let me know if you're still running into this problem.