griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Four prediction results (Score, Percentile) in a row do not correspond to the same epitope #950

Closed RysBen closed 1 year ago

RysBen commented 1 year ago

Describe the bug

Hi there,

I ran into confusing behavior when using pvacbind. The *parsed.tsv file aggregates the prediction results of four algorithms into one row, but those results do not all refer to the epitope shown in the Epitope Seq column (an example is provided under To Reproduce).

Why can't the output report all four epitopes in Epitope Seq, especially when their sequences are not the same?

Thanks, Rys

To Reproduce

1) Make a test file, test.fa.

>F1
HRLREEILAKFLHWLMSVYVV

2) Run the following command.

# Replace $iedb_path, $blastp_path, and $refseq_select_prot with your own paths.
pvacbind run \
    test.fa \
    TEST \
    DRB1*09:01 \
    all ./ \
    -m median \
    --netmhc-stab \
    -t 20 \
    -k \
    -e2 18 \
    --iedb-install-directory "$iedb_path" \
    --blastp-path "$blastp_path" \
    --blastp-db "$refseq_select_prot"

3) An example

# 3a) Epitope in *parsed.tsv was LREEILAKFLHWLMSVYV
grep -E "LREEILAKFLHWLMSVYV|Mutation" "MHC_Class_II/tmp/TEST.DRB1*09:01.18.parsed.tsv_1-1" | column -t -s $'\t'
#Mutation  HLA Allele      Sub-peptide Position  Epitope Seq         Median Score  Best Score  Best Score Method  Median Percentile  Best Percentile  Best Percentile Method  MHCnuggetsII Score  MHCnuggetsII Percentile  NetMHCIIpan Score  NetMHCIIpan Percentile  NNalign Score  NNalign Percentile  SMMalign Score  SMMalign Percentile
#F1        DRB1*09:01      3                     LREEILAKFLHWLMSVYV  2303.69       2303.69     MHCnuggetsII       NA                 NA               NA                      2303.69             NA                       NA                 NA                      NA             NA                  NA              NA
#F1        HLA-DRB1*09:01  -6                    LREEILAKFLHWLMSVYV  1289.49       468.1       NNalign            45.0               23.0             NetMHCIIpan             NA                  NA                       1289.49            23.0                    468.1          45.0                3944.0          65.0
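
The summary columns of the second row can be cross-checked against its per-algorithm columns. The following is a minimal Python sketch, not pVACtools code: it assumes that with `-m median` the Median columns are the median of the non-NA per-algorithm values and that the Best columns are the lowest non-NA value. The `summarize` helper is hypothetical; the numbers are copied from the row above.

```python
from statistics import median

# Per-algorithm values copied from the second row of the 3a output (NA -> None).
scores = {"MHCnuggetsII": None, "NetMHCIIpan": 1289.49, "NNalign": 468.1, "SMMalign": 3944.0}
percentiles = {"MHCnuggetsII": None, "NetMHCIIpan": 23.0, "NNalign": 45.0, "SMMalign": 65.0}


def summarize(values):
    """Hypothetical helper: return (median, best value, best method) over non-NA entries.

    Assumes lower is better, as for IC50 scores and percentile ranks.
    """
    present = {method: value for method, value in values.items() if value is not None}
    best_method = min(present, key=present.get)
    return median(present.values()), present[best_method], best_method


print(summarize(scores))
print(summarize(percentiles))
```

Under these assumptions the sketch reproduces the row's Median Score, Best Score, Best Score Method, and the corresponding percentile columns. That suggests the SMMalign value 3944.0 feeds into the row's median even though, as 3b shows, it was predicted for a different peptide.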

# 3b) The peptide in the SMMalign *tsv_1-1 file was HRLREEILAKFLHWLMSV, which is not the same as the epitope in *parsed.tsv
grep -E "allele|3944.0" TEST.smm_align.DRB1*09:01.18.tsv_1-1
#allele          seq_num  start  end  length  core_peptide  peptide             ic50     percentile_rank  adjusted_rank
#HLA-DRB1*09:01  1        1      18   18      LAKFLHWLM     HRLREEILAKFLHWLMSV  3944.00  65.0             192.38
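
To make the mismatch concrete, the 18-mers of the test sequence can be listed with their 1-based start positions. This is a small illustrative Python sketch; `kmers` is a hypothetical helper, not part of pVACtools, and it assumes that Sub-peptide Position and the `start` column both count from 1.

```python
def kmers(sequence, length):
    """Return {1-based start position: sub-peptide} for every window of `length`."""
    return {
        start + 1: sequence[start:start + length]
        for start in range(len(sequence) - length + 1)
    }


# Sequence F1 from test.fa; 18 is the epitope length set with -e2.
windows = kmers("HRLREEILAKFLHWLMSVYVV", 18)
for position, peptide in windows.items():
    print(position, peptide)
```

In this listing LREEILAKFLHWLMSVYV is the window at position 3, which agrees with the first row of *parsed.tsv, while HRLREEILAKFLHWLMSV is the window at position 1, which agrees with `start` = 1 in the SMMalign output. The SMMalign score 3944.0 reported next to LREEILAKFLHWLMSVYV therefore belongs to a different 18-mer.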

susannasiebert commented 1 year ago

I believe this issue is a duplicate of #928 and has been fixed in 3.1.2. Please upgrade to the latest version and give it another try.

RysBen commented 1 year ago

v3.1.2 works on the test data, thanks @susannasiebert