griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

NNAlign prediction error occurs when running with other prediction methods. #928

Closed ndeng1 closed 1 year ago

ndeng1 commented 1 year ago

Describe the bug

When NNalign is the only prediction algorithm run in pVACbind, the results are consistent with those from the IEDB online prediction. However, adding other algorithms, for example with "all", "class_ii", or "MHCnuggetsII NNalign", significantly changes the NNalign output.

To Reproduce

Generate a sample file.

echo -e '>test\nIKSGGGSEKKKGLMTLSKMIKKKKNL' > test.fasta
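For orientation, the peptide checked later in this report (GGGSEKKKGLMTL) is one of the 13-residue windows of the test sequence. The sketch below enumerates those windows, assuming a plain sliding window over the sequence; the helper name `sliding_windows` is hypothetical and is not part of pVACtools.

```python
def sliding_windows(sequence, length):
    """Return every contiguous substring of `sequence` with the given length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Sequence from test.fasta above; 13 is the class II epitope length used in this report.
sequence = "IKSGGGSEKKKGLMTLSKMIKKKKNL"
peptides = sliding_windows(sequence, 13)

print(len(peptides))                 # 14 windows for a 26-residue sequence
print("GGGSEKKKGLMTL" in peptides)   # True
```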

Run NNalign only.

singularity exec --bind `realpath .` ~/software/pvactools/pvactools_3.1.1.sif pvacbind run --iedb-install-directory /opt/iedb -e2 13 --n-threads 20 `realpath test.fasta` test DRB1*07:01 NNalign  `realpath NNalign`

Here is the output

No MHC class I prediction algorithms chosen. Skipping MHC class I predictions.
Executing MHC Class II predictions
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-1
Completed
Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method NNalign - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/NNalign/MHC_Class_II/tmp/test.nn_align.DRB1*07:01.13.tsv_1-1
Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method NNalign - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/NNalign/MHC_Class_II/tmp/test.nn_align.DRB1*07:01.13.tsv_1-1 - Completed
Parsing binding predictions for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1
Parsing prediction file for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1
Parsing prediction file for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1 - Completed
Combining Parsed Prediction Files
Completed
Creating aggregated report
Completed
Calculating Manufacturability Metrics
Completed
Running Binding Filters
Completed
Running Top Score Filter
Completed

Done: Pipeline finished successfully. File NNalign/MHC_Class_II/test.filtered.tsv contains list of filtered putative neoantigens.

Check the NNalign Score of peptide GGGSEKKKGLMTL

awk -F '\t' 'NR==1 || $4 =="GGGSEKKKGLMTL" {print $2,$4,$11,$12}' NNalign/MHC_Class_II/test.all_epitopes.tsv

The NNalign Score obtained when running NNalign alone is 10935.8, which matches the result obtained from http://tools.iedb.org/mhci.

HLA Allele Epitope Seq NNalign Score NNalign Percentile
HLA-DRB1*07:01 GGGSEKKKGLMTL 10935.8 87.0
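The awk commands in this report select the NNalign columns by position ($11,$12 here, $13,$14 in the combined run), because the positions shift when more algorithms are added. A minimal sketch of selecting the same columns by header name instead is shown below. The header names are the ones printed above; the inline sample is a cut-down stand-in for test.all_epitopes.tsv, and its second row holds placeholder values, not real predictions.

```python
import csv
import io


def select_columns(tsv_text, peptide, columns):
    """Return the rows whose 'Epitope Seq' equals `peptide`, keeping only `columns`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {name: row[name] for name in columns}
        for row in reader
        if row["Epitope Seq"] == peptide
    ]


# Stand-in for the real file: only four of its columns are reproduced here.
sample = (
    "HLA Allele\tEpitope Seq\tNNalign Score\tNNalign Percentile\n"
    "HLA-DRB1*07:01\tGGGSEKKKGLMTL\t10935.8\t87.0\n"
    "HLA-DRB1*07:01\tIKSGGGSEKKKGL\tNA\tNA\n"  # placeholder row, not a real prediction
)

wanted = ["HLA Allele", "Epitope Seq", "NNalign Score", "NNalign Percentile"]
rows = select_columns(sample, "GGGSEKKKGLMTL", wanted)
for row in rows:
    print(" ".join(row[name] for name in wanted))
```

To run it on an actual result, read the contents of test.all_epitopes.tsv into a string and pass that in place of `sample`.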

Run NNalign together with MHCnuggetsII.

singularity exec --bind `realpath .` ~/software/pvactools/pvactools_3.1.1.sif pvacbind run --iedb-install-directory /opt/iedb -e2 13 --n-threads 20 `realpath test.fasta` test DRB1*07:01 MHCnuggetsII NNalign  `realpath MHCnuggetsII_NNalign`

Log

No MHC class I prediction algorithms chosen. Skipping MHC class I predictions.
Executing MHC Class II predictions
Splitting FASTA into smaller chunks
Splitting FASTA into smaller chunks - Entries 1-1
Completed
Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method NNalign - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/MHCnuggetsII_NNalign/MHC_Class_II/tmp/test.nn_align.DRB1*07:01.13.tsv_1-1
Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method MHCnuggetsII - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/MHCnuggetsII_NNalign/MHC_Class_II/tmp/test.MHCnuggetsII.DRB1*07:01.13.tsv_1-1

Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method NNalign - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/MHCnuggetsII_NNalign/MHC_Class_II/tmp/test.nn_align.DRB1*07:01.13.tsv_1-1 - Completed
Making binding predictions on Allele DRB1*07:01 and Epitope Length 13 with Method MHCnuggetsII - File /rsrch3/scratch/ccp-rsch/ndeng1/problem/MHCnuggetsII_NNalign/MHC_Class_II/tmp/test.MHCnuggetsII.DRB1*07:01.13.tsv_1-1 - Completed
Parsing binding predictions for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1
Parsing prediction file for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1
Parsing prediction file for Allele DRB1*07:01 and Epitope Length 13 - Entries 1-1 - Completed
Combining Parsed Prediction Files
Completed
Creating aggregated report
Completed
Calculating Manufacturability Metrics
Completed
Running Binding Filters
Completed
Running Top Score Filter
Completed

Done: Pipeline finished successfully. File MHCnuggetsII_NNalign/MHC_Class_II/test.filtered.tsv contains list of filtered putative neoantigens.

Check the NNalign Score of peptide GGGSEKKKGLMTL

awk -F '\t' 'NR==1 || $4 =="GGGSEKKKGLMTL" {print $2,$4,$13,$14}' MHCnuggetsII_NNalign/MHC_Class_II/test.all_epitopes.tsv

When NNalign is run together with MHCnuggetsII, the NNalign Score for the same peptide is 13.1 instead of 10935.8. The "all" and "class_ii" options produce the same 13.1 value.

HLA Allele Epitope Seq NNalign Score NNalign Percentile
DRB1*07:01 GGGSEKKKGLMTL 13.1 1.4
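The comparison between the two runs can also be scripted. The sketch below is only an illustration: it keys each NNalign Score by allele and peptide, strips the "HLA-" prefix because the two outputs above spell the allele differently, and lists the peptides whose scores disagree. The function names and the tolerance are assumptions, and the inline samples contain just the two rows shown in this report.

```python
import csv
import io


def nnalign_scores(tsv_text):
    """Map (allele, peptide) to the NNalign Score found in an all_epitopes-style TSV."""
    scores = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        allele = row["HLA Allele"]
        if allele.startswith("HLA-"):
            allele = allele[len("HLA-"):]  # the two runs spell the allele differently
        scores[(allele, row["Epitope Seq"])] = float(row["NNalign Score"])
    return scores


def score_differences(alone, combined, tolerance=1e-6):
    """Return {key: (alone, combined)} for shared keys whose scores differ."""
    shared = alone.keys() & combined.keys()
    return {
        key: (alone[key], combined[key])
        for key in sorted(shared)
        if abs(alone[key] - combined[key]) > tolerance
    }


header = "HLA Allele\tEpitope Seq\tNNalign Score\tNNalign Percentile\n"
alone_tsv = header + "HLA-DRB1*07:01\tGGGSEKKKGLMTL\t10935.8\t87.0\n"
combined_tsv = header + "DRB1*07:01\tGGGSEKKKGLMTL\t13.1\t1.4\n"

diffs = score_differences(nnalign_scores(alone_tsv), nnalign_scores(combined_tsv))
for (allele, peptide), (a, c) in diffs.items():
    print(allele, peptide, a, c)
```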

susannasiebert commented 1 year ago

Hi @ndeng1,

Thank you for the detailed bug report. A PR to fix this issue is now up (#945). I will make a hotfix in the next couple of days.

susannasiebert commented 1 year ago

A hotfix (3.1.2) for this issue is now out. I'm closing this issue but please do reopen it if you continue running into this problem.