griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

How are MHC class I and II predictions combined in pvacseq's combined output? #470

Closed: kylec closed this issue 4 years ago

kylec commented 4 years ago

I ran pvacseq with NetMHCpan and NetMHCpanII. The combined filtered output has fewer lines (295) than the MHC class I and class II filtered outputs put together (194 + 271 = 465 lines):

295 combined/sample.filtered.tsv
194 MHC_Class_I/sample.filtered.tsv
271 MHC_Class_II/sample.filtered.tsv

What kind of filtering is done when Class I and II output are combined?

susannasiebert commented 4 years ago

For the combined report, the class I and class II all_epitopes files are combined and then the exact same filters are run. Since one of the filters is the top score filter, the combined filtered file might contain fewer entries than the class I and class II filtered files combined. This happens when a variant has both class I and class II epitopes that pass all of the other filters. In that case, the top score filter keeps only the single best class I or class II epitope for that variant in the combined filtered file and discards the other, even though the discarded epitope is still present in the individual class I or class II filtered file.
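To make the effect concrete, here is a minimal Python model of a per-variant top score filter. It is a simplification, not pVACtools' implementation: the `Epitope` record, its fields and the ranking by a single IC50 value are hypothetical (the real filter's ranking criteria may differ), and it assumes every epitope has already passed the other filters.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, start, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


@dataclass(frozen=True)
class Epitope:
    """Simplified, hypothetical stand-in for one row of an all_epitopes.tsv file."""
    variant: VariantKey
    peptide: str
    mhc_class: str  # "I" or "II"
    ic50_nm: float  # predicted binding affinity; lower means stronger binding


def top_score_filter(epitopes: List[Epitope]) -> List[Epitope]:
    """Keep only the best-scoring (lowest IC50) epitope for each variant."""
    best: Dict[VariantKey, Epitope] = {}
    for ep in epitopes:
        current = best.get(ep.variant)
        if current is None or ep.ic50_nm < current.ic50_nm:
            best[ep.variant] = ep
    return list(best.values())


# Made-up example data: the chr1 variant has passing epitopes in both classes.
class_i = [
    Epitope(("chr1", 100, "A", "T"), "KLMNPQRSV", "I", 45.0),
    Epitope(("chr2", 200, "G", "C"), "ALDKWQSTY", "I", 300.0),
]
class_ii = [
    Epitope(("chr1", 100, "A", "T"), "AKLMNPQRSVTWYEG", "II", 120.0),
    Epitope(("chr3", 300, "C", "G"), "GSTFRKLMPQWEDAV", "II", 80.0),
]

# Filter each class on its own, then filter the concatenated set.
separate = len(top_score_filter(class_i)) + len(top_score_filter(class_ii))
combined = top_score_filter(class_i + class_ii)
print(f"filtered separately: {separate} epitopes")
print(f"filtered together:   {len(combined)} epitopes")
```

Filtered separately, the class I and class II sets keep 2 + 2 = 4 epitopes, but filtered together the chr1 variant keeps only its stronger class I epitope, giving 3. The same mechanism explains why the combined file can have fewer lines than the two class-specific files added up.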

Please also note that there was a bug in the pVACtools wrappers for netchop and netmhcstabpan in versions 1.5.2 and earlier that would erroneously discard filtered epitopes. If you are using those tools, I would recommend rerunning with version 1.5.3.

malachig commented 4 years ago

In general, our own workflow involves running class I and II separately so that one has more flexibility in interpreting them after the fact.

nicoleversetwo commented 1 year ago

Is there a way to run this filtering manually? When I use the Docker version of pvacseq, the job often stalls or is killed during the stage where the files are combined and filtered (see below):

Done: Pipeline finished successfully. File /opt/iedb/pvacseq_example_data/***-1768/MHC_Class_II/TUMOR.filtered.tsv contains list of filtered putative neoantigens.

Creating combined reports
Creating aggregated report
Tumor clonal VAF estimated as 0.6 (estimated from Tumor DNA VAF data). Assuming variants with VAF < 0.3 are subclonal
Killed
root@368e993f4f9a:/opt/iedb#

susannasiebert commented 1 year ago

You can run the standalone filter commands on the all_epitopes.tsv file in the combined directory. Please see the documentation for more info.
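As a rough sketch of what that could look like, the Python below only builds the command lines for chaining pvacseq's standalone filters on the combined all_epitopes.tsv, feeding each step's output into the next. The subcommand names and their order in `FILTER_STEPS` are assumptions based on the filters pvacseq applies in its own pipeline, and the file paths and the `build_filter_commands` helper are hypothetical. Check the pVACtools documentation for the exact subcommands, required options and defaults of your version, and pass the same thresholds you used for the original run.

```python
import shlex
from pathlib import Path
from typing import List

# ASSUMPTION: subcommand names and order; verify against the pVACtools docs
# for your installed version and drop any step that does not apply to your data.
FILTER_STEPS = [
    "binding_filter",
    "coverage_filter",
    "transcript_support_level_filter",
    "top_score_filter",
]


def build_filter_commands(all_epitopes: Path, out_dir: Path,
                          prefix: str = "sample") -> List[List[str]]:
    """Hypothetical helper: chain filters so each step reads the previous output."""
    commands: List[List[str]] = []
    current_input = all_epitopes
    for step in FILTER_STEPS:
        output = out_dir / f"{prefix}.{step}.tsv"
        # Only positional input/output files; add your own threshold options here.
        commands.append(["pvacseq", step, str(current_input), str(output)])
        current_input = output
    return commands


if __name__ == "__main__":
    # Hypothetical paths mirroring the combined/ directory layout in this thread.
    cmds = build_filter_commands(Path("combined/sample.all_epitopes.tsv"), Path("combined"))
    for cmd in cmds:
        print(shlex.join(cmd))  # print instead of executing (dry run)
```

Each returned list could be executed with `subprocess.run(cmd, check=True)`. Running the top score filter last matches the behaviour described earlier in this thread, where it picks one epitope per variant after the other filters have been applied.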

You might need to use a machine with more memory if the aggregate report creation is failing. How big is your all_epitopes.tsv file? Which version of pVACtools are you using?
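One way to answer the size question without loading the whole file into memory is to stream it line by line. The `describe_tsv` helper below is a hypothetical sketch, not part of pVACtools; it assumes a tab-separated file with a single header row and, unlike `wc -l`, does not count that header as a data row.

```python
import os
import tempfile
from pathlib import Path


def describe_tsv(path: Path) -> dict:
    """Hypothetical helper: report size, column count and data rows of a TSV.

    Reads one line at a time, so memory use stays flat even for large files.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    with open(path, "r", encoding="utf-8") as handle:
        header = handle.readline()
        rows = sum(1 for _ in handle)  # remaining lines are data rows
    columns = len(header.rstrip("\n").split("\t")) if header else 0
    return {"size_mb": round(size_mb, 1), "columns": columns, "rows": rows}


if __name__ == "__main__":
    # Small made-up file for demonstration; point this at your own all_epitopes.tsv.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.all_epitopes.tsv"
        demo.write_text("Chromosome\tStart\tStop\nchr1\t100\t101\nchr2\t200\t201\n",
                        encoding="utf-8")
        print(describe_tsv(demo))
```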

nicoleversetwo commented 1 year ago

Thank you!

The all_epitopes.tsv file is about 200 MB. The filtering step seemed to take between 2 and 4 hours, and it only completed about half of the time.

I'm using Docker Desktop with the griffithlab/pvactools:latest-slim image.

susannasiebert commented 1 year ago

Can you try with version 3.1.1? We made some improvements to the aggregate report step that should speed things up, but your Docker client might have cached an older version. You can explicitly use this version by referencing the 3.1.1-slim tag instead of latest-slim.