Rerunning the pipeline from pVACseq

icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

Other

67 stars 24 forks source link

Rerunning the pipeline from pVACseq #39

Closed mantczakaus closed 1 year ago

mantczakaus commented 1 year ago

Hi,

I was wondering if there is a way to run the nextNEOpi pipeline just from the pVACseq command? I want to run it first with standard and then relaxed filtering. I changed the setting in params.config and used the resume option with lenient cache but it went all the way back to alignment. I would appreciate your input!

Best wishes,

Magda

riederd commented 1 year ago

Unfortunately there is no option to set a specific entry point into the pipeline at this time. However, it will be so with nextNEOpi2 which we started to develop.

Meanwhile, did you try to pass the relaxed filtering via command line using --pVACseq_filter_set relaxed instead of changing the config file?

You could also use the unfiltered calls (*_MHC_Class_[I,II]_all_epitopes_ccf_ref_match.tsv) and filter them according to your needs using awk, excel, R, pandas or similar.

here is an example:

#!/bin/bash

# Filters
#
declare RANK=0.5
declare TUMOR_DNA_VAF=0.2
declare TUMOR_RNA_VAF=0.2
declare GENE_EXPR=1
declare TRANSCRIPT_SUPPORT_LEVEL=1

cat example_MHC_Class_I_all_epitopes_ccf_ref_match.tsv | \
        awk \
        -v RANK="$RANK" \
        -v TUMOR_DNA_VAF="$TUMOR_DNA_VAF" \
        -v TUMOR_RNA_VAF="$TUMOR_RNA_VAF" \
        -v GENE_EXPR="$GENE_EXPR" \
        -v TRANSCRIPT_SUPPORT_LEVEL="$TRANSCRIPT_SUPPORT_LEVEL" \
        'BEGIN{OFS=FS="\t"}{ if(NR==1) {print} ; if( $28 < RANK && $31 > TUMOR_DNA_VAF && $33 > TUMOR_RNA_VAF && $36 > GENE_EXPR && $7 <= TRANSCRIPT_SUPPORT_LEVEL) { print }}' \
        > example_MHC_Class_I_filtered_epitopes_ccf_ref_match.tsv

HTH

mantczakaus commented 1 year ago

Hi @riederd ,

thanks for your response!

could you expand a little bit on this point? "Meanwhile, did you try to pass the relaxed filtering via command line using --pVACseq_filter_set relaxed instead of changing the config file?". I tried to change the option in the .command.sh file of the corresponding work directory for one of the allelese but that triggered everything again from the alignment after resuming the pipeline. I guess I could just rerun those commands for each allele outside of the pipeline, just manually using the pvac image. The downside of that is we would need to keep all the pVAC intermediary files from the work directory until we decide on the filtering options.

I'm not sure if we could use the second solution you provided because the pvacseq run applies four filtering procedures. The last one selects one neoepitope per variant. If I understand right this is not included in that solution. https://pvactools.readthedocs.io/en/latest/pvacseq/output_files.html#filters-applied-to-the-filtered-tsv-file https://pvactools.readthedocs.io/en/latest/pvacseq/filter_commands.html#filter-commands

What I came up in the end is to take the file with all epitopes from analyses/patient1/12_pVACseq folder (I tried with this one patient1_MHC_Class_I_all_epitopes_ccf_ref_match.tsv but the numbers seem to be strings not integers). Then I split it by HLA (to keep one neoepitope per variant and HLA which is currently the case in nextNEOpi as the all neoepitopes are filtered separately for HLAs). Then I manually apply the filtering starting with pvacseq binding_filter and ending with pvacseq top_score_filter. Then I concatenate the files. This way I was able to go from patient1_MHC_Class_I_all_epitopes.tsv to patient1_MHC_Class_I_filtered_epitopes.tsv. Any thoughts on this?

Best wishes,

Magda

mantczakaus commented 1 year ago

Meanwhile, did you try to pass the relaxed filtering via command line using --pVACseq_filter_set relaxed instead of changing the config file?

You could also use the unfiltered calls (*_MHC_Class_[I,II]_all_epitopes_ccf_ref_match.tsv) and filter them according to your needs using awk, excel, R, pandas or similar.

Hi, in your paper (Supplementary Table 6) you mentioned you were able to recover 8 out of 9 neoepitopes from patient1 from the TESLA dataset. Would you be able to clarify if you achieve that by setting relaxed filtering in the params.config like this

  pVACseq_filter_set = "relaxed"
  pVACseq_filter_sets {
    standard = "--binding-threshold 500 --top-score-metric median --minimum-fold-change 0.0 --normal-cov 5 --tdna-cov 10 --trna-cov 10 --normal-vaf 0.02 --tdna-vaf 0.25 --trna-vaf 0.25 --expn-val 1 --maximum-transcript-support-level 1"
    relaxed = "--binding-threshold 500 --percentile-threshold 2 --top-score-metric lowest --expn-val 2 --maximum-transcript-support-level 5 --normal-vaf 0.01 --trna-vaf 0.02 --tdna-vaf 0.02"
    custom = "${params.pVACseq_custom_filters}"
  }

or did you use something like the bash script you provided here?

riederd commented 1 year ago

Hi, as far as I recall my former lab colleague who ran the TESLA analysis used the following options:

--trim_adapters true --trim_adapters_RNAseq true --use_NetChop false --use_NetMHCstab true --pVACseq_filter_set relaxed

For Supplementary Table 6 she used a custom filter: Best MT Score <= 500 Best MT Percentile <= 2 Gene Expression <= 2

Please keep also in mind, that it was run with older versions of the pipeline and especially with older versions of the tools therein.

And sorry for the late reply, I was out of office....

Let me know if it answers you question and if I may close the issue.

mantczakaus commented 1 year ago

Hi @riederd thank you for the clarification. I was not able to reproduce the results with the settings you provided but it may be as you said because of the older versions of tools. I think it's ok to close this issue.

Just fyi, these are the results I got:

patient_id	tesla_validated_in_filtered_supt6	tesla_validated_in_filtered_rel
1	8	2
2	4	3
3	12	8
12	3	0
16	1	0

when i used the following filter settings:

parameter	relaxed	supt6
binding-threshold	500	500
percentile-threshold	2	2
top-score-metric	lowest	lowest
minimum-fold-change	0	0
normal-cov	5	0
tdna-cov	10	0
trna-cov	10	0
normal-vaf	0.01	1
tdna-vaf	0.02	0
trna-vaf	0.02	0
expn-val	2	2
maximum-transcript-support-level	5	5