Closed ShixiangWang closed 6 years ago
Hi Shixiang,
To speed up your processing, I would suggest limiting the list of alleles to only those that match the HLA type of each sample. To determine the HLA types of your samples, you can run HLA typing software such as OptiType or HLAminer.
You can also start multiple pVACseq runs in parallel for your individual samples.
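As a rough sketch of the parallel approach, the Python snippet below starts one pvacseq process per sample from a small worker pool. The sample names, file paths, allele strings, and the exact argument order and `-e` flag are assumptions for illustration only; check `pvacseq run --help` for your installed version before relying on it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(sample):
    """Build one pvacseq invocation for a sample.

    The argument layout here is an assumption; verify it against
    `pvacseq run --help` for your version.
    """
    return [
        "pvacseq", "run",
        sample["vcf"],                 # hypothetical per-sample VCF path
        sample["name"],                # sample name
        ",".join(sample["alleles"]),   # only this sample's HLA alleles
        "NetMHC",                      # prediction method from the question
        sample["outdir"],              # hypothetical output directory
        "-e", "9",                     # epitope length from the question
    ]


def run_all(samples, max_workers=4, runner=subprocess.run):
    """Run every sample's command through a thread pool.

    Threads are enough here because the heavy work happens in the
    child processes, not in Python itself.
    """
    commands = [build_command(s) for s in samples]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input samples
        return list(pool.map(runner, commands))
```

Keep `max_workers` at or below the number of cores (and the memory) available, since each run launches its own local IEDB predictions.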
To respond to your individual questions:
1) We don't currently parallelize the IEDB predictions for the fasta subsets. This is on our to-do list, but it is not a high priority.
2) The default --fasta-size is 200. If you are using a local IEDB install, the files don't necessarily need to be subset, so you can increase this value if desired. I'm not sure this will speed things up very much, though.
3) The --downstream-sequence-length is set to 1000 by default. This value is really up to you. For a frameshift, the whole downstream tail is novel, so ideally you would want to make predictions for all of the epitopes in the downstream sequence. However, the longer the sequence, the longer it takes IEDB to make predictions for it. We've found that 1000 is a good number that still returns results in a reasonable amount of time while containing a large number of novel epitopes. You can reduce this number if you are OK with potentially missing some novel epitopes.
4) The biggest time saving will come from reducing the number of alleles. Assuming 6 class I alleles for a person's HLA type, you would be making 6 calls to IEDB instead of 78, so your processing time would decrease by roughly 90% (assuming everything else stays the same).
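To make the arithmetic in points 3 and 4 concrete, here is a minimal Python sketch. The helper names are hypothetical and not part of pvactools; it assumes every IEDB call costs about the same, and it counts only the windows that fit inside a sequence of the given length.

```python
def restrict_alleles(sample_hla, supported):
    """Keep only the sample's own alleles that the prediction method supports."""
    supported = set(supported)
    return [allele for allele in sample_hla if allele in supported]


def relative_saving(calls_before, calls_after):
    """Fractional reduction in IEDB calls, assuming equal cost per call."""
    return 1 - calls_after / calls_before


def window_count(sequence_length, epitope_length):
    """Number of contiguous epitope-length windows in a sequence."""
    return max(sequence_length - epitope_length + 1, 0)
```

With these, `relative_saving(78, 6)` is about 0.92, in line with the estimate in point 4, and `window_count(1000, 9)` is 992, which gives a feel for how many 9-mers a 1000-residue downstream tail can contribute per allele.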
I couldn't agree more. Thanks, @susannasiebert.
Hello,
I would like to ask for help with speeding up the pvacseq computation.
I use the NetMHC method, an epitope length of 9, and all valid HLAs for the NetMHC method. It takes about 1 hour to finish a sample. I have hundreds of samples to process, so I need to speed up my computation.
I read http://pvactools.readthedocs.io/en/latest/pvacseq/frequently_asked_questions.html about how to speed things up, and I have the following questions:
1) By default, pvacseq splits the variants into multiple files after transforming the VCF to a TSV file, but runs them one by one.
2) --fasta-size? There seems to be no default value, and I use a local installation of IEDB, so should I set a bigger fasta-size? I use an epitope length of 9; what size would be appropriate?
3) --downstream-sequence-length? There also seems to be no default value; what value would be appropriate for an epitope length of 9?
Best wishes, Shixiang
YAML file: