icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

cnvkit fails on TESLA dataset patient1 and patient3 #27

Closed: mantczakaus closed this issue 1 year ago

mantczakaus commented 1 year ago

Hi, I am running the entire nextNEOpi pipeline on the TESLA dataset (https://doi.org/10.1016/j.cell.2020.09.015), which your paper mentions was used to validate the pipeline. Have you encountered any issues with CNVkit? I ran the pipeline before on WES data from a clear cell renal cell carcinoma tumor and matched normal (https://www.ebi.ac.uk/ena/browser/view/SAMEA4074323) and did not run into any problems, but with patient1 and patient3 from TESLA I am getting the following error:

Command error:
    File "/usr/local/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 235, in evaluate
      return _evaluate(op, op_str, a, b)  # type: ignore[misc]
    File "/usr/local/lib/python3.9/site-packages/pandas/core/computation/expressions.py", line 69, in _evaluate_standard
      return op(a, b)
  TypeError: unsupported operand type(s) for -: 'int' and 'str'

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
      r = call_item.fn(*call_item.args, **call_item.kwargs)
    File "/usr/local/lib/python3.9/site-packages/cnvlib/batch.py", line 157, in batch_write_coverage
      cnarr = coverage.do_coverage(bed_fname, bam_fname, by_count, 0, processes, fasta)
    File "/usr/local/lib/python3.9/site-packages/cnvlib/coverage.py", line 27, in do_coverage
      cnarr = interval_coverages(bed_fname, bam_fname, by_count, min_mapq,
    File "/usr/local/lib/python3.9/site-packages/cnvlib/coverage.py", line 57, in interval_coverages
      table = interval_coverages_pileup(bed_fname, bam_fname, min_mapq,
    File "/usr/local/lib/python3.9/site-packages/cnvlib/coverage.py", line 170, in interval_coverages_pileup
      spans = table.end - table.start
    File "/usr/local/lib/python3.9/site-packages/pandas/core/ops/common.py", line 65, in new_method
      return method(self, other)
    File "/usr/local/lib/python3.9/site-packages/pandas/core/arraylike.py", line 97, in __sub__
      return self._arith_method(other, operator.sub)
    File "/usr/local/lib/python3.9/site-packages/pandas/core/series.py", line 4998, in _arith_method
      result = ops.arithmetic_op(lvalues, rvalues, op)
    File "/usr/local/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 189, in arithmetic_op
      res_values = _na_arithmetic_op(lvalues, rvalues, op)
    File "/usr/local/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 149, in _na_arithmetic_op
      result = _masked_arith_op(left, right, op)
    File "/usr/local/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 91, in _masked_arith_op
      result[mask] = op(xrav[mask], yrav[mask])
  TypeError: unsupported operand type(s) for -: 'int' and 'str'
  """

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/cnvkit.py", line 9, in <module>
      args.func(args)
    File "/usr/local/lib/python3.9/site-packages/cnvlib/commands.py", line 110, in _cmd_batch
      args.reference, args.targets, args.antitargets = batch.batch_make_reference(
    File "/usr/local/lib/python3.9/site-packages/cnvlib/batch.py", line 139, in batch_make_reference
      target_fnames = [tf.result() for tf in tgt_futures]
    File "/usr/local/lib/python3.9/site-packages/cnvlib/batch.py", line 139, in <listcomp>
      target_fnames = [tf.result() for tf in tgt_futures]
    File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 438, in result
      return self.__get_result()
    File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
      raise self._exception
  TypeError: unsupported operand type(s) for -: 'int' and 'str'

The full command.err is attached: err.txt
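
For what it is worth, the line the traceback points at (spans = table.end - table.start in cnvlib/coverage.py) subtracts the start column of the interval table from the end column, and it fails with exactly this kind of TypeError when one of those columns contains a string instead of a number, which can happen if the targets BED has header lines, inconsistent column counts, or otherwise unexpected formatting. A tiny self-contained illustration (made-up data, not from the TESLA files):

import pandas as pd

# Toy version of the interval table cnvlib builds from the targets BED.
# One "start" value is a string, as it would be if a malformed or
# header-like row slipped into the table.
table = pd.DataFrame({
    "chromosome": ["chr1", "chr1"],
    "start": [1000, "2000"],  # object dtype because of the string
    "end": [1500, 2500],
})

# Mirrors cnvlib/coverage.py: spans = table.end - table.start
# Raises TypeError: unsupported operand type(s) for -: 'int' and 'str'
spans = table.end - table.start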

To debug, I started running the commands one by one inside the depot.galaxyproject.org-singularity-cnvkit-0.9.9--pyhdfd78af_0.img image, using the Singularity options provided in the config files. CNVkit already fails on the first command, which for me was:

cnvkit.py \
    batch \
    patient1_tumor_DNA_aligned_sort_mkdp.bam \
    --normal patient1_normal_DNA_aligned_sort_mkdp.bam \
    --method hybrid \
    --targets S07604514_Covered.bed \
    --fasta GRCh38.d1.vd1.fa \
    --annotate gencode.v33.primary_assembly.annotation.gtf \
    --access access-5kb.GRCh38.bed \
    -p 16 \
    --output-reference output_reference.cnn \
    --output-dir ./
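
In case it helps narrow things down, and purely on the guess that a non-numeric start/end field is what ends up in that interval table, here is a minimal sketch (my own check, not part of nextNEOpi or CNVkit) that flags header lines or rows in the targets BED whose coordinate columns do not parse as integers; the filename is the one from the command above:

import csv

bed_file = "S07604514_Covered.bed"  # targets BED from the command above

with open(bed_file) as fh:
    for line_no, row in enumerate(csv.reader(fh, delimiter="\t"), start=1):
        # BED files sometimes carry "track"/"browser"/comment lines that are not intervals
        if not row or row[0].startswith(("track", "browser", "#")):
            print(f"line {line_no}: header/comment line -> {row[:4]}")
            continue
        try:
            int(row[1])
            int(row[2])
        except (IndexError, ValueError):
            print(f"line {line_no}: missing or non-integer coordinates -> {row[:4]}")

If any such lines turn up, cleaning them out (or re-exporting the capture kit BED with only the standard columns) before rerunning the batch command would be an obvious thing to try.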

Would you happen to know where to start investigating this issue? Have you come across something similar? Alternatively, if I read nextNEOpi.nf correctly, the output channel of CNVkit (CNVkit_out_ch0) is not used anywhere else (as far as I can tell, ploidy is calculated from the ASCAT and Sequenza output). At the moment we are only interested in the pVACseq results from SNVs/short INDELs, the expression of the corresponding genes and perhaps purity/ploidy, but not in the CNVs themselves. If it is too difficult to investigate and get CNVkit working, I would be happy not to use it for now; could you recommend the best way to do that? I have not seen any settings in the config files that would take care of this.

riederd commented 1 year ago

Hi, so far I have not seen this error. We also ran the TESLA data some time ago with a previous version of nextNEOpi; I would need to run it again to see if I can reproduce the issue.

A quick and dirty solution to skip CNVkit would be to wrap the CNVkit process at https://github.com/icbi-lab/nextNEOpi/blob/aac260dbd5da701d22a09846045f3f42981cd4a1/nextNEOpi.nf#L4195 in something like this:

if (1 == 2) {
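    // the condition is never true, so the CNVkit process below is skipped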
    process CNVkit { 
        .....
    }
}
mantczakaus commented 1 year ago

This worked as a workaround - thank you

riederd commented 1 year ago

Closing this as we just released v1.4.0 with some software updates.

Feel free to reopen in case you still see the error