mcmero / SVclone

A computational method for inferring the cancer cell fraction of tumour structural variation from whole-genome sequencing data.
BSD 3-Clause "New" or "Revised" License

Interruption at cluster step #29

Closed: sainadfensi closed this issue 1 year ago

sainadfensi commented 1 year ago

Hi Marek, thanks for developing and maintaining SVclone.

I've tried the filter step both with and without CNV input, and in both cases I get the same error when running the subsequent cluster step.

Command: svclone cluster -s $sample --snvs $out -cfg $config -o Output/$sample

Output:


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Missing column: subclonal_cn1. Set subclonal_cn as frac_cn1_sub1 < 1
Missing column: subclonal_cn2
                    Set subclonal_cn as frac_cn2_sub1 < 1
Running VB-Normal-Binomial on a 45-by-1 data with 1 clusters ...
Converged in 362 steps.
Running VB-Normal-Binomial on a 45-by-1 data with 3 clusters ...
(... I removed similar messages about converging ..)

Converged in 891 steps.
null device
          1
Traceback (most recent call last):
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ref'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/RDS-FMH-PopPCaGenomics-RW/Jue/Tools/SVclone/sv_clone-env/bin/svclone", line 8, in <module>
    sys.exit(main())
  File "/scratch/RDS-FMH-PopPCaGenomics-RW/Jue/Tools/SVclone/sv_clone-env/lib/python3.6/site-packages/SVclone/cli.py", line 187, in main
    args.func(args)
  File "/scratch/RDS-FMH-PopPCaGenomics-RW/Jue/Tools/SVclone/sv_clone-env/lib/python3.6/site-packages/SVclone/run_clus.py", line 209, in run_clustering
    format_snvs_for_ccube(snv_df, sample_params, cluster_params, cc_file)
  File "/scratch/RDS-FMH-PopPCaGenomics-RW/Jue/Tools/SVclone/sv_clone-env/lib/python3.6/site-packages/SVclone/run_clus.py", line 77, in format_snvs_for_ccube
    sup, dep, Nvar, norm_cn = load_data.get_snv_vals(df, cparams)
  File "/scratch/RDS-FMH-PopPCaGenomics-RW/Jue/Tools/SVclone/sv_clone-env/lib/python3.6/site-packages/SVclone/load_data.py", line 34, in get_snv_vals
    n = df['ref'].map(float).values
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/python/3.6.5/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'ref'  

However, output files were still written to the ccube_out/ folder:

├── sample_assignment_probability_table.txt
├── sample_ccube_sv_results.pdf
├── sample_ccube_sv_results.RData
├── sample_cluster_certainty.txt
├── sample_multiplicity.txt
└── sample_subclonal_structure.txt

All the SVs were assigned to a single cluster, as shown in sample_assignment_probability_table.txt.

mcmero commented 1 year ago

This means your input SNVs file doesn't have a ref column (reference counts). Can you post a sample of your $out file?
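
For anyone hitting the same KeyError, a quick way to confirm a missing column is to inspect the header row of the table before running the cluster step. The snippet below is only a rough sketch, not part of SVclone: `missing_columns` is a hypothetical helper, the tab delimiter is an assumption, and the toy column names `chrom`, `pos` and `var` are made up for illustration. Only `ref` comes from the traceback above.

```python
import csv
import io

def missing_columns(handle, required, delimiter="\t"):
    """Return the required column names absent from the header row of `handle`.

    Hypothetical helper for illustration; assumes a delimited text table
    whose first row holds the column names.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]

# Toy table (made-up columns, not real SVclone output): there is no 'ref' column.
toy = io.StringIO("chrom\tpos\tvar\n1\t12345\t7\n")
print(missing_columns(toy, ["ref"]))  # → ['ref']
```

An empty list means every required column is present. A non-empty list names the columns that would trigger a KeyError like the one in the traceback.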

sainadfensi commented 1 year ago

Hi Marek,

Thank you for the prompt response, and sorry for my late reply. It turned out I had mistakenly passed the output file of the previous filtering step to the --snvs flag instead of a VCF file. All the steps now run smoothly.
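
In case it helps others avoid the same mix-up, here is a minimal sketch of a sanity check to run before passing a file to --snvs. It relies only on the fact that a VCF file begins with a `##fileformat=VCF...` line. The function name `looks_like_vcf` is hypothetical and not part of SVclone, and the check says nothing about whether the rest of the file is valid.

```python
import gzip

def looks_like_vcf(path):
    """Return True if the first line of `path` is a '##fileformat=VCF...' header.

    Hypothetical helper for illustration; handles plain-text and
    gzip-compressed files by sniffing the gzip magic bytes.
    """
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic bytes
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.readline().startswith("##fileformat=VCF")

# Example call (the path is a placeholder, not a file from this thread):
# looks_like_vcf("my_snvs.vcf")
```

A tab-delimited table, such as the output of an earlier pipeline step, would return False here, which flags the problem before the cluster step fails.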

Could you help me quickly check the results of the two example runs (./run_example.sh and ./run_example_wsnvs.sh)? I only see one cluster for both examples. Is that correct?

mcmero commented 1 year ago

You should get one cluster for the SV results, and two clusters for the SNV results.

sainadfensi commented 1 year ago

Oh right, there are two clusters for SNVs. Thanks!