raphael-group / THetA

Tumor Heterogeneity Analysis (THetA) and THetA2 are algorithms that estimate the tumor purity and clonal/subclonal copy number aberrations directly from high-throughput DNA sequencing data. This repository includes the updated algorithm, called THetA2.
http://compbio.cs.brown.edu/projects/theta/

Long run time reported. Run does not end. #22

Status: Open. jatintalwar opened this issue 5 years ago.

jatintalwar commented 5 years ago

Hello,

I am using THetA on the output from CNVkit. My command is:

```
RunTHetA --FORCE --NUM_PROCESSES 16 -d pipeline/ pipeline/HER2_sample1_theta2_interval_count.txt
```

I get the following output, which includes a warning:

```
Reading in query file...
Frac with potential copy numbers: 0.72678358498
Selecting intervals...
Selected 100 intervals for analysis.
Preprocessing data...
Calculating bounds using bound heuristic...
Writing bounds file to pipeline/HER2_sample1_theta2_interval_count.n2.withBounds
Estimating time...
Estimated Total Time: 10 minute(s)
Performing optimization...
Writing results file to pipeline/HER2_sample1_theta2_interval_count.n2.results
Plotting results as a .pdf...
Writing script to run N=3 to pipeline/HER2_sample1_theta2_interval_count.RunN3.bash
Frac with potential copy numbers: 0.72678358498
Selecting intervals...
WARNING: This sample isn't a good candidate for THetA analysis with 3 subpopulations: There aren't a sufficient number of intervals that fit the criteria for interval selection.
Selected 75 intervals for analysis.
Preprocessing data...
Writing bounds file to pipeline/HER2_sample1_theta2_interval_count.n3.withBounds
Estimating time...
Estimated Total Time: 1523840161958445973504 hour(s)
Performing optimization...
```

The last estimated total run time (1523840161958445973504 hours, for the N=3 run) looks like a bug, and the run never finishes.
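One possible reading of such a huge estimate (an assumption, not something confirmed in this thread) is combinatorial growth rather than a numeric overflow: if every selected interval can independently take one of several copy numbers in each tumor subpopulation, the number of assignments an exhaustive search must visit multiplies across intervals. The sketch below uses a hypothetical function `exhaustive_candidates` and hypothetical per-interval bounds; it is a generic upper-bound illustration, not THetA's time estimator, and THetA's real search applies additional pruning.

```python
def exhaustive_candidates(bounds, n_tumor):
    """Count copy-number assignments in a naive exhaustive search.

    Hypothetical model: interval i may take any copy number 0..bounds[i]
    independently in each of n_tumor tumor subpopulations. THetA prunes its
    search further, so this is only an upper-bound illustration.
    """
    total = 1
    for b in bounds:
        # (b + 1) choices per tumor subpopulation, chosen independently.
        total *= (b + 1) ** n_tumor
    return total


if __name__ == "__main__":
    # Hypothetical uniform bound of 2 copies (3 choices) per interval.
    for m in (10, 20, 40, 75):
        one = exhaustive_candidates([2] * m, n_tumor=1)
        two = exhaustive_candidates([2] * m, n_tumor=2)
        print(f"{m:>3} intervals: {one:.2e} (1 tumor pop.), {two:.2e} (2 tumor pops.)")
```

Adding a second tumor subpopulation squares the count per interval, so even modest interval counts produce numbers far beyond any practical run time.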

Attached is the input file I used.

Do you know what is causing this issue? Is THetA recommended for WGS data rather than panel data?

Thanks.

Attachment: HER2_sample1_theta2_interval_count.txt

zhouyangyu commented 4 years ago

@jatintalwar You should use the BAF model. Without SNP data, the run time is effectively unbounded. See https://github.com/etal/cnvkit/issues/146.
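The BAF model needs allele-specific read counts at germline SNPs in addition to the interval read counts. As a hedged illustration of what that data represents (not THetA's own code or input format), the B-allele frequency at a SNP is the fraction of reads supporting the B (alternate) allele. Sites that look heterozygous in the matched normal are the informative ones, and tumor BAF drifting away from 0.5 signals allelic imbalance. The names `SnpCounts`, `baf`, and `informative_sites`, as well as the depth and heterozygosity thresholds, are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnpCounts:
    ref: int  # reads supporting the reference (A) allele
    alt: int  # reads supporting the alternate (B) allele


def baf(c: SnpCounts) -> Optional[float]:
    """B-allele frequency: fraction of reads carrying the B allele (None if no reads)."""
    depth = c.ref + c.alt
    return c.alt / depth if depth else None


def informative_sites(normal: List[SnpCounts], tumor: List[SnpCounts],
                      low: float = 0.3, high: float = 0.7,
                      min_depth: int = 10) -> List[float]:
    """Return tumor BAFs at SNPs that look heterozygous in the matched normal.

    The thresholds are hypothetical; real pipelines tune them to their data.
    """
    out = []
    for n, t in zip(normal, tumor):
        # Skip poorly covered sites in either sample.
        if n.ref + n.alt < min_depth or t.ref + t.alt < min_depth:
            continue
        # Keep only sites whose normal BAF is near 0.5 (likely heterozygous).
        if low <= baf(n) <= high:
            out.append(baf(t))
    return out


if __name__ == "__main__":
    normal = [SnpCounts(20, 20), SnpCounts(40, 0), SnpCounts(15, 15)]
    tumor = [SnpCounts(30, 10), SnpCounts(35, 0), SnpCounts(12, 18)]
    print(informative_sites(normal, tumor))
```

Here the second SNP is dropped because it is homozygous in the normal, and the remaining tumor BAFs (0.25 and 0.6) deviate from 0.5, which is the kind of signal the BAF model can use to tell copy-number states apart.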