raphael-group / THetA

Tumor Heterogeneity Analysis (THetA) and THetA2 are algorithms that estimate the tumor purity and clonal/subclonal copy number aberrations directly from high-throughput DNA sequencing data. This repository includes the updated algorithm, called THetA2.
http://compbio.cs.brown.edu/projects/theta/

Error message: IndexError: list index out of range #8

Open daidaobee opened 8 years ago

daidaobee commented 8 years ago

I have created the copy number input file, the normal file, and the tumor file, following the format described in the manual.

When I run RunTHetA with only the copy number file, it works fine. However, when I add the options for the tumor file and the normal file, it gives me the error below.

Please help. Thanks.

Below are the first few lines of my files.

CN-FILE

interval ID  chrom  start    end      tumorCount  normalCount
1            1      1        1758057  281         313
2            1      1758057  4151407  333         660
3            1      4151407  4372509  352         627
4            1      4372509  4582450  221         191
5            1      4582450  4793866  420         577
6            1      4793866  5009810  354         600
7            1      5009810  5227840  290         386
8            1      5227840  5438636  261         408
9            1      5438636  5646411  300         160
10           1      5646411  6528721  254         461
11           1      6528721  6759914  344         601
12           1      6759914  6983038  360         523
13           1      6983038  7194330  309         633
14           1      7194330  7407624  246         68
15           1      7407624  7622684  208         143
16           1      7622684  7841261  385         786
17           1      7841261  8072429  325         841
18           1      8072429  8306737  287         330

NORMAL-FILE

Chrm  Pos        Ref_Allele  Mut_Allele
17    19713740   0           37
16    28603393   0           31
15    74883710   16          3
9     95784648   281         1
1     213009469  0           26
3     49724639   40          4
6     70961833   37          33
5     1081767    105         85
10    106152111  15          8
14    102695693  20          16
1     151204762  47          4
16    19547747   0           33

TUMOR-FILE

Chrm  Pos        Ref_Allele  Mut_Allele
17    19713740   0           56
16    28603393   1           53
15    74883710   22          0
9     95784648   156         59
1     213009469  0           37
3     49724639   58          10
6     70961833   101         44
5     1081767    92          59
10    106152111  28          22
14    102695693  11          10
1     151204762  59          0
16    19547747   1           50

Below is my command:

    /bin/RunTHetA CN-FILE.txt --TUMOR_FILE TUMOR-FILE.txt --NORMAL_FILE NORMAL-FILE.tsv

Below is my error message:

    Arguments are:
        Query File: CN-FILE.txt
        k: 3
        tau: 2
        Output Directory: ./
        Output Prefix: CN-FILE.txt
        Num Processes: 1
        Graph extension: .pdf

    Valid sample for THetA analysis:
        Ratio Deviation: 0.1
        Min Fraction of Genome Aberrated: 0.05

    Program WILL cluster intervals.

    Reading in query file...
    Frac with potential copy numbers: 0.839205341782
    Reading SNP file at TUMOR-FILE.txt
    Reading SNP file at NORMAL-FILE.tsv
    Reading interval file at CN-FILE.txt
    Calculating BAFs
    Determining heterozygosity.
    Calculating BAFs.
    Traceback (most recent call last):
      File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 505, in <module>
        main()
      File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 283, in main
        resultsfile2, boundsfile2 = run_fixed_N(2, args, intervals)
      File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 317, in run_fixed_N
        intervals, missingData, corrRatio, meanBAFs = get_clustering_args(tumorfile, normalfile, filename, num_processes, m, tumorCounts, normCounts)
      File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 263, in get_clustering_args
        intervalsByChrm[chrm].append(interval)
    IndexError: list index out of range

gsatas commented 8 years ago

Hi Marco,

I believe the error occurs because the input files should be tab-separated. From what you included here, it looks like your input is space-separated. Let me know if the problem persists.

Best, Gryte Satas

On Thu, Feb 25, 2016 at 1:01 PM, Marco Leung notifications@github.com wrote:

[Quoted copy of the original post]

daidaobee commented 8 years ago

I thought that might be the case, but I ran "cat -T" on the files and they are indeed tab-separated. They probably look space-separated here because of my copy-and-paste.
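For anyone who wants to repeat this check without "cat -T", here is a minimal Python sketch. It is not part of THetA: the function name find_bad_rows is hypothetical, and it assumes that header lines start with "#" and that an SNP file has four columns, as in the excerpts in the original post.

```python
def find_bad_rows(lines, expected_cols):
    """Return 1-based line numbers whose tab-split field count is not expected_cols."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Assumption: blank lines and header lines starting with "#" are skipped.
        if not stripped or stripped.startswith("#"):
            continue
        if len(stripped.split("\t")) != expected_cols:
            bad.append(lineno)
    return bad


sample = [
    "#Chrm\tPos\tRef_Allele\tMut_Allele\n",
    "17\t19713740\t0\t37\n",
    "16 28603393 0 31\n",  # space-separated: a single field when split on tabs
]
print(find_bad_rows(sample, expected_cols=4))  # prints [3]
```

To check a real file, pass an open file handle instead of the sample list, for example find_bad_rows(open("TUMOR-FILE.txt"), 4).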

egeulgen commented 6 years ago

Hello, I'm currently facing the same issue. My SNP files are also tab-separated. I don't get any errors when I don't specify SNP files. If needed, I can provide my inputs as well. Any help would be greatly appreciated. Best, -E

egeulgen commented 6 years ago

Would you happen to call chrY as 23 as well? I think that was the issue. I changed the relevant line of the Python script as shown below, and now it works.

Before:

    intervalsByChrm[chrm].append(interval)

After:

    intervalsByChrm[chrm - 1].append(interval)
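The following sketch illustrates why a change like this can make the IndexError go away. It is not the THetA source: it assumes that intervalsByChrm is a plain list with one bucket per chromosome and that chrm is a 1-based integer, and the names NUM_CHROMOSOMES and bucket_intervals are made up for the illustration.

```python
NUM_CHROMOSOMES = 24  # assumption: 22 autosomes plus X and Y, numbered 1..24


def bucket_intervals(intervals, one_based=True):
    """Group (chrm, start, end) tuples into one list per chromosome."""
    buckets = [[] for _ in range(NUM_CHROMOSOMES)]
    for chrm, start, end in intervals:
        # With 1-based chromosome numbers, the highest number equals the list
        # length, so using it directly as an index runs off the end.
        index = chrm - 1 if one_based else chrm
        buckets[index].append((start, end))
    return buckets


intervals = [(1, 1, 1758057), (24, 1, 500000)]

buckets = bucket_intervals(intervals)  # works: chromosome 24 lands in buckets[23]

try:
    bucket_intervals(intervals, one_based=False)  # indexes buckets[24]
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: list index out of range"
```

Whether this matches a given input depends on how that input numbers its chromosomes, so treat it as an explanation of the failure mode rather than a confirmed diagnosis.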

reykajayasinghe commented 5 years ago

Unfortunately, the above fix didn't work for me. I ended up removing all chrX and chrY entries from my input SNP files.
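A workaround like this one could be scripted roughly as follows. This is only a sketch: the function name drop_sex_chromosomes and the set of labels are assumptions, so adjust the labels to whatever your SNP files actually use in the first column.

```python
# Assumption: these are the labels used for the sex chromosomes in column 1.
SEX_CHROMOSOMES = {"x", "y", "chrx", "chry", "23", "24"}


def drop_sex_chromosomes(lines):
    """Return the tab-separated lines whose first field is not a sex chromosome."""
    kept = []
    for line in lines:
        first_field = line.rstrip("\n").split("\t")[0]
        if first_field.lower() in SEX_CHROMOSOMES:
            continue  # skip chrX / chrY rows
        kept.append(line)  # header and autosome rows pass through unchanged
    return kept


snp_lines = [
    "#Chrm\tPos\tRef_Allele\tMut_Allele\n",
    "17\t19713740\t0\t56\n",
    "X\t1000000\t12\t9\n",
    "chrY\t2000000\t5\t0\n",
]
filtered = drop_sex_chromosomes(snp_lines)
print(len(filtered))  # prints 2
```

Apply the same filter to both the tumor and the normal SNP file so that the two stay consistent.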