daidaobee opened this issue 8 years ago
Hi Marco,
I believe the error occurs because the input files must be tab-separated. From what you included here, your input looks space-separated. Let me know if the problem persists.
Best, Gryte Satas
On Thu, Feb 25, 2016 at 1:01 PM, Marco Leung <notifications@github.com> wrote:
I have created the copy number input file, the normal file, and the tumor file, following the format described in the manual.
When I run RunTHetA with only the copy number file, it seems to work fine. However, when I add the options for the tumor and normal files, I get the error below.
Please help. Thanks!
Below are the first few lines of my files.
CN-FILE
interval ID chrom start end tumorCount normalCount
1 1 1 1758057 281 313
2 1 1758057 4151407 333 660
3 1 4151407 4372509 352 627
4 1 4372509 4582450 221 191
5 1 4582450 4793866 420 577
6 1 4793866 5009810 354 600
7 1 5009810 5227840 290 386
8 1 5227840 5438636 261 408
9 1 5438636 5646411 300 160
10 1 5646411 6528721 254 461
11 1 6528721 6759914 344 601
12 1 6759914 6983038 360 523
13 1 6983038 7194330 309 633
14 1 7194330 7407624 246 68
15 1 7407624 7622684 208 143
16 1 7622684 7841261 385 786
17 1 7841261 8072429 325 841
18 1 8072429 8306737 287 330
NORMAL-FILE
Chrm Pos Ref_Allele Mut_Allele
17 19713740 0 37
16 28603393 0 31
15 74883710 16 3
9 95784648 281 1
1 213009469 0 26
3 49724639 40 4
6 70961833 37 33
5 1081767 105 85
10 106152111 15 8
14 102695693 20 16
1 151204762 47 4
16 19547747 0 33
TUMOR-FILE
Chrm Pos Ref_Allele Mut_Allele
17 19713740 0 56
16 28603393 1 53
15 74883710 22 0
9 95784648 156 59
1 213009469 0 37
3 49724639 58 10
6 70961833 101 44
5 1081767 92 59
10 106152111 28 22
14 102695693 11 10
1 151204762 59 0
16 19547747 1 50
Below is my command:
/bin/RunTHetA CN-FILE.txt --TUMOR_FILE TUMOR-FILE.txt --NORMAL_FILE NORMAL-FILE.tsv
Below is my error message:
Arguments are:
Query File: CN-FILE.txt
k: 3
tau: 2
Output Directory: ./
Output Prefix: CN-FILE.txt
Num Processes: 1
Graph extension: .pdf
Valid sample for THetA analysis:
Ratio Deviation: 0.1
Min Fraction of Genome Aberrated: 0.05
Program WILL cluster intervals.
Reading in query file...
Frac with potential copy numbers: 0.839205341782
Reading SNP file at TUMOR-FILE.txt
Reading SNP file at NORMAL-FILE.tsv
Reading interval file at CN-FILE.txt
Calculating BAFs
Determining heterozygosity.
Calculating BAFs.
Traceback (most recent call last):
  File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 505, in <module>
    main()
  File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 283, in main
    resultsfile2, boundsfile2 = run_fixed_N(2, args, intervals)
  File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 317, in run_fixed_N
    intervals, missingData, corrRatio, meanBAFs = get_clustering_args(tumorfile, normalfile, filename, num_processes, m, tumorCounts, normCounts)
  File "/volumes/neo/code/3rd_party/THetA-master/bin/../python/RunTHetA.py", line 263, in get_clustering_args
    intervalsByChrm[chrm].append(interval)
IndexError: list index out of range
— Reply to this email directly or view it on GitHub https://github.com/raphael-group/THetA/issues/8.
I suspected that too, but I checked with "cat -T" and the files are indeed tab-separated. They probably look space-separated because of my copy-and-paste.
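Besides "cat -T", you can check the tab-separation requirement by counting tab-separated fields on every line. This avoids relying on how a terminal or browser renders whitespace. The sketch below is a hypothetical helper, not part of THetA. It assumes the column counts implied by the headers quoted in this thread: 6 for the interval file and 4 for the SNP files.

```python
import io


def find_non_tab_lines(lines, expected_fields):
    """Return (line_number, field_count) for every non-empty line whose
    tab-separated field count differs from expected_fields.

    Hypothetical helper: it checks only the delimiter, not the values.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != expected_fields:
            problems.append((line_number, len(fields)))
    return problems


# Usage example: a space-separated row collapses into a single "field".
sample = io.StringIO(
    "Chrm\tPos\tRef_Allele\tMut_Allele\n"
    "17\t19713740\t0\t37\n"
    "16 28603393 0 31\n"
)
print(find_non_tab_lines(sample, expected_fields=4))  # [(3, 1)]
```

For a real file, pass the open file handle instead, for example `with open("NORMAL-FILE.tsv") as fh: print(find_non_tab_lines(fh, 4))`. An empty list means every non-blank line has the expected number of tab-separated columns.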
Hello, I'm facing the same issue. My SNP files are also tab-separated, and I don't get any errors when I don't specify SNP files. I can provide my inputs if needed. Any help would be greatly appreciated. Best, -E
Do you happen to encode chrY as 23 as well? I think that was the issue. I changed the relevant line in the Python script from
intervalsByChrm[chrm].append(interval)
to
intervalsByChrm[chrm - 1].append(interval)
and now it works.
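The sketch below is a hypothetical reconstruction of the failure mode, not THetA's actual code. It assumes `intervalsByChrm` is a list with one slot per chromosome, and that chromosomes are numbered from 1, with X and Y as 23 and 24. Under those assumptions, using `chrm` directly as an index overflows on the highest chromosome number, while `chrm - 1` fits. Whether the shift is correct inside RunTHetA.py depends on how the rest of the script indexes `intervalsByChrm`, which this thread does not show.

```python
NUM_CHROMOSOMES = 24  # assumption: 1-22 autosomes, 23 = X, 24 = Y


def bucket_direct(intervals, num_chrm=NUM_CHROMOSOMES):
    """Failing pattern: uses the 1-based chromosome number as a 0-based index."""
    buckets = [[] for _ in range(num_chrm)]
    for interval in intervals:
        chrm = interval[0]
        buckets[chrm].append(interval)  # chrm == num_chrm raises IndexError
    return buckets


def bucket_shifted(intervals, num_chrm=NUM_CHROMOSOMES):
    """Shifted pattern: maps chromosomes 1..num_chrm onto indices 0..num_chrm-1."""
    buckets = [[] for _ in range(num_chrm)]
    for interval in intervals:
        chrm = interval[0]
        if not 1 <= chrm <= num_chrm:
            # Fail loudly instead of letting chrm = 0 wrap around to index -1.
            raise ValueError("chromosome %r is outside 1..%d" % (chrm, num_chrm))
        buckets[chrm - 1].append(interval)
    return buckets


# Made-up (chrm, start, end) intervals, for illustration only.
intervals = [(1, 1, 1758057), (24, 100, 200)]
try:
    bucket_direct(intervals)
except IndexError as err:
    print("direct indexing:", err)  # direct indexing: list index out of range
print(len(bucket_shifted(intervals)[23]))  # 1
```

The explicit range check in `bucket_shifted` matters. Python silently accepts negative indices, so an unexpected chromosome 0 would otherwise land in the last bucket instead of raising an error.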
Unfortunately, the above fix didn't work for me. I ended up removing all chrX and chrY entries from my input SNP files.
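The workaround of dropping the sex chromosomes can be scripted. The helper below is a hypothetical sketch, not part of THetA. It assumes the SNP file has one header row and the chromosome in the first column. It also assumes X and Y may be labelled by name ("X", "chrX") or numerically ("23", "24"), depending on how the files were generated, so check which convention your files use before relying on it.

```python
import csv
import io

# Assumption: the labels under which sex chromosomes may appear (compared upper-cased).
SEX_CHROMOSOMES = {"X", "Y", "CHRX", "CHRY", "23", "24"}


def drop_sex_chromosomes(src, dst):
    """Copy a tab-separated SNP file from src to dst, skipping X/Y rows.

    The first row (header) is copied unchanged. Returns the number of
    data rows removed.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    removed = 0
    for row_number, row in enumerate(reader):
        is_data_row = row_number > 0 and row
        if is_data_row and row[0].strip().upper() in SEX_CHROMOSOMES:
            removed += 1
            continue
        writer.writerow(row)
    return removed


# Usage example with made-up rows: the chromosome 23 row is dropped.
src = io.StringIO(
    "Chrm\tPos\tRef_Allele\tMut_Allele\n"
    "17\t19713740\t0\t37\n"
    "23\t1000\t5\t7\n"
)
dst = io.StringIO()
print(drop_sex_chromosomes(src, dst))  # 1
```

For real files, open both with `newline=""` as the csv module recommends, for example `with open("NORMAL-FILE.tsv", newline="") as src, open("NORMAL-FILE.noXY.tsv", "w", newline="") as dst:`. Then pass the filtered file to RunTHetA.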