Open SNRNS opened 8 years ago
Hi Alejandro,
Can you please share how you solved the problem?
I am getting the following.
Arguments are:
    Query File: Test2Segment
    k: 3
    tau: 2
    Output Directory: ./
    Output Prefix: Test2Segment
    Num Processes: 6
    Graph extension: .pdf
Valid sample for THetA analysis:
    Ratio Deviation: 0.1
    Min Fraction of Genome Aberrated: 0.05
Reading in query file...
Traceback (most recent call last):
File "/usr/local/src/THetA/python/RunTHetA.py", line 504, in
Thanks !
Hi @ChiragNepal,
I couldn't solve the problem. I used other tools in the end. For relative copy number alterations I used CopywriteR, for tumour purity and absolute copy numbers I used ABSOLUTE, and finally to get the cellular prevalence of the mutations I used PyClone.
I hope that helps.
Best, Alejandro
I'm also having this problem. Removing multiprocessing and changing p.map to map in ClusteringBAF.py results in this error instead:
First round of clustering...
Traceback (most recent call last):
File "/xchip/scarter/dmccabe/THetA/bin/../python/RunTHetA.py", line 504, in <module>
main()
File "/xchip/scarter/dmccabe/THetA/bin/../python/RunTHetA.py", line 282, in main
resultsfile2, boundsfile2 = run_fixed_N(2, args, intervals)
File "/xchip/scarter/dmccabe/THetA/bin/../python/RunTHetA.py", line 319, in run_fixed_N
lengths, tumorCounts, normalCounts, m, upper_bounds, lower_bounds, clusterAssignments, numClusters, clusterMeans, normalInd = clustering_BAF(n, intervals=intervals, missingData=missingData, prefix=prefix, outdir=directory, numProcesses=num_processes)
File "/xchip/scarter/dmccabe/THetA/python/ClusteringBAF.py", line 90, in clustering_BAF
metaData = generate_meta_data(intervals, byChrm, numProcesses, sampleName, generateData, outdir)
File "/xchip/scarter/dmccabe/THetA/python/ClusteringBAF.py", line 146, in generate_meta_data
results = map(cluster_wrapper, zip(intervals, linearizedSampleName, linearizedChrm, linearizedGenerateData))
File "/xchip/scarter/dmccabe/THetA/python/ClusteringBAF.py", line 203, in cluster_wrapper
mus, sigmas, clusterAssignments, numPoints, numClusters = cluster(binnedChrm, sampleName, chrm=chrm)
File "/xchip/scarter/dmccabe/THetA/python/ClusteringBAF.py", line 259, in cluster
Data = format_data(data, sampleName, chrm)
File "/xchip/scarter/dmccabe/THetA/python/ClusteringBAF.py", line 302, in format_data
Data = bnpy.data.XData(X=npArray)
File "/xchip/scarter/dmccabe/bnpy-dev/bnpy/data/XData.py", line 97, in __init__
self._check_dims()
File "/xchip/scarter/dmccabe/bnpy-dev/bnpy/data/XData.py", line 121, in _check_dims
assert self.X.flags.owndata
AssertionError
I don't have any problems running the example, for some reason. Just my own data.
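For anyone reading the traceback above: the failing line is an assertion on NumPy's `owndata` flag, which is False when an array is a view onto another array's memory rather than the owner of its own buffer. The snippet below is only a minimal NumPy sketch of what that flag means. It does not use THetA or bnpy code, and the variable names are made up for illustration. Passing a copy of the array to bnpy might get past the assertion, but that is an untested assumption, not a confirmed fix.

```python
import numpy as np

# A freshly allocated array owns its memory buffer.
base = np.arange(6, dtype=float)
print(base.flags.owndata)

# Reshaping returns a view that borrows the buffer of `base`,
# so the view does not own its data.
view = base.reshape(3, 2)
print(view.flags.owndata)

# An explicit copy allocates a new buffer, so the copy owns its data.
owned = view.copy()
print(owned.flags.owndata)
```

If the array built in format_data ends up as such a view for some inputs but not others, that would be consistent with the example data working and other data failing, but I have not verified this.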
It has been suggested in another forum that at least 1 SNP has to be present in every single interval_count. I assume that the example works perfectly because the number of interval_count entries is very limited compared to the number of SNPs. Can someone confirm this hypothesis? Is THetA still supported by the developers?
If it is confirmed, then as a consequence, if you want a finer resolution in terms of interval_count (e.g. for WGS data presenting chromothripsis/chromoplexy events), it will be almost impossible to have 1 SNP per interval.
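To test this hypothesis on your own data, one option is to count how many SNPs fall into each interval before running THetA. The sketch below is a rough illustration only: the function name and the tuple layouts for intervals and SNPs are assumptions made for this example, not THetA's actual file formats, so you would need to parse your own files into these structures first.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_snps_per_interval(intervals, snps):
    """Return {interval_id: number of SNPs inside the interval}.

    intervals: iterable of (interval_id, chrom, start, end), end inclusive
    snps: iterable of (chrom, position)
    Both layouts are hypothetical, chosen only for this example.
    """
    # Group SNP positions by chromosome and sort them for binary search.
    positions = defaultdict(list)
    for chrom, pos in snps:
        positions[chrom].append(pos)
    for chrom in positions:
        positions[chrom].sort()

    counts = {}
    for interval_id, chrom, start, end in intervals:
        chrom_positions = positions.get(chrom, [])
        first = bisect_left(chrom_positions, start)
        last = bisect_right(chrom_positions, end)
        counts[interval_id] = last - first
    return counts


# Toy data: interval "a" contains two SNPs, "b" and "c" contain none.
intervals = [("a", "1", 100, 200), ("b", "1", 201, 300), ("c", "2", 1, 50)]
snps = [("1", 150), ("1", 200), ("2", 60)]

counts = count_snps_per_interval(intervals, snps)
empty = [interval_id for interval_id, n in counts.items() if n == 0]
print(counts)
print(empty)
```

If the failing samples have intervals in `empty` and the working example does not, that would support the hypothesis, although it would not prove it.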
Dear @BaptisteAmeline,
As you can read in the manual (https://github.com/raphael-group/THetA/blob/master/doc/MANUAL.txt), the usage of SNPs is part of a recommended but optional step. As far as I am aware, there is no requirement to have a SNP in each interval_count.
However, to clarify your point (since almost all allele-specific copy-number callers consider only segments with at least one heterozygous SNP): germline SNPs are considered, and their locations are given with respect to the reference genome to which you aligned the reads. Therefore, catastrophic events such as chromothripsis/chromoplexy do not affect in any way the position or presence of germline heterozygous SNPs in the reference genome. In the human reference genome you may expect 1 SNP every 1k bases, so you should expect to have many SNPs in your intervals with standard bin sizes. The allele counts from these SNPs are used to infer the allele-specific copy numbers of each segment in the tumor sample resulting from any aberration (including chromothripsis/chromoplexy). E.g. a heterozygous SNP should have an allele proportion of 50%; if one of the alleles is lost, the expected proportion is 0% (or 100%, depending on which allele is lost), assuming this occurs in all cells of the sample. You can search for B-allele frequency (BAF) to learn more about this.
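To make the BAF reasoning concrete, here is a small sketch of the expected B-allele frequency at a germline heterozygous SNP. It uses a simple two-population mixture (tumour cells plus normal diploid cells), which is a common textbook simplification and not necessarily the model THetA uses internally. The function name and parameters are made up for illustration.

```python
def expected_baf(b_copies, total_copies, purity=1.0):
    """Expected B-allele frequency at a germline heterozygous SNP.

    b_copies: copies of the B allele in tumour cells
    total_copies: total copies of the segment in tumour cells
    purity: fraction of tumour cells in the sample; normal cells are
            assumed to carry one A copy and one B copy
    """
    b = purity * b_copies + (1.0 - purity) * 1
    total = purity * total_copies + (1.0 - purity) * 2
    if total == 0:
        # Homozygous deletion in a pure tumour sample: no reads expected.
        raise ValueError("no copies of the segment in the sample")
    return b / total


# Heterozygous, no aberration: half of the reads carry the B allele.
print(expected_baf(1, 2))
# One allele lost in all cells: either the B allele or the A allele is gone.
print(expected_baf(0, 1))
print(expected_baf(1, 1))
# Same loss of the B allele, but only 60% of the cells are tumour cells.
print(expected_baf(0, 1, purity=0.6))
```

The last call shows why the observed proportion in a real sample usually sits between 0% and 50% rather than at exactly 0%: reads from normal cells still carry both alleles.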
I successfully analysed the example files with RunTHetA; however, when I tried to analyse my own sample, it produced an error. I have whole exome sequencing BAM files for the normal and the primary tumour. Following the instructions, I created the THetA input file and the snp.withCounts files, one for the normal and one for the tumour. However, when I run in the THetA-master directory:
I get:
The normal_snp.withCounts file looks like this:
The primary_snp.withCounts looks like this:
The normal_primary.input looks like this:
I would appreciate very much any help provided.
Many thanks, Alejandro