noemiandor / expands

Expanding Ploidy and Allele Frequency on Nested Subpopulations (expands) characterizes coexisting subpopulations in a single tumor sample using copy number and allele frequencies derived from exome- or whole genome sequencing input data.

Error in density.default #2

Closed by kvaldez 5 years ago

kvaldez commented 5 years ago

I am getting an error in some of my samples when attempting to run the computeCellFrequencyDistributions step:

Error in density.default(mat[, "f"], bw = "SJ", adjust = 0.25, kernel = c("gaussian"), :
  'weights' must all be finite

Do you have any advice on how to avoid this? It's wiping out many of my SNPs, leaving very little left to analyze. Is there perhaps a parameter that I can set?

Any help is much appreciated!

noemiandor commented 5 years ago

What's the average allele frequency of your SNPs? It might be that small values (<0.1) are overrepresented. Try changing the value of min_CF to 0.05 or 0.01 by calling: runExPANdS(..., min_CF=0.05)

This will increase run time though.
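A quick way to answer the question above is to summarize the allele frequencies before running ExPANdS. The snippet below is only a sketch of that check, written in Python for the arithmetic; the function name, the 0.1 cutoff default, and the example values are assumptions for illustration and are not part of the expands package.

```python
from statistics import mean


def summarize_allele_frequencies(afs, low_cutoff=0.1):
    """Return (mean allele frequency, fraction of values below low_cutoff).

    Hypothetical helper, not part of expands.
    """
    if not afs:
        raise ValueError("no allele frequencies supplied")
    n_low = sum(1 for af in afs if af < low_cutoff)
    return mean(afs), n_low / len(afs)


# Made-up allele frequencies, for illustration only.
example_afs = [0.05, 0.08, 0.16, 0.22, 0.35]
mean_af, frac_low = summarize_allele_frequencies(example_afs)
print(f"mean AF = {mean_af:.3f}, fraction below 0.1 = {frac_low:.2f}")
```

If a large share of values falls below the cutoff, lowering min_CF as suggested above may be worth trying.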

kvaldez commented 5 years ago

Hi Noemi,

Thank you for the quick reply and advice. I attempted to change the min_CF as suggested with no luck.

My allele frequencies for all samples range from 0.16 to 0.35, and aren't comparatively low for the problematic samples.

I inspected the copy numbers: the mean CN for the problematic samples is around 2 and doesn't vary much, whereas for the rest of the samples it ranges from 3 to 6 and varies from segment to segment. I'm wondering if the copy number calls are causing the issue.

I am using sequenza, which gives copy numbers in whole numbers. May I ask which circular binary segmentation algorithm you used to get the recommended rational positive estimates?

Thanks again, Kristin

noemiandor commented 5 years ago

Yes, you are right -- the integer copy numbers are the likely cause of the problem. I used CNVkit when calling CNVs from exome-seq data. One of the parameters can be set so as to return higher precision in output.

kvaldez commented 5 years ago

I'll give that a try, thanks again.

Kristin

kvaldez commented 5 years ago

Hi Noemi,

I've been playing around with cnvkit and haven't found an option to stop it from rounding CN to whole integers. I do see that I can export the log2, and found that if I take (2^x)*2 I can get estimated numbers that are very close to the rounded integers. Is this how you determined the estimated CN? Or am I potentially missing a piece?

As always, many thanks! Kristin

noemiandor commented 5 years ago

Hi Kristin! That is correct. You can do even better by using ABSOLUTE, or some other approach that calls ploidy, and then making sure the weighted average copy number of your segments matches that number.
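The two steps discussed here can be sketched as follows, in Python for the arithmetic only. The first function is the (2^x)*2 conversion from a log2 ratio. The second rescales the segments so that their length-weighted mean copy number equals a ploidy estimate; a single multiplicative factor is one possible way to do that and is an assumption of this sketch, not necessarily what was done for ExPANdS. All names and example values are hypothetical.

```python
def log2_ratio_to_copy_number(log2_ratio, normal_ploidy=2):
    """Convert a segment log2 ratio x to a non-integer copy number: (2^x) * 2."""
    return normal_ploidy * 2 ** log2_ratio


def rescale_to_ploidy(copy_numbers, lengths, target_ploidy):
    """Scale copy numbers so their length-weighted mean equals target_ploidy.

    Assumption: one multiplicative factor applied to every segment.
    """
    if len(copy_numbers) != len(lengths):
        raise ValueError("copy_numbers and lengths must have the same length")
    total_length = sum(lengths)
    weighted_mean = sum(cn * ln for cn, ln in zip(copy_numbers, lengths)) / total_length
    if weighted_mean <= 0:
        raise ValueError("weighted mean copy number must be positive")
    factor = target_ploidy / weighted_mean
    return [cn * factor for cn in copy_numbers]


# Made-up segments: log2 ratios and segment lengths in bp.
segment_log2 = [0.0, 0.58, -1.0]
segment_lengths = [1_000_000, 500_000, 500_000]
estimated_ploidy = 2.3  # e.g. from a ploidy caller; hypothetical value

raw_cn = [log2_ratio_to_copy_number(x) for x in segment_log2]
adjusted_cn = rescale_to_ploidy(raw_cn, segment_lengths, estimated_ploidy)
print(raw_cn)
print(adjusted_cn)
```

The resulting non-integer values would then replace the rounded integer copy numbers in the input.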

lydiayliu commented 4 years ago

Hi Noemi,

I also run into this problem extensively. I was wondering if this is because I'm using CNA callers that give clonal / subclonal integer copy numbers and their cellular prevalence (TITAN, Battenberg and FACETS). Could you advise on how one could properly incorporate subclonal copy number calls in ExPANdS? Currently I multiply the copy number by the clonality to create a non-integer input. (This doesn't work for regions where the copy number is clonal CN=2, which probably explains the extensive error...)

Thank you very much, Lydia
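To make the arithmetic in this question concrete, the sketch below (Python, for illustration only) compares the product described above with a population-average copy number. It assumes the cells outside the subclone sit at a single background copy number, 2 by default. That mixture formula is an assumption of this sketch and not a method endorsed anywhere in this thread; the function names and values are hypothetical.

```python
def product_cn(copy_number, prevalence):
    """The approach described above: copy number multiplied by clonality."""
    return copy_number * prevalence


def population_average_cn(subclonal_cn, prevalence, background_cn=2):
    """Average copy number across cells, assuming the remaining
    (1 - prevalence) fraction of cells carries background_cn copies."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return prevalence * subclonal_cn + (1.0 - prevalence) * background_cn


# Made-up calls: (integer copy number, cellular prevalence).
calls = [(3, 0.4), (1, 0.7), (2, 1.0)]
for cn, prev in calls:
    print(cn, prev, product_cn(cn, prev), population_average_cn(cn, prev))
```

Either way, a clonal CN=2 segment (prevalence 1.0) still comes out as exactly 2, which is consistent with the observation that such regions remain integer-valued.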