lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Small panel triggers an error in the `.getAverageWeightPV` function #363

tinyheero closed this issue 2 months ago

tinyheero commented 3 months ago

Hi,

I've got a small panel (< 800 targets) that I am trying to run through PureCN just to see if it would work (I recognize that it isn't the best dataset to use).

I've hit the following error:

cannot take a sample larger than the population when 'replace = FALSE'

I've traced the issue down to the .getAverageWeightPV() function that is called during CBS segmentation. In particular, the line:

permutations <- lapply(num_marks, function(l)
        sapply(sample(length(weights), perm), .do_permutation, l))

causes the error because perm is hardcoded to 2000: with fewer than 800 targets, sample() cannot draw 2000 indices without replacement.
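For reference, here is a minimal sketch that reproduces the failure outside PureCN and shows two generic ways around it. The `weights` vector and the value 800 are stand-ins I made up for illustration, and neither workaround is necessarily what the package does or should do.

```r
# Hypothetical stand-ins: 800 weights and the hardcoded 2000 permutations.
weights <- runif(800)
perm <- 2000

# Reproduce the reported failure: 2000 draws without replacement from 800 indices.
err <- tryCatch(
    sample(length(weights), perm),
    error = function(e) conditionMessage(e)
)
print(err)

# Option 1 (assumption): cap the number of draws at the population size.
idx_capped <- sample(length(weights), min(perm, length(weights)))

# Option 2 (assumption): keep all 2000 draws but allow repeated indices.
idx_replaced <- sample(length(weights), perm, replace = TRUE)
```

Option 1 yields fewer permutations on small panels, while option 2 keeps the count but reuses start positions; which is preferable depends on how the permutations are consumed downstream.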

I am not sure what these weights are used for. Would lowering this parameter be a reasonable way to get past the error?

lima1 commented 3 months ago

Thanks @tinyheero. I'll have a look. My toy examples are pretty small and they run through, so it should work.

lima1 commented 2 months ago

This should be fixed now in the issue_363 branch, which I will merge once it runs through. It should make it into the next stable version next week.

This p-value is used for flagging segments that consist of baits that have a high variance in the pool of normal samples (requires the normaldb).
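To illustrate the general idea of such a permutation p-value, here is a rough sketch. It is not PureCN's implementation: `mean_weight_at` and `segment_pv` are hypothetical helpers, the wrap-around windowing is my own simplification, and I am assuming that a lower weight corresponds to a bait with higher variance in the normals.

```r
set.seed(1)
weights <- runif(500)  # hypothetical per-bait weights (assumed: low = noisy bait)
perm <- 200            # hypothetical number of permutations

# Mean weight of `l` consecutive baits starting at index `i`
# (wraps around at the end of the vector; a simplification).
mean_weight_at <- function(i, l, w) {
    idx <- ((i - 1 + seq_len(l) - 1) %% length(w)) + 1
    mean(w[idx])
}

# Empirical p-value: fraction of random windows of the same length whose
# mean weight is at least as low as the observed segment's.
segment_pv <- function(start, l, w, perm) {
    observed <- mean_weight_at(start, l, w)
    # Cap the draws so small panels do not trigger the sampling error.
    starts <- sample(length(w), min(perm, length(w)))
    null <- vapply(starts, mean_weight_at, numeric(1), l = l, w = w)
    (sum(null <= observed) + 1) / (length(null) + 1)
}

pv <- segment_pv(start = 10, l = 20, w = weights, perm = perm)
print(pv)
```

Under these assumptions, a small p-value would mark a segment whose baits have unusually low weights compared with random windows of the same size.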

tinyheero commented 2 months ago

Thanks @lima1!