tf2 / CNest

Copy Number Methods for Detection and Genome Wide Association Tests

RAM usage consultant #9

Open whiteorchid opened 2 years ago

whiteorchid commented 2 years ago

Dear author,

Could you advise on CNest's RAM usage? For example, can it be run on a local machine with about 16 GB of RAM? Thanks a lot!

Best,

tf2 commented 2 years ago

Sure. Can I ask what your application is, i.e. how many sequence datasets / samples do you have? And what type of sequence data is it (e.g. Illumina WGS or WES)?

CNest is designed to run across very large sequence datasets (>1000 samples), but testing a smaller number of samples locally is possible.

Note: there are also implementations that can run in a number of cloud environments e.g. Terra, if that is useful to you.

But first let me know what data size and type you have, very happy to help you get it running.


whiteorchid commented 2 years ago

Thanks a lot!

I have several samples (<10) of WES data. They are not tumor samples, so there is no tumor vs. normal (control) comparison.

Is CNest still suitable for such a small dataset? Thanks!

Best regards,

tf2 commented 2 years ago

Ok cool, I see.

Although I would not really recommend CNest for a dataset of this size, it could still give you good estimates of relative copy number across all your samples, along with CNV calls. Note that with small datasets the copy number estimates will be somewhat limited.

The main methods within CNest search for optimised reference data sets and these methods generally need large sample numbers to give high accuracy.

In this case, because the sample numbers are quite low, I would suggest setting the “batch_size” parameter to half of your total sample size, so that would be ~4, correct?

One issue that may come up is that CNest does a gender classification and then compares samples in both gender-mismatched and gender-matched ways to estimate some dose response characteristics.

The problem is that the method will fail if either gender has fewer samples than the value you set for the “batch_size” parameter.

We can hack around that a bit if you like. If you still want to try, I’m happy to provide some details; just let me know.

NB: I’m currently on holiday so may be a bit slow to respond. Next week I would be more than happy to help you get it running on your dataset.
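
As an illustration of the constraint described above, here is a minimal Python sketch that picks a batch size of half the total sample count and checks whether each gender has at least that many samples. This is not CNest code: the function names, the "F"/"M" labels, and the example sample list are all hypothetical, and in practice CNest performs its own gender classification.

```python
from collections import Counter


def suggest_batch_size(n_samples: int) -> int:
    """Half of the total sample count, never less than 1 (hypothetical helper)."""
    return max(1, n_samples // 2)


def genders_below_batch_size(genders, batch_size):
    """Return {gender: count} for each gender with fewer samples than batch_size."""
    counts = Counter(genders)
    # Counter returns 0 for a label that never appears, so a missing gender is reported too.
    return {g: counts[g] for g in ("F", "M") if counts[g] < batch_size}


# Hypothetical cohort of 8 samples: 5 labelled "F" and 3 labelled "M".
genders = ["F", "F", "F", "F", "F", "M", "M", "M"]
batch_size = suggest_batch_size(len(genders))
short = genders_below_batch_size(genders, batch_size)
print(batch_size, short)  # 4 {'M': 3}
```

With this example cohort the check flags "M" (3 samples against a batch size of 4), which is the situation in which the method would be expected to fail unless the batch size is lowered or the workaround mentioned above is applied.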
