molgenis / NIPTeR

R Package for Non Invasive Prenatal Testing (NIPT) analysis
GNU Lesser General Public License v3.0

Cannot allocate memory #10

Open GreeshmaThulasi opened 6 years ago

GreeshmaThulasi commented 6 years ago

Hi, I was trying to do the binning for one of my BAM files, which is 6 GB in size, and I got the following error:

> sample_of_interest <- bin_bam_sample(bam_filepath = "Z:/input/input/GL18_0320.bam")
Loading Bam
Error in value[[3L]](cond) : cannot allocate vector of size 213.9 Mb
  file: Z:/input/input/GL18_0320.bam
  index: NA

even though I executed it on a server. Please help me rectify the problem.

ljohansson commented 6 years ago

Dear Greeshma,

I am afraid that your sample file is too large to be processed by NIPTeR on the machine you are using. The only two options are to either downsample your files to a processable size (which should work) or to use a more powerful machine.

Regards, Lennart

GreeshmaThulasi commented 6 years ago

Hi, I can't downsample the files. Does the problem indicate insufficient RAM? My system has 8 GB of RAM. Should I increase it?

ljohansson commented 6 years ago

Dear Greeshma,

Yes, that could help, although your BAM files are much larger than the ones we used during the design of NIPTeR (< 1 GB). This could mean that even with more RAM it will run out of memory.

I always like using `samtools view -s` for downsampling purposes.
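For reference, a minimal downsampling sketch with `samtools view -s` (the filenames are placeholders, not your actual paths; in the `-s` argument the integer part is the random seed and the fractional part is the fraction of reads to keep):

```shell
# Keep roughly 15% of the reads, using 42 as the random seed.
# Input/output filenames are hypothetical.
samtools view -s 42.15 -b input.bam > input_downsampled.bam

# Re-index the downsampled file so downstream tools can use it.
samtools index input_downsampled.bam
```

A 6 GB file subsampled at ~0.15 would land close to the < 1 GB size range mentioned above.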

Cheers, Lennart

GreeshmaThulasi commented 6 years ago

Ok. How many samples did you include in the control_group? I tried with 4 files, each around 3 GB, and again got the following error:

```
Loading Bam
[bam_sort_core] merging from 22 files...
BAM loaded
Binning
Binning done
Loading Bam
[bam_sort_core] merging from 20 files...
BAM loaded
Binning
Binning done
Loading Bam
[bam_sort_core] merging from 17 files...
BAM loaded
Binning
Binning done
Loading Bam
[bam_sort_core] merging from 15 files...
BAM loaded
Binning
Binning done
Loading Bam
[bam_sort_core] merging from 33 files...
Error in value[[3L]](cond) : cannot allocate vector of size 256.0 Mb
  file: C:\Users\admin\AppData\Local\Temp\RtmpeK4KRe\file17ec3507d0a.bam
  index: NA
```

Does the downsampling decrease the quality of the file?

ljohansson commented 6 years ago

Dear Greeshma,

Downsampling removes part of the reads, so from that perspective you could say the quality of the file decreases. However, a 1 GB file should have more than enough reads for a reliable prediction, given enough control samples. All Z-score methods require a normal distribution. Generally, a threshold of 30 samples is taken before a normal distribution can be assumed; however, the more control samples the better. I would recommend using at least 100 control samples.

ljohansson commented 6 years ago

Dear Greeshma,

Another solution may be to change the memory limit settings.

More information can be found here: http://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html
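On Windows (your temp path suggests you are running Windows), the limit can be inspected and raised from within R itself. A sketch, assuming an R version where `memory.limit()` is still available (it was removed in R 4.2); the size value below is an example, not a recommendation:

```r
# Query the current memory limit in MB (Windows only).
memory.limit()

# Raise the limit to ~16 GB. This only helps if the OS can actually
# provide that memory (e.g. via the page file); it does not add RAM.
memory.limit(size = 16000)
```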

GreeshmaThulasi commented 6 years ago

Hi @ljohansson I executed with a set of reference data and computed Z-scores for all chromosomes using the chi-squared-based method. What is the threshold for aneuploidy? Is it 3?

ljohansson commented 6 years ago

Dear Greeshma,

Given a normal distribution of the fraction of the chromosome of interest, 99.87% of samples are expected to have a Z-score below 3. Therefore, indeed, a Z-score of 3 is often taken as the threshold for trisomy calling. However, the true trisomy risk is also determined by other factors, such as the CV of the control group (a high CV means a lower Z-score and thus lower sensitivity), the a priori risk that a woman carries a child with a trisomy (high-risk group or low-risk group), and the percentage of foetal DNA present. To calculate the personalized post-test risk you could, for instance, use NIPTRIC: http://www.niptric.eu/

Cheers, Lennart
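The 99.87% figure follows directly from the standard normal distribution, and the Z-score itself is just the sample's chromosomal fraction standardized against the control group. A quick illustration in R (the control-group fractions below are made-up toy numbers, not real NIPT data):

```r
# Probability that a standard-normal Z-score falls below 3:
pnorm(3)  # ~0.99865, so ~0.13% of euploid samples would exceed Z = 3

# Standardizing a sample's chromosomal fraction against a control group
# (toy numbers for illustration only):
control_fractions <- c(0.0130, 0.0132, 0.0129, 0.0131, 0.0128)
sample_fraction   <- 0.0140
z <- (sample_fraction - mean(control_fractions)) / sd(control_fractions)
```

This also shows why the control-group CV matters: the larger `sd(control_fractions)` is relative to the mean, the smaller the Z-score for the same excess fraction.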