rgcgithub / clamms

CLAMMS is a scalable tool for detecting common and rare copy number variants from whole-exome sequencing data.

Memory errors while running 'call_cnv' #5

Open: vijaymp38 opened this issue 7 years ago

vijaymp38 commented 7 years ago

Hello, we have been encountering issues with the last step of the CLAMMS pipeline, calling CNVs on autosomes (call_cnv.c). We also tried a subset of the actual files and got the same error. For reference, I have listed below the number of rows in the coverage file and in the models file. It would be great if you could help us resolve this issue, or point us in the right direction so we can investigate and resolve it ourselves. We appreciate your time!

$ /data/software/clamms/call_cnv NA18973.norm.cov.bed models_auto.bed
Error in `/data/software/clamms/call_cnv': free(): invalid next size (normal): 0x0000000003698330
Aborted

$ wc -l NA18973.norm.cov.bed
192031 NA18973.norm.cov.bed
$ wc -l models_auto.bed
192031 models_auto.bed

Sample coverage file:

$ head NA18973.norm.cov.bed
1 14642 14882 0.0713833
1 14943 15063 0.0307186
1 15751 15990 0.0469898
1 16599 16719 0
1 16834 17074 0.00682635
1 17211 17331 0.0189912
1 30275 30431 0
1 69069 70029 0.206374
1 129133 129253 0
1 228233 228354 0

Sample models file:

$ head models_auto.bed
1 14642 14882 -1 0.650 0.160 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 14943 15063 -1 0.590 0.125 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 15751 15990 -1 0.649 0.395 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 16599 16719 -1 0.665 0.125 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 16834 17074 -1 0.592 0.128 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 17211 17331 -1 0.585 0.125 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 30275 30431 -1 0.420 0.167 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 69069 70029 -1 0.423 0.343 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 129133 129253 -1 0.400 0.081 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 228233 228354 -1 0.310 0.315 1 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

rgcgithub commented 7 years ago

Would it be possible to send me the example normalized coverage and model files (gzipped) so I can try to reproduce this? It looks like a memory error we have not encountered before.

vijaymp38 commented 7 years ago

Thanks for such a quick response! I have attached both files here: CLAMMS_Input_Files.zip

rgcgithub commented 7 years ago

Sorry - I didn't notice your response until now. I looked at your files, and there seems to have been a problem creating your models file: all exons are flagged as outliers (-1 in the fourth column). For non-outlier exons, this value should equal the maximum copy number considered, which is 3 in most cases, or 6 in known multi-copy duplication regions.
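A quick way to see how widespread the outlier flag is in a models file is to tally the fourth column. The sketch below assumes a whitespace-delimited models BED file laid out like the `head models_auto.bed` excerpt above, with -1 marking outlier exons as explained in this comment; `count_max_cn_values` and the script invocation are hypothetical, not part of CLAMMS.

```python
import sys
from collections import Counter


def count_max_cn_values(lines):
    """Tally the fourth column of a CLAMMS models BED file (assumed layout).

    Per the explanation in this thread, -1 marks an exon flagged as an
    outlier; otherwise the value is the maximum copy number considered
    (usually 3, or 6 in known multi-copy duplication regions).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[int(fields[3])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_outliers.py models_auto.bed
    with open(sys.argv[1]) as handle:
        counts = count_max_cn_values(handle)
    total = sum(counts.values())
    print(f"{counts.get(-1, 0)} of {total} exons flagged as outliers (-1)")
    for value, n in sorted(counts.items()):
        print(f"  fourth column = {value}: {n} exons")
```

If nearly every exon reports -1, as in the excerpt above, the problem lies upstream of `call_cnv` in model construction.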

An exon can be flagged during model construction for several reasons, but based on your coverage file, it looks to be due to low coverage at all exons. Note that the median normalized coverage over all exons for your NA### sample is 0.04, with a maximum of 0.2 (median of the fourth column in the .norm.cov.bed file). That median should be close to 1.

Exon-level coverage is median-normalized to 1 with respect to GC content, so your overall median coverage should be much higher. I suspect this is why your models are not being generated properly, so step back and first make sure your normalized coverage BED files are being generated correctly.

Evan



I just noticed that the coverage file you sent is a 10-line sample (so the median I quoted is not accurate), but check it on the full file nonetheless. The mean/SD values in the models file also suggest that coverage is very low.

vijaymp38 commented 7 years ago

Evan, thanks for your time and this explanation. We will investigate further and let you know what we find.