
jsh58 / DMRfinder

Identifying differentially methylated regions from MethylC-seq (bisulfite-sequencing) data

MIT License


findDMR #7

Closed: Lux91 closed this issue 5 years ago

Lux91 commented 5 years ago

Hi, I have a combined file that I created manually to target specific regions. When I run the R script findDMRs.r, it stays on the "loading sample" step for a very long time and then fills up the RAM before reaching the "comparing groups" step. I'm running it on an Ubuntu server with 64 GB of RAM and 40 GB of swap. There are roughly 72,000 regions, ranging in length from 400 to 4,000 bases. What could be the issue? Thank you very much.

jsh58 commented 5 years ago

Without examining the input file, it is difficult to diagnose the issue. It could be a formatting issue, or possibly an issue with the DSS function makeBSseqData(), which loads the dataframes into memory. I suggest you try running findDMRs.r on a subset of your data to see how that works.
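A minimal sketch of taking such a subset, assuming the combined file has a single header line followed by one region per line. The function names and the file names in the usage line are hypothetical, not part of DMRfinder.

```python
import itertools
from typing import Iterable, List


def head_regions(lines: Iterable[str], n_regions: int) -> List[str]:
    """Return the header line plus the first n_regions data lines."""
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input
        return []
    return [header] + list(itertools.islice(it, n_regions))


def write_subset(in_path: str, out_path: str, n_regions: int) -> None:
    # Only the first n_regions lines are read, so the full file never
    # has to be loaded into memory.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(head_regions(src, n_regions))


# Usage (hypothetical file names):
# write_subset("combined.txt", "combined_500.txt", 500)
```

Taking the first lines keeps the file's ordering, so a small subset will usually come from a single chromosome; that is fine for testing whether the script runs at all.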

Lux91 commented 5 years ago

Hi, I really don't think it's a formatting issue, since I made sure to use only tabs to separate the columns and "\n" to separate the lines. Starting from this input (screenshot: "20 region start"), this is what happens if I try with only 20 regions (screenshot: "20_regions_results"). If I try with 500 regions, I get the same error I mentioned before. Thank you.
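A quick way to double-check the tab/newline formatting of a hand-built file is to scan every line. This is a sketch under a hypothetical layout (one header line; columns 2 and 3 hold integer start and end coordinates), not a validator shipped with DMRfinder. It also flags Windows-style "\r\n" line endings, which can slip in when files are edited from Windows.

```python
from typing import Iterable, List, Optional


def check_combined(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in a tab-separated file.

    Assumes (hypothetically) one header line, and that columns 2 and 3
    of every data row are integer start and end coordinates."""
    problems: List[str] = []
    n_cols: Optional[int] = None
    for lineno, raw in enumerate(lines, start=1):
        if raw.endswith("\r\n"):
            problems.append(f"line {lineno}: Windows line ending (\\r\\n)")
        fields = raw.rstrip("\r\n").split("\t")
        if n_cols is None:
            n_cols = len(fields)  # expected column count comes from the header
            continue
        if len(fields) != n_cols:
            problems.append(f"line {lineno}: {len(fields)} columns, header has {n_cols}")
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except (ValueError, IndexError):
            problems.append(f"line {lineno}: non-integer start/end")
            continue
        if start > end:
            problems.append(f"line {lineno}: start {start} > end {end}")
    return problems


# Usage (hypothetical file name):
# with open("combined.txt") as fh:
#     for p in check_combined(fh):
#         print(p)
```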

Lux91 commented 5 years ago

PS: Sorry for the way it looks; I'm using Ubuntu on Windows to access the server right now.

jsh58 commented 5 years ago

I would try it with fewer columns. Start with 2 H samples and 2 R samples and see how that goes.
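Cutting the file down to a few samples can be scripted rather than done by hand. A minimal sketch, assuming a hypothetical layout where the first `n_fixed` columns describe the region and each remaining column header begins with its sample name (e.g. "H1-N", "H1-X"). Pick prefixes that cannot collide: "H1" would also match "H10", whereas "H1-" would not.

```python
from typing import Iterable, List, Sequence


def select_samples(lines: Iterable[str], n_fixed: int,
                   prefixes: Sequence[str]) -> List[str]:
    """Keep the first n_fixed columns plus every column whose header
    starts with one of the given sample prefixes."""
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input
        return []
    names = header.rstrip("\r\n").split("\t")
    # Column indices to keep, decided once from the header.
    keep = [i for i, name in enumerate(names)
            if i < n_fixed or name.startswith(tuple(prefixes))]
    out: List[str] = []
    for raw in [header, *it]:
        fields = raw.rstrip("\r\n").split("\t")
        out.append("\t".join(fields[i] for i in keep) + "\n")
    return out


# Usage (hypothetical names): keep 4 region columns and samples H1, H2, R1, R2.
# with open("combined.txt") as src, open("combined_2H_2R.txt", "w") as dst:
#     dst.writelines(select_samples(src, 4, ["H1-", "H2-", "R1-", "R2-"]))
```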

Lux91 commented 5 years ago

Hi, I tried with 2 H and 2 R, this is the outcome.

I used all 72,200 regions, spread across the genome. As for the "multiple position" warning, I don't know why it appears, since I made sure the regions were unique.

Lux91 commented 5 years ago

OK, I found the reason for the warning: some of my regions share the same start position but have different end positions. I hope that's not a problem for the findDMRs.r script. (Screenshot from 2018-09-27 10-13-28.)
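Regions that share a start, or overlap in general, can be listed before running anything. A minimal sketch, assuming regions are (chrom, start, end) tuples with half-open coordinates (an assumption; if your ends are inclusive, change `start < best[1]` to `<=`). The `read_regions` helper assumes a hypothetical layout with chromosome, start and end in the first three columns after a header line.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]


def read_regions(path: str) -> List[Region]:
    # Hypothetical layout: header line, then chrom/start/end in columns 1-3.
    with open(path) as fh:
        next(fh, None)  # skip header
        rows = (line.rstrip("\r\n").split("\t") for line in fh)
        return [(f[0], int(f[1]), int(f[2])) for f in rows if len(f) >= 3]


def find_conflicts(regions: Iterable[Region]):
    """Return (shared_starts, overlaps).

    shared_starts: (chrom, start) pairs used by more than one region.
    overlaps: each region that starts before the furthest end seen so far
    on its chromosome, paired with the earlier region reaching furthest."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    shared_starts: List[Tuple[str, int]] = []
    overlaps: List[Tuple[Region, Region]] = []
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        counts: Dict[int, int] = defaultdict(int)
        for start, _ in ivs:
            counts[start] += 1
        shared_starts += [(chrom, s) for s, c in sorted(counts.items()) if c > 1]
        best = None  # earlier interval with the largest end so far
        for start, end in ivs:
            if best is not None and start < best[1]:
                overlaps.append(((chrom,) + best, (chrom, start, end)))
            if best is None or end > best[1]:
                best = (start, end)
    return shared_starts, overlaps


# Usage (hypothetical file name):
# shared, overlaps = find_conflicts(read_regions("combined.txt"))
```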

Lux91 commented 5 years ago

Hello again, I modified the findDMRs.r script and added some print messages to track where it gets stuck. It happens during makeBSseqData().

jsh58 commented 5 years ago

On the overlapping intervals: findDMRs.r does not have a problem with them. It does not check for them, because such intervals cannot occur in the output of combine_CpG_sites.py. However, it appears that DSS does notice the problem and collapses the regions. I don't know how that works, and I advise against having overlapping intervals in the first place.
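One way to avoid overlapping intervals in a hand-built region list is to merge the region coordinates before any counts are tallied. This is a generic interval-merge sketch, not a DMRfinder feature, and it assumes half-open (chrom, start, end) coordinates. It merges coordinates only: the per-sample counts for merged regions would have to be recomputed from the underlying per-CpG data, since simply adding the counts of two overlapping regions would count the CpGs in the overlap twice.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Merge overlapping (chrom, start, end) intervals per chromosome.

    Uses half-open coordinates (an assumption): regions that merely touch,
    e.g. (100, 200) and (200, 300), are kept separate."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    merged: List[Region] = []
    for chrom in sorted(by_chrom):  # plain string order: chr1, chr10, chr2, ...
        cur_start: Optional[int] = None
        cur_end: Optional[int] = None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:
                cur_end = max(cur_end, end)  # extend the current merged interval
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_end is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged
```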
On the memory issue: makeBSseqData() is part of DSS, and editing DSS is outside the purview of DMRfinder. Sorry that I cannot help, but you can certainly raise the issue with the authors of DSS.

Lux91 commented 5 years ago

Don't worry, I understand. The only thing that is unclear to me is that when I use regions clustered with your tool, it works. Of course those regions are smaller, but the sample dataset is the same size. Maybe the overlapping regions, combined with a high number of samples, cause some trouble for the DSS library. Thank you for your help :)