Lux91 closed 5 years ago
Without examining the input file, it is difficult to diagnose the issue. It could be a formatting issue, or possibly an issue with the `DSS` function `makeBSseqData()`, which loads the dataframes into memory. I suggest you try running `findDMRs.r` on a subset of your data to see how that works.
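One way to build such a subset is sketched below. It only assumes that the combined file is plain text with one region per line and, optionally, a single header line; the function names `take_regions` and `write_subset` are made up for this example and are not part of DMRfinder.

```python
from itertools import islice


def take_regions(lines, n_regions, has_header=True):
    """Return the header line (if any) plus the first n_regions data lines."""
    lines = iter(lines)
    out = []
    if has_header:
        header = next(lines, None)  # None when the input is empty
        if header is not None:
            out.append(header)
    out.extend(islice(lines, n_regions))
    return out


def write_subset(src_path, dst_path, n_regions, has_header=True):
    """Write a smaller copy of src_path containing only the first regions."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(take_regions(src, n_regions, has_header))
```

For example, `write_subset("combined.csv", "combined_500.csv", 500)` (both file names are placeholders) would produce a 500-region file to pass to `findDMRs.r`.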
Hi, I really don't think it's a formatting issue since I made sure to use only tabs to separate every column and "\n" to separate every line. Starting from this: This is what happens if I try with only 20 regions: If I try with 500 regions it gives me the same error I mentioned before... Thank you...
PS I'm sorry for the way it looks, but I'm using Ubuntu on Windows to access the server right now...
I would try it with fewer columns. Start with 2 H samples and 2 R samples and see how that goes.
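A rough sketch of how the columns could be trimmed is below. It assumes a tab-separated file with a header line, four leading region columns, and per-sample columns named like `H1-N` and `H1-X`; that layout, the default `n_fixed=4`, and the function names are assumptions for illustration, so adjust them to match the actual file.

```python
def columns_to_keep(header, samples, n_fixed=4):
    """Indices of the fixed region columns plus the chosen samples' columns."""
    keep = list(range(min(n_fixed, len(header))))
    for i, name in enumerate(header[n_fixed:], start=n_fixed):
        # Match "H1" itself or "H1-<suffix>", but not "H10-<suffix>".
        if any(name == s or name.startswith(s + "-") for s in samples):
            keep.append(i)
    return keep


def filter_table(lines, samples, n_fixed=4):
    """Yield the table's lines restricted to the selected samples."""
    lines = iter(lines)
    first = next(lines, None)
    if first is None:  # empty input: nothing to yield
        return
    header = first.rstrip("\n").split("\t")
    keep = columns_to_keep(header, samples, n_fixed)
    yield "\t".join(header[i] for i in keep) + "\n"
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[i] for i in keep) + "\n"
```

Calling `filter_table(open("combined.csv"), ["H1", "H2", "R1", "R2"])` (placeholder file and sample names) and writing the result to a new file would give a two-versus-two test input.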
Hi, I tried with 2 H and 2 R, this is the outcome.
I used all 72200 regions dispersed along the genome. Regarding the "multiple position" warning, I don't know why it appears since I made sure to have unique regions.
OK, I found the reason for the warning: I have some regions that share the start position but not the end. I hope it's not a problem for the findDMRs script.
Hello again, I modified the findDMRs script and added some printed messages to track the point at which it gets stuck. It happens during "makeBSseqData".
On the overlapping intervals: `findDMRs.r` does not have a problem with that; it does not check, because such a result would not be possible in the output from `combine_CpG_sites.py`. But it appears that `DSS` does notice the problem and collapses the regions. I don't know how that works, and advise against having overlapping intervals in the first place.
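A hand-built region list can be checked for these cases before it is used. The sketch below is not part of DMRfinder or `DSS`; it assumes regions are available as `(chrom, start, end)` tuples with integer, inclusive coordinates, and the function names are made up for the example.

```python
from collections import Counter


def shared_starts(regions):
    """Return the (chrom, start) keys used by more than one region."""
    counts = Counter((chrom, start) for chrom, start, _end in regions)
    return sorted(key for key, n in counts.items() if n > 1)


def overlapping_regions(regions):
    """Return regions that overlap an earlier region on the same chromosome.

    Regions are sorted by (chrom, start, end); the running maximum end is
    tracked so that a long region is compared against every later one.
    """
    flagged = []
    cur_chrom, max_end = None, None
    for chrom, start, end in sorted(regions):
        if chrom == cur_chrom and start <= max_end:
            flagged.append((chrom, start, end))
            max_end = max(max_end, end)
        else:
            # New chromosome, or a gap after the previous cluster.
            cur_chrom, max_end = chrom, end
    return flagged
```

If both functions return empty lists, no two regions share a start position or overlap under the inclusive-coordinate assumption.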
On the memory issue: `makeBSseqData()` is part of `DSS`, and editing `DSS` is outside the purview of `DMRfinder`. Sorry that I cannot help, but you can certainly raise the issue with the authors of `DSS`.
Don't worry, I understand. The only thing that is unclear to me is that when I use regions clustered with your tool, it works... Of course the regions are smaller, but the sample dataset size is the same... Maybe collapsing the overlapping regions with a high number of samples causes some trouble for the DSS library... Thank you for your help :)
Hi, I have a combined file that I manually created to target specific regions. When I try to use the R script findDMRs.r, it stays on the "loading sample" part for a very long time and then saturates the RAM before getting to the "comparing groups" part. I'm running it on an Ubuntu server with 64 GB of RAM and 40 GB of swap. There are roughly 72,000 regions, ranging in length from 400 to 4,000 bases. What could be the issue? Thank you very much