kdkorthauer / dmrseq

R package for Inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License

Error when running dmrseq function and question about generating PCA #38

Open amjass12 opened 4 years ago

amjass12 commented 4 years ago

Hello!

First of all, thank you for this great package!

I am running into a problem when running the dmrseq function on ALL of my data. I have a filtered object that contains about 21 million loci, as follows:

An object of type 'BSseq' with
  21266393 methylation loci
  18 samples
has not been smoothed
All assays are in-memory

Now when I call the dmrseq function as follows:

testCovariate <- "age"

regions <- dmrseq(bs=bs.filt,
                  cutoff = 0.05,
                  testCovariate=testCovariate, maxPerms = 1)

The function runs without issue until it gets to chromosome 2 (I have also replicated this when running dmrseq with chrsPerChunk = 5). At that point dmrseq stops running and gives me the following error:

Error in result[[njob]] <- value : attempt to select less than one element in OneIndex

I am really unsure as to why this is occurring and any advice (or fix) would be greatly appreciated!

When I filter the bs.filt object to the first 50,000-100,000 loci, this error does not occur. That is good for testing the function and the general quality of the data; however, I really need to run this on the entire dataset to find differentially methylated regions.

Secondly, I am making a PCA plot with prcomp. I saw an example of hierarchical clustering and a similarity matrix being computed on raw data as follows (this comes from the bsseq vignette):

cormat <- round(cor(as.matrix(getMeth(fil, type="raw"))

I was wondering if you could comment on two things:

Why are raw methylation values used? Should they not be normalised in any way?

Is it advised to make a PCA from the raw methylation percentages, as in assay(bs.filt), or should a PCA be generated using methylation estimates? Thank you for advising on this, as it isn't clear to me what normalisation output there is and what should be used for quality control plots!

Many thanks in advance!

kdkorthauer commented 3 years ago

Hi @amjass12,

I apologize that I somehow overlooked this open issue.

I suspect the error might arise if you have not filtered out loci that don't have coverage in at least one sample per condition.
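As a rough illustration of that filter, here is a minimal base R sketch on a toy coverage matrix. It assumes a two-group covariate; the `condition` labels and the toy counts are made up for the example, and for a continuous covariate such as age the notion of "per condition" would need to be defined differently. With a real BSseq object, the coverage matrix would come from `getCoverage(bs, type = "Cov")` in bsseq.

```r
# Toy coverage matrix: rows are loci, columns are samples (hypothetical data).
# With a real BSseq object this would be getCoverage(bs, type = "Cov").
cov <- matrix(
  c(5, 3, 0, 2,
    0, 0, 4, 6,
    1, 0, 0, 7,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("locus", 1:4), paste0("s", 1:4))
)

# Hypothetical condition label for each sample, in column order.
condition <- c("young", "young", "old", "old")

# For each condition, count the samples with non-zero coverage at each locus.
# The result has one row per locus and one column per condition.
covered_per_group <- sapply(
  split(seq_len(ncol(cov)), condition),
  function(cols) rowSums(cov[, cols, drop = FALSE] > 0)
)

# Keep loci that are covered by at least one sample in every condition.
keep <- which(rowSums(covered_per_group >= 1) == ncol(covered_per_group))
print(keep)
```

The resulting index could then be used to subset the real object, for example `bs.filt <- bs[keep, ]`, before calling dmrseq.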

For your question about PCA plots, this is unrelated to dmrseq. But I don't see a problem with constructing a similarity matrix on raw methylation proportions. Another option could be to use M-values. I'm not sure what you mean by 'methylation estimates'.
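To make the two options concrete, the sketch below computes raw methylation proportions and M-values from toy counts and runs prcomp on the M-values. Everything here is illustrative: the simulated counts and the offset of 0.5 are assumptions for the example, not a recommendation from dmrseq or bsseq. With a real BSseq object, the two count matrices would come from `getCoverage(bs, type = "M")` and `getCoverage(bs, type = "Cov")`.

```r
# Toy counts: rows are loci, columns are samples (simulated, hypothetical data).
set.seed(1)
n_loci <- 200
n_samples <- 6
Cov <- matrix(rpois(n_loci * n_samples, lambda = 20) + 1L, nrow = n_loci)
M <- matrix(rbinom(n_loci * n_samples, size = Cov, prob = 0.6), nrow = n_loci)
colnames(M) <- colnames(Cov) <- paste0("s", seq_len(n_samples))

# Raw methylation proportion: methylated reads divided by total reads.
# Cov is at least 1 here; with real data, zero coverage would give NaN.
prop <- M / Cov

# M-value: log2 ratio of methylated to unmethylated counts. The offset
# (0.5, an arbitrary choice for this sketch) avoids log2(0) and division by 0.
offset <- 0.5
mval <- log2((M + offset) / (Cov - M + offset))

# prcomp() treats rows as observations, so transpose to put samples in rows.
pca <- prcomp(t(mval), center = TRUE, scale. = FALSE)

# Sample scores on the first two principal components.
print(pca$x[, 1:2])
```

The same call works on `prop` in place of `mval`; the practical difference is that M-values spread out proportions near 0 and 1, which tend to be compressed on the raw scale.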

malonzm1 commented 1 year ago

Hi!

I get a similar error:

Beginning permutation 2
...Chromosome chr1: Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'args' in selecting a method for function 'do.call': attempt to select less than one element in OneIndex
Calls: dmrseq ... bplapply -> bploop -> bploop.lapply -> .handleSimpleError -> h
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result

As with the earlier comment, it doesn't happen when I use bs[120001:125000,].

I've already filtered out loci that don't have coverage in at least one sample per condition.

Thanks and good day.

kdkorthauer commented 1 year ago

Hi @malonzm1,

Can you provide me with some more details so I can help track down your issue? It would be most helpful if you could provide the smallest possible subset of your data that reproduces the error, along with the code that throws it.

Thanks!