Closed hhp94 closed 1 year ago
I don't think the chunking is the issue. Will investigate further before reporting back.
EDIT: the problem was that I had been transferring the idats and other files back and forth between my computer and my university's HPC. The paths to the idats of the QC objects are stored in `sample$basename`, and these need to be updated before normalization. My apologies for the false error.
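For anyone hitting the same error after moving files between machines, here is a minimal sketch of how the stored paths could be rewritten. It assumes each QC object is a list with a character field `basename` holding the IDAT path without the `_Grn.idat`/`_Red.idat` suffix; the helper `update_basenames()` and the example directories are hypothetical and not part of meffil.

```r
# Hypothetical helper (not part of meffil): point every QC object at a new
# IDAT directory while keeping the file stem unchanged.
# Assumption: each QC object is a list with a character field `basename`.
update_basenames <- function(qc.objects, new.dir) {
  lapply(qc.objects, function(object) {
    # basename() here is the base R function that strips the directory part
    object$basename <- file.path(new.dir, basename(object$basename))
    object
  })
}

# Toy stand-ins for real QC objects created on another machine
qc.objects <- list(
  list(basename = "/home/me/idats/200000000001_R01C01"),
  list(basename = "/home/me/idats/200000000001_R02C01")
)

qc.objects <- update_basenames(qc.objects, "/scratch/project/idats")

# Optional sanity check: list the samples whose green-channel file is missing
missing <- Filter(
  function(object) !file.exists(paste0(object$basename, "_Grn.idat")),
  qc.objects
)
```

With real data, `missing` should be empty before normalization is run; in the toy example above it contains both objects because the files do not exist.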
Thanks for the `future_lapply` suggestion. I'll have a look. I typically don't run analyses on Windows, so it's really useful to get feedback from those who do.
Dear Dr. Suderman,
I found a bug while running `meffil.normalize.samples()` using multiple cores, so I dug a bit deeper and I think I might have found the problem.
I think the problem is that, in `./R/mclapply.r`, the function `mclapply.safe()` splits the work into smaller chunks based on the argument `max.bytes`. This means that, for a given sample, the `_Grn.idat` file might end up in one chunk while the `_Red.idat` file ends up in another, which raises an error in the `read.rg()` function in `./R/rg.r`, which calls `read.idat()`. (Updated: see the EDIT in my follow-up comment.)
I haven't dug deep enough into the code to fully understand how the chunking works. So if you were to implement a fix, what strategy would you suggest? I can try to implement it in my fork and test it out.
Because of this issue, and another one (many academics and students use Windows machines), may I suggest that this is a good opportunity to also replace `parallel::mclapply()` with `future.apply::future_lapply()` from the {future} framework? This would allow academics to use the PSOCK cluster scheme to run `meffil.qc()` and `meffil.normalize.samples()` on their Windows machines. I tried a quick solution in my fork by replacing just 2 `mclapply()` calls in `./R/mclapply.r` with `future.apply::future_lapply()`, and everything worked on my Windows machine.
Thank you so much for your work, and I look forward to hearing back.