Closed. HenrikBengtsson closed this issue 1 year ago.
No parallelisation for Windows users is intentional. I will leave it like this and keep the issue open
https://github.com/quantixed/TrackMateR/issues/9#issuecomment-1504858461
[...] on MS Windows, I assume.
Yes, foreach(...) %do% {...} works but foreach(...) %dopar% {...} doesn't. Mac is working fine. Windows gives:

Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node$con) : error writing to connection
I tried %dofuture% and it also exits with an error, which is at least more verbose (thank you @HenrikBengtsson!):

Error in unserialize(node$con) : MultisessionFuture (doFuture2-1) failed to receive results from cluster RichSOCKnode #1 (PID 20064 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. Detected a non-exportable reference (‘externalptr’ of class ‘XMLInternalElementNode’) in one of the globals (‘subdoc’ of class ‘XMLNodeSet’) used in the future expression. The total size of the 3 globals exported is 10.28 MiB. There are three globals: ‘subdoc’ (10.28 MiB of class ‘list’), ‘attrName’ (1.79 KiB of class ‘character’) and ‘...future.x_ii’ (168 bytes of class ‘list’)
This is because your subdoc object is of class XMLNodeSet and contains XMLInternalElementNode objects, which cannot be exported to another R process. If you try to use them in the other process, they crash that R process (which is very harsh). You can read about this in https://future.futureverse.org/articles/future-4-non-exportable-objects.html#package-xml, which also gives two workarounds.
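As a side note, the future framework can be asked to surface this failure early rather than as a dead worker. A minimal sketch, assuming the future package is attached and using the subdoc object from this thread: the future.globals.onReference option (documented in the vignette linked above) makes the framework scan exported globals for references such as externalptr and raise an immediate error instead of crashing the worker.

```r
library(future)

# Scan globals for non-exportable references before exporting them;
# "error" turns a would-be worker crash into an up-front error.
options(future.globals.onReference = "error")
plan(multisession)

f <- future({
  sapply(subdoc, xmlGetAttr, "name")  # 'subdoc' holds externalptr objects
})
# Signals an error along the lines of the post-mortem diagnostic quoted
# above ("Detected a non-exportable reference ('externalptr') ...")
```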
I don't see how the non-exportable reference could cause the connection(s) to die because it all runs fine on Mac.
It "works fine", because forked parallel processing is used (which is supported on Linux and macOS, but not MS Windows). I put "works fine" in quotes, because I'm not 100% sure it is safe to do so, even if you don't experience crashes. This is something that the maintainer of XML would have to confirm.
From the above vignette, the first workaround would be to do:
subdoc_m <- lapply(subdoc, FUN = xmlSerializeHook)
dtf <- as.data.frame(foreach(name = attrName, .combine = cbind) %dofuture% {
  subdoc <- lapply(subdoc_m, FUN = xmlDeserializeHook)
  sapply(subdoc, xmlGetAttr, name)
})
rm(subdoc_m)
That would work regardless of platform and plan().
That said, subdoc is probably a much larger object than attrName. In your current implementation, you end up exporting the whole subdoc to all parallel workers, which is unnecessary. I would flip it around and do something like:
subdoc_m <- lapply(subdoc, FUN = xmlSerializeHook)
res <- foreach(el_m = subdoc_m, .combine = cbind) %dofuture% {
  el <- xmlDeserializeHook(el_m)
  sapply(attrName, FUN = function(name) xmlGetAttr(el, name))
}
rm(subdoc_m)
This would export exactly the subset of subdoc[_m] that each worker needs - not more, not less.
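For what it's worth, the same flip can be sketched with future.apply (my own suggestion, not something from this thread): future_lapply() splits subdoc_m into one chunk per worker, so each worker again receives only its own share of the serialized nodes.

```r
library(future.apply)  # assumed available; provides future_lapply()

# Serialize each node once, up front (as in the snippets above)
subdoc_m <- lapply(subdoc, FUN = XML::xmlSerializeHook)

# future_lapply() chunks subdoc_m across the registered workers, so only
# each worker's chunk is exported to it
res <- future_lapply(subdoc_m, FUN = function(el_m) {
  el <- XML::xmlDeserializeHook(el_m)
  sapply(attrName, FUN = function(name) XML::xmlGetAttr(el, name))
})
dtf <- as.data.frame(do.call(rbind, res))
rm(subdoc_m)
```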
I like the idea of using {future} and finally found some time to rewrite the core function to take advantage of this framework. In my testing, on macOS and on Windows, the original version is faster. The difference is significant on a 20-core Mac Studio (15 s for the rewrite vs 3 s for the original), less so on an 8-core Windows box (51 s vs 47 s). I'm not certain it's working correctly, because I wasn't expecting such a slowdown on macOS.
> ptm <- proc.time()
> # perform parallel read
> # test if we are running on windows
> for(i in 1:5) {
+ if (.Platform[["OS.type"]] == "windows") {
+ cat("Collecting spot data...\n")
+ dtf <- as.data.frame(foreach(i = 1:length(attrName), .packages = c("foreach","XML"), .combine = cbind) %do% {
+ sapply(subdoc, xmlGetAttr, attrName[i])
+ })
+ } else {
+ cat(paste0("Collecting spot data. Using ",numCores," cores\n"))
+ dtf <- as.data.frame(foreach(i = 1:length(attrName), .combine = cbind) %dopar% {
+ sapply(subdoc, xmlGetAttr, attrName[i])
+ })
+ }
+ rm(dtf)
+ }
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
> proc.time() - ptm
user system elapsed
27.863 16.940 3.106
> plan(multisession, workers = numCores)
> ptm <- proc.time()
> for(i in 1:5) {
+ cat(paste0("Collecting spot data. Using ",numCores," cores\n"))
+ subdoc_m <- lapply(subdoc, FUN = XML::xmlSerializeHook)
+ dtf <- as.data.frame(foreach(el_m = subdoc_m, .combine = rbind) %dofuture% {
+ el <- XML::xmlDeserializeHook(el_m)
+ sapply(attrName, FUN = function(name) xmlGetAttr(el, name))
+ })
+ rm(subdoc_m)
+ rm(dtf)
+ }
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
Collecting spot data. Using 20 cores
> proc.time() - ptm
user system elapsed
12.138 0.723 14.701
> ptm <- proc.time()
> # perform parallel read
> # test if we are running on windows
> for(i in 1:5) {
+ if (.Platform[["OS.type"]] == "windows") {
+ cat("Collecting spot data...\n")
+ dtf <- as.data.frame(foreach(i = 1:length(attrName), .packages = c("foreach","XML"), .combine = cbind) %do% {
+ sapply(subdoc, xmlGetAttr, attrName[i])
+ })
+ } else {
+ cat(paste0("Collecting spot data. Using ",numCores," cores\n"))
+ dtf <- as.data.frame(foreach(i = 1:length(attrName), .combine = cbind) %dopar% {
+ sapply(subdoc, xmlGetAttr, attrName[i])
+ })
+ }
+ rm(dtf)
+ }
Collecting spot data...
Collecting spot data...
Collecting spot data...
Collecting spot data...
Collecting spot data...
> proc.time() - ptm
user system elapsed
45.72 1.47 47.20
> plan(multisession)
> ptm <- proc.time()
> for(i in 1:5) {
+ cat(paste0("Collecting spot data. Using ",numCores," cores\n"))
+ subdoc_m <- lapply(subdoc, FUN = XML::xmlSerializeHook)
+ dtf <- as.data.frame(foreach(el_m = subdoc_m, .combine = rbind) %dofuture% {
+ el <- XML::xmlDeserializeHook(el_m)
+ sapply(attrName, FUN = function(name) xmlGetAttr(el, name))
+ })
+ rm(subdoc_m)
+ rm(dtf)
+ }
Collecting spot data. Using 8 cores
Collecting spot data. Using 8 cores
Collecting spot data. Using 8 cores
Collecting spot data. Using 8 cores
Collecting spot data. Using 8 cores
> proc.time() - ptm
user system elapsed
28.57 1.06 51.22
Hello. In your original macOS code, you used:
registerDoParallel(numCores)
which launches forked parallel workers. To get forked workers when using futures, use:
plan(multicore, workers = numCores)
and compare with:
dtf <- as.data.frame(foreach(i = 1:length(attrName), .combine = cbind) %dofuture% {
  sapply(subdoc, xmlGetAttr, attrName[i])
})
That will give you a more fair comparison. (Yes, that sweeps any potential problem from using forked processing, as mentioned in https://github.com/quantixed/TrackMateR/issues/11#issuecomment-1505514406, under the rug.)
If you want to compare using non-forked parallelization, then use:
cl <- makeCluster(numCores)
on.exit(stopCluster(cl))
registerDoParallel(cl)
in your "original" macOS code and compare it to your new macOS code (as you have above). Note that this solution requires xmlSerializeHook()/xmlDeserializeHook().
Regarding performance findings for MS Windows (and most likely also macOS): It could very well be that the performance overhead from using xmlSerializeHook(), sending objects over, and then xmlDeserializeHook() on the other end is greater than the gain achieved from running in parallel.
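A quick way to test that suspicion is to time the two hooks in isolation, outside of any parallel machinery. A rough sketch, using the object names from this thread (not a rigorous benchmark):

```r
# How long does serializing the node set take on its own?
system.time(subdoc_m <- lapply(subdoc, FUN = XML::xmlSerializeHook))

# ...and deserializing it again?
system.time(subdoc2 <- lapply(subdoc_m, FUN = XML::xmlDeserializeHook))

# If these two times together approach the sequential run time, the
# parallel version has no headroom left to win, whatever the backend.
```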
Hi Henrik, thank you for the explanation.
I have retested using your suggested modifications. I now get a fairer comparison. Also, your suspicion about xmlSerializeHook() and xmlDeserializeHook() is correct. This is definitely slowing things down.
I think my best option is to leave things as they were and close this issue. Thanks for all your help and input.
In:
https://github.com/quantixed/TrackMateR/blob/d6b981f00c4024d61ffae183b47aaf5a78fb8d2d/R/readTrackMateXML.r#L61
you're using %do%. That will run sequentially, regardless of your registerDoParallel() settings. You want to use %dopar%.
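A minimal, self-contained illustration of the difference (my own toy example, not code from readTrackMateXML.r):

```r
library(doParallel)

registerDoParallel(cores = 2)  # forked workers on macOS/Linux

# %do% ignores the registered backend and always runs sequentially
seq_res <- foreach(i = 1:4, .combine = c) %do% i^2

# %dopar% dispatches iterations to the registered parallel workers
par_res <- foreach(i = 1:4, .combine = c) %dopar% i^2

identical(seq_res, par_res)  # same result, different execution
stopImplicitCluster()
```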