quantixed / TrackMateR

Analysis of TrackMate XML outputs in R
https://quantixed.github.io/TrackMateR/

Cannot Open the Connections #9

Closed rosy-li closed 1 year ago

rosy-li commented 1 year ago

Hi, I am having a connection issue in my console. Whenever I run "compareDatasets", or run a couple of samples in the same session, it always reports "Error in file(con, "w"): cannot open the connection". So I strongly encourage you to add "closeAllConnections" to the compareDatasets() function. Otherwise, it does not run on my side.

I appreciate your help!

quantixed commented 1 year ago

Hi @rosy-li thanks for reporting this issue. Please can you let me know:

rosy-li commented 1 year ago

Hi, I am using windows. Yes, whenever I run closeAllConnections it solves the problem. And the files are in the suggested directories. Thanks for your response!

quantixed commented 1 year ago

Thank you for your answers. I previously added closeAllConnections (somewhere, likely not in the correct place) and it broke something else, so I removed it. I can't reproduce this problem on macOS but I will try to fix when I can access a Windows box (probably next week).

rosy-li commented 1 year ago

Great, thanks so much!! I'll also try to run it on macOS in the meantime.

HenrikBengtsson commented 1 year ago

Hey, I stumbled upon this via your https://fosstodon.org/@quantixed/110179936626462534 post.

First of all, you never ever ever++ want to call closeAllConnections() in package code. It is such a destructive function that wreaks havoc. It's as bad as calling quit(), rm(list = ls(globalenv()), envir = globalenv()), file.remove(dir(), recursive = TRUE), ... That applies to R scripts too; don't use it there, either. Basically, if you ever find yourself having to use closeAllConnections(), it's a strong sign that there is a bigger issue somewhere in the code that needs to be solved.

In your Fediverse thread, you mention:

Users report that number of available connections becomes exhausted on repeated calls. I have tracked this down to (not reading the files!) but to my setup of parallel processing.

https://github.com/quantixed/TrackMateR/blob/d6b981f00c4024d61ffae183b47aaf5a78fb8d2d/R/readTrackMateXML.r#L56

This line opens 8 connections on Windows (8-core machine), that don’t get closed. 0 on Mac.

That line:

registerDoParallel(numCores)

behaves differently on a machine that supports forked processing (e.g. Linux and macOS) from a machine that doesn't (e.g. MS Windows). If you want to emulate what happens on MS Windows, use:

cl <- parallel::makeCluster(numCores)
registerDoParallel(cl = cl)

Note that it does not automagically call parallel::stopCluster(cl) when done for you. This means that if you call registerDoParallel(numCores) on MS Windows, you'll end up creating more and more parallel workers running in the background, e.g.

> showConnections()
     description class mode text isopen can read can write
> doParallel::registerDoParallel(2)
> showConnections()
  description               class      mode  text     isopen   can read can write
4 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
> doParallel::registerDoParallel(2)
> showConnections()
  description               class      mode  text     isopen   can read can write
4 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
> doParallel::registerDoParallel(2)
> showConnections()
  description               class      mode  text     isopen   can read can write
4 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
5 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
6 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
7 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
8 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
9 "<-DESKTOP-72T1DNG:11958" "sockconn" "a+b" "binary" "opened" "yes"    "yes"    
> 

As you can see, each parallel worker occupies a connection in R. Eventually, you'll run out of available connections - there are only 125 available in R as it stands today (https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28). When you run out of available connections, R fails with a "cannot open the connection" error. This may happen, for instance, when you try to create another parallel worker or when you try to open a file. Exactly when it happens depends on where in the code you are.
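
To see that stopping a cluster is what hands the connections back, here is a minimal sketch using only the base parallel package. The helper count_connections() is a made-up name for illustration; it just counts the rows that showConnections() reports.

```r
library(parallel)

# Hypothetical helper: number of user connections currently open
count_connections <- function() nrow(showConnections())

before <- count_connections()

# makeCluster() defaults to PSOCK workers, one socket connection per worker
cl <- makeCluster(2)
during <- count_connections()

# stopCluster() shuts the workers down and closes their connections
stopCluster(cl)
after <- count_connections()

cat("before:", before, " during:", during, " after:", after, "\n")
```

The count should go up by one per worker while the cluster is alive and drop back once stopCluster() has been called.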

My suggestion is to replace that line with something like:

registerDoParallel(numCores)
on.exit(stopImplicitCluster())

or more explicitly:

if (.Platform[["OS.type"]] == "windows") {
  ## PSOCK-based parallel processing
  cl <- parallel::makeCluster(numCores)
  on.exit(parallel::stopCluster(cl))
  registerDoParallel(cl = cl)
} else {
  ## Forked parallel processing
  registerDoParallel(cores = numCores)
}
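
Note that on.exit() only has this effect when it is called inside a function body, so the cleanup has to live in the same function that creates the cluster. A minimal sketch of that pattern, assuming a made-up function name (read_in_parallel) and using parallel::parLapply() in place of foreach() so that it runs with base R only:

```r
# Hypothetical worker function; defined at top level so that no large
# enclosing environment is shipped to the workers with it
square <- function(x) x^2

# Hypothetical wrapper: the cluster lives only as long as this call
read_in_parallel <- function(items, numCores = 2) {
  cl <- parallel::makeCluster(numCores)
  # Runs when the function exits, whether normally or because of an error
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parLapply(cl, items, square)
}

open_before <- nrow(showConnections())
res <- read_in_parallel(1:4)
open_after <- nrow(showConnections())
```

Calling the function repeatedly should then leave the number of open connections unchanged between calls.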

quantixed commented 1 year ago

Hi @HenrikBengtsson thank you so much for diving in to help! After moaning on Mastodon, I had got as far as figuring out some of what you wrote, but the remainder and your explanation overall has cleared up my understanding.

Right now, this part of the code is not working as intended (as you describe). It is a kludge to:

weirdly it does work and the data gets read but I am not convinced the parallelisation is working properly, because now I have registered the cluster properly on both platforms, I am getting an error with foreach() %dopar%. I feel that I am close to getting it working properly though. Thank you!

HenrikBengtsson commented 1 year ago

now I have registered the cluster properly on both platforms, I am getting an error with foreach() %dopar%

[...] on MS Windows, I assume.

I think this is where I should suggest that you switch to using Futureverse instead, e.g. switch to doFuture and foreach(...) %dofuture% { ... }. See https://dofuture.futureverse.org/#alternative-2-dofuture. It's designed to make life much easier for you and your users, and to be more robust and more powerful.
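
As a rough sketch of what that switch could look like (not TrackMateR's actual code): the loop body and the choice of two multisession workers are assumptions for illustration, and the example is guarded so that it only runs if doFuture is installed.

```r
if (requireNamespace("doFuture", quietly = TRUE)) {
  library(doFuture)  # provides the %dofuture% operator

  # Choose the backend once; plan() returns the previous plan
  old_plan <- future::plan(future::multisession, workers = 2)

  # Same shape as a foreach() %dopar% loop, with no cluster to register
  squares <- foreach::foreach(x = 1:3, .combine = c) %dofuture% {
    x^2
  }

  # Restoring the previous plan shuts the background workers down
  future::plan(old_plan)
}
```

With this layout the package code does not have to create or stop a cluster itself; the backend is picked through plan().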

quantixed commented 1 year ago

[...] on MS Windows, I assume.

Yes, foreach(...) %do% {...} works but foreach(...) %dopar% {...} doesn't. Mac is working fine. Windows gives:

Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node$con) : error writing to connection

I tried %dofuture%; it also exits with an error, which is at least more verbose (thank you @HenrikBengtsson !)

Error in unserialize(node$con) :
MultisessionFuture (doFuture2-1) failed to receive results from cluster RichSOCKnode #1 (PID 20064 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. Detected a non-exportable reference (‘externalptr’ of class ‘XMLInternalElementNode’) in one of the globals (‘subdoc’ of class ‘XMLNodeSet’) used in the future expression. The total size of the 3 globals exported is 10.28 MiB. There are three globals: ‘subdoc’ (10.28 MiB of class ‘list’), ‘attrName’ (1.79 KiB of class ‘character’) and ‘...future.x_ii’ (168 bytes of class ‘list’)

I don't see how the non-exportable reference could cause the connection(s) to die because it all runs fine on Mac.
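
One possible explanation for the platform difference: forked workers on macOS share the parent's memory, so the external pointers stay valid, whereas PSOCK workers on Windows receive serialized copies in which an external pointer no longer points at anything. A workaround that follows from this is to extract plain R data in the main process and send only that to the workers. The sketch below assumes the XML package is installed; the element name Spot, the attribute X and both function names are made up for illustration.

```r
# Hypothetical worker function; top level, so nothing else travels with it
parse_x <- function(a) as.numeric(a[["X"]])

# Hypothetical helper: nodes are turned into character vectors before any
# data is sent to the PSOCK workers
spot_x_in_parallel <- function(xml_text, numCores = 2) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  # xmlAttrs() gives a named character vector, which serializes safely
  attrs <- lapply(XML::getNodeSet(doc, "//Spot"), XML::xmlAttrs)
  cl <- parallel::makeCluster(numCores)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  unlist(parallel::parLapply(cl, attrs, parse_x))
}

if (requireNamespace("XML", quietly = TRUE)) {
  xml_text <- "<Tracks><Spot ID='1' X='0.5'/><Spot ID='2' X='1.5'/></Tracks>"
  spot_x <- spot_x_in_parallel(xml_text)
}
```

The workers here only ever see character vectors, so nothing non-exportable crosses the process boundary.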

quantixed commented 1 year ago

Closing because the open connections problem is fixed. The problem of parallelisation on Windows is now issue #11

quantixed commented 1 year ago

@rosy-li this issue is now fixed. If you update the package using devtools::install_github("quantixed/TrackMateR") it should now work as expected.

rosy-li commented 1 year ago

Great. Thanks so much for your help!!!

Best, Rosy

(Sent by email in reply to the notification from Stephen Royle, Wednesday, April 12, 2023 at 5:38 AM. Subject: Re: [quantixed/TrackMateR] Cannot Open the Connections (Issue #9))


rosy-li commented 1 year ago

Thanks so much! I tried the updated code and it works very well on my computer!