scvelo() "cannot open connection" #32

Open Laura19993 opened 3 years ago

Laura19993 commented 3 years ago

When I try to run the scvelo() function, I get the following error message (see attached picture). I am pretty new to R and Python and would appreciate any help solving this issue.

LTLA commented 3 years ago

From the warnings, one of the calls has a 137 status, and this is ultimately causing the error. Some Googling suggests that this occurs when a Docker container runs out of memory, though I don't know enough about your system to know if this is applicable.

rrydbirk commented 3 years ago

I get the same error on a machine with 128 cores and 2 TB memory.

LTLA commented 3 years ago

Hm. I don't know why this happens. I suggest running the following minimal example in a fresh R session.

# Copy-pasted from the error message above. Note the escape of the double quotes.
act.cmd <- ". '/d0/home/rasmusr/.cache/basilisk/1.2.1/0/etc/profile.d/' && conda activate && /usr/lib/R/bin/Rscript --default-packages=NULL -e \"con <- socketConnection(port=11552, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)\""

# Taken from the port above.
p <- 11552 

soc <- serverSocket(p)
system(act.cmd, intern = TRUE)
listener <- socketAccept(soc, blocking = TRUE, open = "a+b")
activated <- unserialize(listener)

Dropping intern=TRUE from the system() call might also surface a more informative error message.

Laura19993 commented 3 years ago

Thanks for your suggestions. I ran it in a fresh R session using the paths from my error message, and I still get the same error, "status 137".

# Copy-pasted from my error message. Note the escape of the double quotes.
act.cmd <- ". '/d0/home/lwolbeck/.cache/basilisk/1.2.1/0/etc/profile.d/' && conda activate '/d0/home/lwolbeck/.cache/basilisk/1.2.1/velociraptor-1.0.0/env' && /usr/local/R/R-4.0.3/lib/R/bin/Rscript --default-packages=NULL -e \"con <- socketConnection(port=11656, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)\""
# Taken from the port above.
p <- 11656
soc <- serverSocket(p)
system(act.cmd, intern = TRUE)
[1] 137
Warning message:
In system(act.cmd, intern = TRUE) :
running command '. '/d0/home/lwolbeck/.cache/basilisk/1.2.1/0/etc/profile.d/' && conda activate '/d0/home/lwolbeck/.cache/basilisk/1.2.1/velociraptor-1.0.0/env' && /usr/local/R/R-4.0.3/lib/R/bin/Rscript --default-packages=NULL -e "con <- socketConnection(port=11656, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)"' had status 137

listener <- socketAccept(soc, blocking = TRUE, open = "a+b")
Error in socketAccept(soc, blocking = TRUE, open = "a+b") : 
cannot open the connection
In addition: Warning message:
In socketAccept(soc, blocking = TRUE, open = "a+b") :
  problem in accepting connections on this socket

activated <- unserialize(listener)
Error in unserialize(listener) : object 'listener' not found

LTLA commented 3 years ago

Do you get more informative error messages if you remove intern=TRUE in the system() call?

Edit: After some more Googling, it turns out that 137 is 128 + 9. By shell convention, an exit status above 128 means the process was terminated by a fatal signal, and signal 9 is SIGKILL (see docs here). This is consistent with an examination of your error logs - see the Killed message above - which indicates that some external process is killing the command launched by system(). This is usually caused by resource controllers on clusters, cloud, etc. - an event like this would be fairly typical on the cluster at my workplace, where the memory limits requested by each job are strictly enforced.
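To make the 128 + 9 arithmetic concrete, here is a minimal shell sketch. It is not taken from basilisk or velociraptor. It just has a throwaway child shell send SIGKILL to itself so that the parent can inspect the resulting status. The variable names `status` and `sig` are only illustrative, and the 128 + n convention is assumed to hold as it does in common shells such as bash and dash.

```shell
#!/bin/sh
# Spawn a child shell that sends SIGKILL (signal 9) to itself.
# The "|| status=$?" form records the failure without aborting the script.
status=0
sh -c 'kill -9 $$' || status=$?
echo "exit status: $status"

# A status above 128 conventionally means "terminated by signal (status - 128)".
sig=0
if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # kill -l translates a signal number back into its name.
    echo "signal number: $sig (SIG$(kill -l "$sig"))"
fi
```

The same subtraction applied to the 137 reported by system() gives signal 9, which is why the failure looks like an external kill rather than an error raised by the command itself.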

Edit 2: I would be curious to see what happens for the simpler:

act.cmd <- ". '/d0/home/rasmusr/.cache/basilisk/1.2.1/0/etc/profile.d/' && conda activate"
system(act.cmd, intern = TRUE)

Or even just:

act.cmd <- ". '/d0/home/rasmusr/.cache/basilisk/1.2.1/0/etc/profile.d/'"
system(act.cmd, intern = TRUE)
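
The idea behind these shorter calls is to run ever-longer prefixes of the `&&` chain and see which stage first produces a non-zero status. A rough shell sketch of that bisection follows. The function name `run_prefixes` and the three stages are placeholders made up for illustration (the last one simply kills its own shell), not the real basilisk commands.

```shell
#!/bin/sh
# Run growing prefixes of an "a && b && c" chain, each in a fresh shell,
# and report the status of every prefix. Stops at the first failing prefix.
run_prefixes() {
    prefix=""
    n=0
    first_failure=0
    failure_status=0
    for stage in "$@"; do
        n=$((n + 1))
        if [ -z "$prefix" ]; then
            prefix=$stage
        else
            prefix="$prefix && $stage"
        fi
        rc=0
        sh -c "$prefix" || rc=$?
        echo "stages 1..$n exited with status $rc"
        if [ "$rc" -ne 0 ]; then
            first_failure=$n
            failure_status=$rc
            break
        fi
    done
}

# Placeholder stages: two that succeed, then one that kills its own shell.
run_prefixes 'true' 'true' 'kill -9 $$'
echo "first failing stage: $first_failure (status $failure_status)"
```

Prefixes are used rather than isolated stages because a stage such as sourcing a profile script only has an effect on the commands that follow it in the same shell.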

Laura19993 commented 3 years ago

If I remove intern=TRUE, I get a shorter error message; it only says Killed.

For the two other calls you suggested, I get character(0) as a response.

LTLA commented 3 years ago

I have no idea what's going on here. My best guess is that your system is configured to kill any process that tries to connect to a port. While I could add a workaround... I don't want to, unless I get some more information about what's going wrong. My workaround won't solve the underlying port issue, which seems like it would break any socket-based parallelization via parallel.

kevinrue commented 3 years ago

@shijianasdf Please open a new GitHub issue for your problem. There is no guarantee - and it is rather unlikely - that your problem is related to this one.

As a general rule, if your situation is not obviously the same as an existing one, it is always better to create a new issue, so that users and developers can track and communicate on each issue separately.

Thank you for your advice