RubenVanEsch opened 5 months ago
@RubenVanEsch Can you provide a more minimally reproducible example?
For example, paring the above down to just:
import scanpy as sc
em_adata = sc.datasets.pbmc3k()
sc.pp.pca(em_adata, n_comps=50)
sc.pp.neighbors(em_adata)
sc.tl.umap(em_adata)
sc.tl.leiden(em_adata,flavor='igraph',n_iterations=2,random_state=1653,directed=False)
does not yield any error. Could you share your system info, i.e., Windows or macOS?
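For anyone reporting back, a small stdlib-only sketch like the one below collects the platform details that seem relevant here (OS, architecture, interpreter bitness). The function name system_info is just illustrative; for package versions, scanpy's own sc.logging.print_versions() is the usual tool.

```python
import platform
import struct
import sys


def system_info() -> dict:
    """Collect the platform details relevant to this bug report."""
    return {
        "os": platform.system(),  # e.g. "Windows", "Linux", "Darwin"
        "release": platform.release(),  # under WSL this usually mentions Microsoft
        "machine": platform.machine(),  # e.g. "AMD64", "x86_64", "arm64"
        "python_bits": struct.calcsize("P") * 8,  # pointer width of this interpreter
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in system_info().items():
        print(f"{key}: {value}")
```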
@ilan-gold your minimal example causes the exact same error:
Exception ignored in: <class 'ValueError'>
Traceback (most recent call last):
  File "numpy\random\mtrand.pyx", line 780, in numpy.random.mtrand.RandomState.randint
  File "numpy\random\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
If you are curious, it prints the error 14,210 times (71,050 lines of error output).
EDIT: the random state does not seem to matter, by the way; it also happens with different random states.
Ok @RubenVanEsch, we have to assume that this is a Windows problem, then. I think we will try to set up a Windows test job; hopefully it catches this problem, although it will likely catch others as well. What happens without random_state set?
@ilan-gold Same thing without random state. I think there might be a Windows issue related to NumPy: on Linux it defaults to 64-bit integers, whereas on Windows it sometimes defaults to 32-bit integers (those were the first couple of Google hits when I searched the error). I don't know where the seed is generated in the source code, though.
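To check the NumPy side of this on a given machine, a sketch like the following prints which integer dtype NumPy uses for dtype=int, which is also the default dtype of RandomState.randint. Before NumPy 2.0 this followed the C long, so it is 32 bits on 64-bit Windows and 64 bits on 64-bit Linux; NumPy 2.0 made the default 64 bits on all 64-bit platforms.

```python
import numpy as np

# dtype=int resolves to NumPy's default integer, which is platform dependent.
default_int = np.dtype(int)
bits = default_int.itemsize * 8

print("NumPy", np.__version__, "default int:", default_int, f"({bits} bits)")
print("largest representable value:", np.iinfo(default_int).max)
```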
@RubenVanEsch Yes, and the issue there is that we're not the ones calling randint. We may be able to hack around it. I'll look at how the pipeline fails on our CI to see where the call is coming from.
If the problem is windows, it's possible it will be solved by numpy 2.0. Not sure how easy the upgrade path to numpy 2.0 will be, however.
I got the test runner to run on Windows, and while there were other errors, this one seemingly did not appear: https://dev.azure.com/scverse/scanpy/_build/results?buildId=6287&view=logs&j=4eb20215-89fc-58e4-6218-2c2fa88ddf72&t=482e4b16-75d9-5f8c-9594-aadcd098d2cb&l=3977
We have a test that is strikingly similar to the minimal example above: https://github.com/scverse/scanpy/blob/main/scanpy/tests/notebooks/test_pbmc3k.py (minus the umap call). Could you try this test (which doesn't call umap), and also try it with umap added so that it matches our little demo exactly, and let us know what you get? We also set resolution in the test. This test does seem to pass on our CI.
In general there will be some back and forth here until we find someone near us with a windows machine since using CI to fix this problem isn't really feasible, but at least we can narrow the scope.
@RubenVanEsch, are you able to run this in WSL? Also, does the number you pass for random seed matter?
From @RubenVanEsch:
EDIT: the random state does not seem to matter, by the way; it also happens with different random states.
@ivirshup @ilan-gold Just got back to this. I thought I could not install WSL because I am on a somewhat company-restricted laptop, but it turns out I can. Installing it now (and probably using it from here on out). I will run the test in a bit and let you know.
import scanpy as sc
em_adata = sc.datasets.pbmc3k()
sc.pp.pca(em_adata, n_comps=50)
sc.pp.neighbors(em_adata)
sc.tl.umap(em_adata)
sc.tl.leiden(em_adata,flavor='igraph',n_iterations=2,random_state=1653,directed=False)
@melonora, would you mind running this on your Windows machine with the latest scanpy release to see if you can reproduce it?
Yes, I will and report back, most likely in the evening.
I can reproduce it. This is the error that I get:
At first glance, it seems that the default dtype for randint is used, which is int32. I can check whether switching to int64 fixes the issue.
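A minimal sketch of that comparison, calling NumPy directly rather than through igraph: the bound 2**32 is an arbitrary value chosen because it fits in int64 but not in int32, and the seed simply reuses 1653 from the example above.

```python
import numpy as np

rng = np.random.RandomState(1653)
HIGH = 2**32  # fits in int64, but exceeds the int32 maximum of 2**31 - 1


def try_randint(dtype):
    """Draw one integer in [0, HIGH) with the given dtype, or return the error text."""
    try:
        return int(rng.randint(0, HIGH, dtype=dtype))
    except ValueError as err:
        # NumPy rejects the bound up front when it does not fit the requested dtype.
        return f"ValueError: {err}"


print("int32:", try_randint(np.int32))  # ValueError: high is out of bounds for int32
print("int64:", try_randint(np.int64))  # some integer in [0, 2**32)
```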
I will see if I can reproduce on main and pinpoint where the problem arises.
Do you still want me to run the test from @ilan-gold, or is it fine now that it has been reproduced on your side as well?
It is reproduced. It is due to randint being called with an upper bound outside the range of its default dtype, int32. On 64-bit Windows systems the default is int32 despite the system being 64-bit, because the C long type is 32 bits on these systems.
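The C-level cause can be seen from Python with ctypes: 64-bit Windows follows the LLP64 data model (long is 32 bits, pointers are 64 bits), while 64-bit Linux and macOS follow LP64 (long and pointers are both 64 bits). A quick check:

```python
import ctypes
import struct

# Size of the C long versus the pointer size of this interpreter.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
pointer_bits = struct.calcsize("P") * 8

print(f"C long: {long_bits} bits, pointer: {pointer_bits} bits")
if pointer_bits == 64 and long_bits == 32:
    # LLP64, e.g. 64-bit Windows: a C long cannot hold values above 2**31 - 1.
    print("LLP64 platform: C long is 32 bits on a 64-bit system")
```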
The part of the code that fails is where the context manager is used to perform Leiden clustering with the igraph flavor.
In particular, here is the piece of code: https://github.com/scverse/scanpy/blob/a33111f3b2caaa4ee5e33d02b6e98b143023341b/scanpy/tools/_leiden.py#L184-L185
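Roughly, the pattern there is: install a seeded RNG in igraph, run the clustering, then restore the default. The sketch below is a simplified, hypothetical stand-in for that context manager, not scanpy's code; the setter is passed in so the sketch runs without igraph, and with igraph installed it would be igraph.set_random_number_generator.

```python
import random
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def temporary_rng(
    set_rng: Callable[[Any], None], rng: Any, default: Any = random
) -> Iterator[None]:
    """Install `rng` via `set_rng` for the duration of the block, then restore `default`."""
    set_rng(rng)
    try:
        yield
    finally:
        # Restore the default generator even if the clustering call raises.
        set_rng(default)
```

With igraph, usage would look like `with temporary_rng(ig.set_random_number_generator, my_rng): g.community_leiden(...)`, where my_rng is any object igraph accepts as a generator. Restoring in `finally` matters because an exception inside the block would otherwise leave the seeded RNG installed globally.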
Note, though, that randint is called from C code within igraph itself. @ivirshup, do you think asking for it to be called with dtype int64 would be a problem until this is fixed on the NumPy side?
Where would you put the dtype=int64 argument?
It wouldn't be on our side. As far as I know, the NumPy random number generator is called from C code within igraph itself.
Since we can’t test this without your help, could you check if passing your own RNG here makes it work?
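One way to try this, sketched under assumptions: python-igraph's set_random_number_generator accepts any object whose random, randint and gauss behave like the functions in Python's random module (getrandbits is optional). The hypothetical Int64RNG class below wraps a NumPy RandomState and forces dtype=np.int64 in randint, assuming the bounds igraph requests fit in 64 bits. It is an untested workaround idea, not the fix scanpy adopted.

```python
import numpy as np


class Int64RNG:
    """Hypothetical igraph-compatible RNG backed by NumPy, with a 64-bit randint."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = np.random.RandomState(seed)

    def random(self) -> float:
        # Uniform float in [0.0, 1.0), like random.random().
        return float(self._rng.random_sample())

    def randint(self, a: int, b: int) -> int:
        # random.randint includes b, while NumPy's randint excludes high, hence b + 1.
        # Forcing int64 sidesteps a 32-bit default integer dtype.
        return int(self._rng.randint(a, b + 1, dtype=np.int64))

    def gauss(self, mu: float, sigma: float) -> float:
        # Normal draw with mean mu and standard deviation sigma, like random.gauss().
        return float(self._rng.normal(mu, sigma))
```

Note that sc.tl.leiden installs its own RNG for the duration of the call, so testing this would mean swapping it in for the wrapper used inside scanpy's context manager, or using it with igraph's Graph.community_leiden directly.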
I can test tomorrow
I can reproduce this bug on my machine as well. I can supply additional information or context if needed, and I can test fixes.
Regarding the suggestion that NumPy 2.0 might solve this: I can reproduce the error using NumPy 2.0.2.
@patrick-nicodemus What we need more than anything is someone to test out a fix and to confirm that using WSL prevents the problem.
See https://github.com/scverse/scanpy/pull/3041
The issue is that we don't have Windows machines.
@ilan-gold If you want to try it out, I give instructions for how to reproduce the error with a Docker container for Windows in the cross-referenced issue. I also have tried it on WSL, and the problem is not present on WSL, so this is a workaround for Windows users. However, I am organizing a Python workshop in a few weeks, and I think it would add some additional administrative burden/overhead to the workshop to coordinate installing and setting up WSL (as we see in #3041, Ruben had trouble installing WSL and others might as well.) So, for me, using WSL is a suboptimal workaround.
If you want to try it out, I give instructions for how to reproduce the error with a Docker container for Windows in the cross-referenced issue
Yes please. I'm confused about how Windows comes into play, though, since I thought that Docker always runs on a Linux kernel: natively on Linux and in a VM on macOS and Windows.
Yes, this was my impression too. However, there is a documented option, "Switch to Windows containers", available when you right-click the Docker icon in the taskbar, and it allows you to run containers on a Windows kernel.
Please make sure these conditions are met
What happened?
I was running the standard pipeline on some data, and when I run
sc.tl.leiden(em_adata,flavor='igraph',n_iterations=2,random_state=1653,directed=False)
it prints an endless stream of ignored exceptions. It does not actually crash the kernel, but it does bog it down and makes everything take much longer than necessary. I am working in a conda env on a 64-bit (x64) Windows 10 system. The problem also occurs with the pbmc3k dataset.
Minimal code sample
Error output
Versions