lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

Semi-deterministic output even though random_state is set #1108

Open SleepyMorpheus opened 6 months ago

SleepyMorpheus commented 6 months ago

Hello everybody. While adding some tests to a project of mine, I noticed some really weird behaviour: two different instances initialised with the same parameters (including random_state) return different results from fit_transform within a single execution. When I run the program again, however, the output does not change, so the results are reproducible across runs but not across instances.

Am I missing something obvious? Or does anybody have an idea why this is happening? Thanks for looking into it.

Reproduction Steps

import umap

din = [[39.715797424316406, 5.328598499298096],
       [40.119140625, 6.10653018951416],
       [39.6290283203125, 6.134637832641602],
       [39.19687271118164, 5.85951566696167],
       [9.60939884185791, 9.586419105529785],
       [-6.015710353851318, -11.25406265258789],
       [9.012431144714355, 8.989534378051758],
       [9.283456802368164, 9.261088371276855],
       [-5.681527614593506, -10.919998168945312],
       [-5.479494571685791, -10.71765422821045]]

a = umap.UMAP(random_state=42, n_neighbors=2, n_components=2).fit_transform(din).tolist()
b = umap.UMAP(random_state=42, n_neighbors=2, n_components=2).fit_transform(din).tolist()

print(a)
print(b)

assert a == b

with the output being:

/Users/op/.pyenv/versions/3.9.18/lib/python3.9/site-packages/umap/umap_.py:1945: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
/Users/op/.pyenv/versions/3.9.18/lib/python3.9/site-packages/umap/umap_.py:1945: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
[[20.164154052734375, 1.3494281768798828], [21.097431182861328, 0.2964009642601013], [20.777090072631836, 0.6684482097625732], [20.44692611694336, 1.0685573816299438], [11.67434310913086, 17.12160301208496], [-3.4501354694366455, 15.270648002624512], [11.07783031463623, 17.718942642211914], [11.350298881530762, 17.448848724365234], [-3.1171295642852783, 15.60583209991455], [-2.912529230117798, 15.80562973022461]]
[[4.337563514709473, 8.263677597045898], [3.3291709423065186, 7.28176212310791], [3.681276798248291, 7.623903751373291], [4.056679725646973, 7.981446743011475], [-2.7793023586273193, 16.567930221557617], [8.226690292358398, -1.6328837871551514], [-3.3761720657348633, 17.164905548095703], [-3.1042020320892334, 16.89451026916504], [7.8915557861328125, -1.2999401092529297], [7.691712379455566, -1.0954537391662598]]
Traceback (most recent call last):
  File "/Users/op/Documents/ETHZ/IVIA/umap-test/umap-lol.py", line 21, in <module>
    assert a == b
AssertionError
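
To rule out floating-point noise as the reason the exact `==` comparison fails, the two printed embeddings can be compared by their largest coordinate difference. The sketch below uses only the first two rows of each output above; `max_abs_diff` is a hypothetical helper written for this illustration, not part of umap.

```python
def max_abs_diff(a, b):
    """Largest absolute coordinate difference between two equally shaped nested lists."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# First two rows of each printed embedding, copied from the output above.
a_head = [[20.164154052734375, 1.3494281768798828],
          [21.097431182861328, 0.2964009642601013]]
b_head = [[4.337563514709473, 8.263677597045898],
          [3.3291709423065186, 7.28176212310791]]

diff = max_abs_diff(a_head, b_head)
print(diff)  # far larger than rounding error
```

The coordinates differ by more than 15 units, so switching the assertion to a tolerance-based comparison would presumably not make it pass.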

Versions

joblib==1.3.2
llvmlite==0.42.0
numba==0.59.1
numpy==1.26.4
pynndescent==0.5.12
scikit-learn==1.4.1.post1
scipy==1.12.0
threadpoolctl==3.4.0
tqdm==4.66.2
umap-learn==0.5.6

Note that umap is installed directly from GitHub, but the behaviour stays the same when it is installed via PyPI.

OmniaZayed commented 4 months ago

Hi, I get the same warning. Any ideas on resolving it?

hongyeting commented 3 months ago

I ran into this problem in my project recently and the warning is exactly the same: "n_jobs value 1 overridden to 1". Issue #1081 seems to resolve this! Try this:

a = umap.UMAP(random_state=42, n_jobs=1, n_neighbors=2, n_components=2).fit_transform(din)

It runs with no warning, but I don't really understand why. "n_jobs value 1 overridden to 1"? Does the warning mean that the default n_jobs has some problem, e.g. that the original value 1 had the wrong type or something? Searching for the warning in the original [code link](https://github.com/lmcinnes/umap/blob/master/umap/umap.py) shows:

if self.n_jobs != 1 and self.random_state is not None:
    self.n_jobs = 1
    warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")

So the code overwrites the offending parameter before reporting it, which is why the message always says "1 overridden to 1". Then I wondered how the default value is set: I found n_jobs=-1 in the init function, and nothing seems to change it afterwards. I find it interesting, so I'm now having a look at the UMAP paper (2018) and the densMAP paper (2021).
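
The ordering effect can be reproduced without umap. The sketch below is only an illustration: `SeededEstimator`, `check_as_quoted`, `check_reordered` and `collect_messages` are hypothetical names, the n_jobs=-1 default is taken from the observation above, and the warning text is shortened. It shows three cases: the quoted order, a reordered variant that formats the message before overwriting, and an explicit n_jobs=1.

```python
import warnings


class SeededEstimator:
    """Hypothetical stand-in for an estimator; this is not UMAP's real class."""

    def __init__(self, n_jobs=-1, random_state=None):
        self.n_jobs = n_jobs
        self.random_state = random_state

    def check_as_quoted(self):
        # Same order as the quoted snippet: overwrite first, then format the message.
        if self.n_jobs != 1 and self.random_state is not None:
            self.n_jobs = 1
            warnings.warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state.")

    def check_reordered(self):
        # Assumed alternative order: format the message first, then overwrite.
        if self.n_jobs != 1 and self.random_state is not None:
            warnings.warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state.")
            self.n_jobs = 1


def collect_messages(method_name, **kwargs):
    """Run one check on a fresh instance and return the warning texts it emitted."""
    est = SeededEstimator(**kwargs)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        getattr(est, method_name)()
    return [str(w.message) for w in caught]


quoted = collect_messages("check_as_quoted", random_state=42)
reordered = collect_messages("check_reordered", random_state=42)
explicit = collect_messages("check_as_quoted", n_jobs=1, random_state=42)

print(quoted)     # reports the already overwritten value 1
print(reordered)  # reports the original value -1
print(explicit)   # empty: the condition n_jobs != 1 is false, so nothing is warned
```

This would also explain why passing n_jobs=1 explicitly silences the warning: the condition is never true, so the branch is skipped. It says nothing about why the two embeddings differ.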