ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Ray + pynndescent (numba) compatibility issue #44714

Closed ruochiz closed 2 months ago

ruochiz commented 5 months ago

What happened + What you expected to happen

When ray is imported and initialized before running umap, fitting triggers the numba error shown below.

But if umap is imported and run first, and ray is imported and initialized afterwards, the error does not occur. It looks like the cause is numba JIT functions being compiled after ray initialization.

This bug is reproducible on ray versions >= 2.5.0; ray <= 2.4.0 is fine.

2024-04-11 22:36:29.706906: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-11 22:36:29.744354: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-11 22:36:30.511499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
Cell In[14], line 2
      1 from umap import *
----> 2 vec = UMAP(metric='cosine', low_memory=False).fit_transform(np.array(xx).astype('float32'))

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/umap/umap_.py:2887, in UMAP.fit_transform(self, X, y, force_all_finite)
   2851 def fit_transform(self, X, y=None, force_all_finite=True):
   2852     """Fit X into an embedded space and return that transformed
   2853     output.
   2854 
   (...)
   2885         Local radii of data points in the embedding (log-transformed).
   2886     """
-> 2887     self.fit(X, y, force_all_finite)
   2888     if self.transform_mode == "embedding":
   2889         if self.output_dens:

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/umap/umap_.py:2608, in UMAP.fit(self, X, y, force_all_finite)
   2602     nn_metric = self._input_distance_func
   2603 if self.knn_dists is None:
   2604     (
   2605         self._knn_indices,
   2606         self._knn_dists,
   2607         self._knn_search_index,
-> 2608     ) = nearest_neighbors(
   2609         X[index],
   2610         self._n_neighbors,
   2611         nn_metric,
   2612         self._metric_kwds,
   2613         self.angular_rp_forest,
   2614         random_state,
   2615         self.low_memory,
   2616         use_pynndescent=True,
   2617         n_jobs=self.n_jobs,
   2618         verbose=self.verbose,
   2619     )
   2620 else:
   2621     self._knn_indices = self.knn_indices

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/umap/umap_.py:329, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    326     n_trees = min(64, 5 + int(round((X.shape[0]) ** 0.5 / 20.0)))
    327     n_iters = max(5, int(round(np.log2(X.shape[0]))))
--> 329     knn_search_index = NNDescent(
    330         X,
    331         n_neighbors=n_neighbors,
    332         metric=metric,
    333         metric_kwds=metric_kwds,
    334         random_state=random_state,
    335         n_trees=n_trees,
    336         n_iters=n_iters,
    337         max_candidates=60,
    338         low_memory=low_memory,
    339         n_jobs=n_jobs,
    340         verbose=verbose,
    341         compressed=False,
    342     )
    343     knn_indices, knn_dists = knn_search_index.neighbor_graph
    345 if verbose:

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/pynndescent/pynndescent_.py:946, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, max_rptree_depth, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    943     if verbose:
    944         print(ts(), "NN descent for", str(n_iters), "iterations")
--> 946     self._neighbor_graph = nn_descent(
    947         self._raw_data,
    948         self.n_neighbors,
    949         self.rng_state,
    950         effective_max_candidates,
    951         self._distance_func,
    952         self.n_iters,
    953         self.delta,
    954         low_memory=self.low_memory,
    955         rp_tree_init=True,
    956         init_graph=_init_graph,
    957         leaf_array=leaf_array,
    958         verbose=verbose,
    959     )
    961 if np.any(self._neighbor_graph[0] < 0):
    962     warn(
    963         "Failed to correctly find n_neighbors for some samples."
    964         " Results may be less than ideal. Try re-running with"
    965         " different parameters."
    966     )

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
    464         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    465                f"by the following argument(s):\n{args_str}\n")
    466         e.patch_message(msg)
--> 468     error_rewrite(e, 'typing')
    469 except errors.UnsupportedError as e:
    470     # Something unsupported is present in the user code, add help info
    471     error_rewrite(e, 'unsupported_error')

File ~/miniforge3/envs/scprinter/lib/python3.10/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    407     raise e
    408 else:
--> 409     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'print': Cannot determine Numba type of <class 'function'>

File "../../../home/rzhang/miniforge3/envs/scprinter/lib/python3.10/site-packages/pynndescent/pynndescent_.py", line 253:
def nn_descent_internal_low_memory_parallel(
    <source elided>
        if verbose:
            print("\t", n + 1, " / ", n_iters)
            ^

During: resolving callee type: type(CPUDispatcher(<function nn_descent_internal_low_memory_parallel at 0x7f3dcab12560>))
During: typing of call at /home/rzhang/miniforge3/envs/scprinter/lib/python3.10/site-packages/pynndescent/pynndescent_.py (359)

During: resolving callee type: type(CPUDispatcher(<function nn_descent_internal_low_memory_parallel at 0x7f3dcab12560>))
During: typing of call at /home/rzhang/miniforge3/envs/scprinter/lib/python3.10/site-packages/pynndescent/pynndescent_.py (359)

File "../../../home/rzhang/miniforge3/envs/scprinter/lib/python3.10/site-packages/pynndescent/pynndescent_.py", line 359:
def nn_descent(
    <source elided>
    if low_memory:
        nn_descent_internal_low_memory_parallel(
        ^
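
The key line in the traceback above is "Untyped global name 'print': Cannot determine Numba type of <class 'function'>". One possible reading is that `print` was no longer the C-implemented builtin when numba compiled the function, since the stock `print` has type `builtin_function_or_method`, not `function`. A minimal diagnostic sketch for checking this follows; the helper name `print_is_builtin` is made up for illustration and is not part of ray, numba, or pynndescent.

```python
import builtins
import types


def print_is_builtin() -> bool:
    """Return True if builtins.print is still the C-implemented builtin.

    Hypothetical helper: a False result means something has replaced
    builtins.print with another object, such as a plain Python function.
    """
    return isinstance(builtins.print, types.BuiltinFunctionType)


if __name__ == "__main__":
    # Run this once before and once after ray.init() to compare.
    print("print is builtin:", print_is_builtin())
    print("type of print:", type(builtins.print))
```

Calling the helper right before the failing `fit_transform` call would show whether `print` had been replaced at that point.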

Versions / Dependencies

Ray >= 2.5.0
numba: any version >= 0.57
pynndescent/umap: latest

Reproduction script

import ray
ray.init()
from umap import UMAP
import numpy as np
xx = np.random.random(10000*100).reshape((10000, 100))
vec = UMAP(metric='cosine', low_memory=False).fit_transform(np.array(xx).astype('float32'))

Issue Severity

High: It blocks me from completing my task.

jjyao commented 5 months ago

"But if umap is imported and run first, then import ray and init, it won't."

@ruochiz It seems you have a workaround to unblock yourself, is that right?

ruochiz commented 5 months ago

Not really. That is only a temporary fix, and it only works if I know the size of the matrix that pynndescent will use. pynndescent seems to take a different computation path depending on the size of the matrices, so sometimes the same error happens again.

rynewang commented 4 months ago

Root caused here: https://github.com/ray-project/ray/issues/45538

TL;DR: you can set the env var RAY_TQDM_PATCH_PRINT=0 as a workaround.
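
A minimal sketch of applying that workaround from inside a Python script or notebook, assuming the variable has to be in the environment before ray is imported (the ordering is an assumption; the thread only gives the variable name and value). The helper name `disable_ray_print_patch` is made up for illustration.

```python
import os


def disable_ray_print_patch() -> None:
    """Set the workaround variable for this process and its children.

    Hypothetical helper; it only writes the environment variable named
    in the comment above and does not touch ray itself.
    """
    os.environ["RAY_TQDM_PATCH_PRINT"] = "0"


# Assumed ordering: set the variable first, then import and start ray.
disable_ray_print_patch()

# import ray
# ray.init()
# from umap import UMAP
# ... run the reproduction script from the issue ...
```

Setting the variable in the shell that launches the script (for example `RAY_TQDM_PATCH_PRINT=0 python script.py`) should work just as well and avoids depending on import order inside the code.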

ruochiz commented 4 months ago

I'll give it a try, thank you for looking into this!

YiranJing commented 4 months ago

I confirm that the workaround solution works for me! 🙌 Thanks Ruiyang!