lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

Pickle issue with load_ParametricUMAP #1134

Open eafpres opened 5 months ago

eafpres commented 5 months ago

Describe the bug

Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:   File "/var/app/current/application.py", line 478, in load_stuff
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:     model = load_ParametricUMAP(model_set + '/' + full_name,
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:   File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/umap/parametric>
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:     model = pickle.load((open(model_output, "rb")))
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:   File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/numba/core/seri>
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:     ctor, states = loads(serialized)
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]:                    ^^^^^^^^^^^^^^^^^
Jun 19 17:36:52 409683c0-aaf6-48ad-9b2b-d7874460547c gunicorn[89838]: TypeError: code() argument 13 must be str, not int

To Reproduce

Steps to reproduce the behavior (Ubuntu 20.04, Python 3.11, umap-learn==0.5.3):

1) Create an embedding:

  import umap

  # model_vectors, data_path and date_prefix are defined elsewhere in the application
  distance = 'sokalsneath'
  op_mix_ratio = 0.3
  embed_dim = 10
  reducer = umap.ParametricUMAP(random_state=42,
                                transform_seed=42,
                                n_neighbors=15,
                                n_epochs=500,
                                metric=distance,
                                min_dist=0.0,
                                set_op_mix_ratio=op_mix_ratio,
                                n_components=embed_dim)
  # Fit the embedding, then save the fitted model to disk
  mapper = reducer.fit(model_vectors)
  mapper.save(data_path + '/' + date_prefix + '/' +
              date_prefix + '_umap_mapper.umap')

2) Attempt to load the model on a different Linux machine using load_ParametricUMAP().

3) The load fails with the same traceback shown under "Describe the bug" above, ending in:

TypeError: code() argument 13 must be str, not int

Expected behavior

On another machine this worked. I believe it is a subtle pickle issue. I had issues with other pickle files, which were solved by using pickle.dump(object, open(filename, 'wb'), protocol=2). I have not figured out how to get umap to use that protocol.
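
For reference, the workaround mentioned for plain pickle files can be sketched as follows. This is a minimal illustration using only the standard library; the `payload` object and the temporary file name are hypothetical stand-ins, and it does not show how to make umap's own `save` use a particular protocol, which remains an open question in this report.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the object being saved.
payload = {"metric": "sokalsneath", "n_components": 10}

# Hypothetical file location, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), "example.pkl")

# Pickle files must be opened in binary mode. protocol=2 selects an
# older pickle format that earlier Python versions can also read.
with open(filename, "wb") as f:
    pickle.dump(payload, f, protocol=2)

# Read the object back to confirm the round trip.
with open(filename, "rb") as f:
    restored = pickle.load(f)
```

Note that the protocol only controls the pickle file format. Whether it helps depends on what is inside the pickled object.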


eafpres commented 5 months ago

Update: this may be a Python 3.11-related issue. I have tested downgrading the server to Python 3.9, and things seem to work then. I did try using Python 3.11 on my dev system and re-saving the model, but I still got the error on the Python 3.11 server.
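
Since the failure seems to depend on the Python version, one way to surface a mismatch early is to store the interpreter version in front of the pickled object and compare it before loading the object itself. The sketch below uses only the standard library; `save_with_version` and `load_with_version` are hypothetical helpers, not part of umap, and the check only reports a mismatch rather than fixing any underlying incompatibility.

```python
import pickle
import sys


def save_with_version(obj, path):
    """Write the running Python (major, minor) version, then obj, to path."""
    with open(path, "wb") as f:
        # Two consecutive pickles in one file: version first, object second.
        pickle.dump(tuple(sys.version_info[:2]), f)
        pickle.dump(obj, f)


def load_with_version(path, current=None):
    """Load an object written by save_with_version.

    Raises RuntimeError before unpickling the object if the saved
    Python version differs from the current one.
    """
    if current is None:
        current = tuple(sys.version_info[:2])
    with open(path, "rb") as f:
        saved = pickle.load(f)  # reads only the version tuple
        if saved != current:
            raise RuntimeError(
                f"pickle was written with Python {saved}, "
                f"but is being loaded with Python {current}"
            )
        return pickle.load(f)  # reads the object itself
```

Checking the version before the second `pickle.load` means a mismatch produces a clear message instead of an error from deep inside the unpickling code.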

timsainb commented 5 months ago

Hey, can you try this branch to see if it resolves the issue on Python 3.11? https://github.com/lmcinnes/umap/pull/1123

kobiche commented 2 months ago

I can confirm this is related to the Python version. How should I proceed?

rantoniuk commented 1 month ago

@timsainb I can see that #1123 has conflicts to be resolved. Is it in good enough shape for me to build a custom version and check whether it solves the issue, or do you want to rebase first?

timsainb commented 1 month ago

We are just about to pull in an updated version of Parametric UMAP (https://github.com/lmcinnes/umap/pull/1153), so my plan is to wait until that is pulled in before integrating #1123.