ur-whitelab / BO-LIFT

BayesOpt + LIFT

ValueError: Expected n_neighbors <= n_samples when telling to AskTellGPR model #18

Closed n-yoshikawa closed 11 months ago

n-yoshikawa commented 1 year ago

Thank you for developing interesting software!

I wanted to test the AskTellGPR() functionality, but I encountered the following error: ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6.

Code to reproduce

import bolift

asktell = bolift.AskTellGPR()
asktell.tell("1-bromopropane", -1.730)

Result

$ python reproduce.py
Cached embeddings not found. Creating new cache table.
Traceback (most recent call last):
  File "reproduce.py", line 4, in <module>
    asktell.tell("1-bromopropane", -1.730)
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/bolift/asktellGPR.py", line 138, in tell
    self._train(
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/bolift/asktellGPR.py", line 99, in _train
    embedding_isomap = self.isomap.fit_transform(embedding)
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/sklearn/manifold/_isomap.py", line 324, in fit_transform
    self._fit_transform(X)
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/sklearn/manifold/_isomap.py", line 205, in _fit_transform
    kng = kneighbors_graph(
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/sklearn/neighbors/_graph.py", line 125, in kneighbors_graph
    return X.kneighbors_graph(X=query, n_neighbors=n_neighbors, mode=mode)
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 886, in kneighbors_graph
    A_data, A_ind = self.kneighbors(X, n_neighbors, return_distance=True)
  File "/home/xxxxxx/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py", line 727, in kneighbors
    raise ValueError(
ValueError: Expected n_neighbors <= n_samples,  but n_samples = 1, n_neighbors = 6

A similar error also occurred when running CORxn.ipynb.

Software version

I would appreciate it if you could provide any information to resolve this error. Thank you.

whitead commented 1 year ago

I think @smichtavy wrote the GPR code. Can you take a look?

maykcaldas commented 1 year ago

Hello, @n-yoshikawa! Thanks for opening this issue.

We use Isomap to reduce dimensionality in the GPR. When creating the AskTellGPR object, you can pass a pool object, and the Isomap is then fitted on that pool.

When no pool is provided, the class fits the Isomap when the tell method is called. The problem here is that your model is trying to fit the Isomap using only one point (every point it knows so far), while the error message shows that the neighbor search needs at least 6. You can avoid this by explicitly skipping training while telling the first few points:

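import bolift

asktell = bolift.AskTellGPR()
xs = ['1-bromopropane', '1-bromopentane', '1-bromooctane', '1-bromonaphthalene', '1,4-dinitrobenzene']
ys = [-1.730, -3.080, -5.060, -4.35, -3.390]

# train=False stores each point without training the model.
for x, y in zip(xs, ys):
    asktell.tell(x, y, train=False)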

This gives the model enough points to fit the Isomap. After that, you can use the model normally:

asktell.tell('penta-1,4-diene', -2.090)
asktell.predict('1-bromohexane')

Can you let me know if that works for you? Thanks!
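
The sample-count requirement behind the error can be illustrated without bolift or scikit-learn. The following is a minimal sketch, not the library's code: knn_graph is a hypothetical brute-force helper, the 2-D points are made-up stand-ins for embeddings, and k = 5 is an assumption (it is scikit-learn's Isomap default, which would be consistent with the n_neighbors = 6 in the traceback if the query point itself is counted).

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph.

    Returns, for each point, the indices of its k closest *other* points.
    """
    n = len(points)
    # Each point needs k neighbors besides itself, so k + 1 samples are required.
    if k + 1 > n:
        raise ValueError(
            f"need at least {k + 1} samples to find {k} neighbors per point, got {n}"
        )
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph


# Toy 2-D vectors standing in for embeddings (made-up values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.0), (5.0, 5.0)]

# One known point, as in the reproduction script: the graph cannot be built.
try:
    knn_graph(points[:1], k=5)
    single_point_ok = True
except ValueError:
    single_point_ok = False

# Six known points, as after the workaround above: the graph can be built.
graph = knn_graph(points, k=5)
```

With one point the neighbor graph cannot be built, which is the situation in the reported traceback. With six points it can, which matches the workaround of telling five points with train=False before the first trained tell.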

whitead commented 1 year ago

@maykcaldas - great explanation. Can we add a better exception for this problem?

n-yoshikawa commented 1 year ago

Hello, @maykcaldas! Your code worked in my environment. Thank you so much for your answer!

This behavior seems tricky to me because it differs from the AskTellFewShotTopk() example in the README, and train=False was not used in the example notebook. I am also still confused about how the model names in this library correspond to the names in the paper. I would appreciate it if you could extend the documentation to cover the differences between the implemented models.

maykcaldas commented 11 months ago

We added an exception message for that case explaining how to correctly address this issue.
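
The message that was actually added to bolift is not shown in this thread. As a rough sketch of what an early, descriptive check of this kind might look like, the hypothetical PointBuffer class below collects points and refuses to train until enough are available. The class name, the min_samples parameter, its default of 6 (taken from the traceback), and the message wording are all assumptions for illustration.

```python
class PointBuffer:
    """Hypothetical helper: collect (x, y) pairs and refuse to train too early."""

    def __init__(self, min_samples=6):
        # 6 is the sample count implied by the traceback; it is an assumption here.
        self.min_samples = min_samples
        self.xs = []
        self.ys = []

    def tell(self, x, y, train=True):
        # The point is stored first, so it is kept even if training is refused.
        self.xs.append(x)
        self.ys.append(y)
        if train:
            self.check_ready()

    def check_ready(self):
        n = len(self.xs)
        if n < self.min_samples:
            raise ValueError(
                f"Cannot fit the dimensionality reduction with {n} point(s); "
                f"at least {self.min_samples} are required. Pass train=False "
                "for the first few points, or provide a pool when creating the model."
            )


buffer = PointBuffer()

# The first trained tell fails with a message that names the fix.
try:
    buffer.tell("1-bromopropane", -1.730)
    message = None
except ValueError as err:
    message = str(err)

# Adding points without training, then training on the sixth, succeeds.
for x, y in [("1-bromopentane", -3.080), ("1-bromooctane", -5.060),
             ("1-bromonaphthalene", -4.35), ("1,4-dinitrobenzene", -3.390)]:
    buffer.tell(x, y, train=False)
buffer.tell("penta-1,4-diene", -2.090)
```

Checking the sample count before any fitting starts lets the error point at the fix (train=False or a pool) instead of surfacing from deep inside the neighbor search.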