Error messages from Faiss swallowed

richjames0 commented 1 year ago

OpenSearch returns an uninformative error message when the number of data points is too low to train an index (and likely in other failure cases). We need to see the full exception details from Faiss.

Repro example in raw Faiss:

import numpy as np
import faiss
import os
import math
toy_data = np.array([[1.5, 5.5, 4.5, 6.4]])
db_vectors = toy_data.astype('float32').copy(order="C")
n,dimension = db_vectors.shape
code_size = 4
nlist = 2
quantizer = faiss.IndexFlatL2(dimension) 
index_pq = faiss.IndexIVFPQ(quantizer, dimension, nlist, code_size, 8)
index_pq.train(db_vectors)  # train on the database vectors

Error from raw Faiss:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [1], line 12
     10 quantizer = faiss.IndexFlatL2(dimension)  
     11 index_pq = faiss.IndexIVFPQ(quantizer, dimension, nlist, code_size, 8) 
---> 12 index_pq.train(db_vectors)

File ~/.conda/envs/side/lib/python3.8/site-packages/faiss/__init__.py:280, in handle_Index.<locals>.replacement_train(self, x)
    278 n, d = x.shape
    279 assert d == self.d
--> 280 self.train_c(n, swig_ptr(x))

File ~/.conda/envs/side/lib/python3.8/site-packages/faiss/swigfaiss.py:5104, in IndexIVF.train(self, n, x)
   5102 def train(self, n, x):
   5103     r""" Trains the quantizer and calls train_residual to train sub-quantizers"""
-> 5104     return _swigfaiss.IndexIVF_train(self, n, x)

RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /project/faiss/faiss/Clustering.cpp:283: Error: 'nx >= k' failed: Number of training points (1) should be at least as large as number of clusters (2)
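Until the underlying message is surfaced, a caller can guard against this particular failure before training. The following is a minimal sketch, not part of Faiss or the k-NN plugin: `check_ivf_training_size` is a hypothetical helper that mirrors only the `nx >= k` condition quoted in the traceback above, and other training failures would not be caught by it.

```python
def check_ivf_training_size(n_points: int, nlist: int) -> None:
    """Raise ValueError if there are fewer training points than IVF clusters.

    Hypothetical helper: it mirrors the 'nx >= k' check reported by Faiss in
    the traceback above, so the caller gets a readable error before training.
    """
    if nlist <= 0:
        raise ValueError(f"nlist must be positive, got {nlist}")
    if n_points < nlist:
        raise ValueError(
            f"Number of training points ({n_points}) should be at least "
            f"as large as number of clusters ({nlist})"
        )


# Usage with the repro above (assumes db_vectors and nlist are defined there):
#   check_ivf_training_size(db_vectors.shape[0], nlist)
#   index_pq.train(db_vectors)
```

With the repro data (1 vector, `nlist = 2`) this raises a `ValueError` before `train` is reached.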

Repro in OpenSearch:

PUT /train-index-mer
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  },
  "mappings": {
       "properties": {
       "train-field": {
           "type": "knn_vector",
           "dimension": 4
      }
   }
  }
}

POST _bulk
{ "index": { "_index": "train-index-mer", "_id": "1" } }
{ "train-field": [1.5, 5.5, 4.5, 6.4]}

POST /_plugins/_knn/models/my-model-today1/_train
{
  "training_index": "train-index-mer",
  "training_field": "train-field",
  "dimension": 4,
   "method": {
    "name": "ivf",
    "engine": "faiss",
    "parameters": {
      "nlist":2,
      "encoder": {
        "name": "pq",
        "parameters": {
          "code_size": 4,
          "m": 8
        }
      }
    }
  }
}

Error from OpenSearch:

{
  "model_id" : "model-issue1",
  "model_blob" : "",
  "state" : "failed",
  "timestamp" : "2022-10-25T18:07:01.069292Z",
  "description" : "",
  "error" : "Failed to execute training. May be caused by an invalid method definition or not enough memory to perform training.",
  "space_type" : "l2",
  "dimension" : 4,
  "engine" : "faiss"
}

jmazanec15 commented 1 year ago

@richjames0 this makes sense. We originally returned the full error message in OpenSearch. However, we decided to switch to a more generic message to avoid returning system information about the node in the error:

/project/faiss/faiss/Clustering.cpp:283:

so right now, we just log it. But I think we can do a better job of returning this info to the user.
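One way to return the useful part without leaking node details could look like the sketch below. It is an illustration in Python, not the plugin's implementation: `sanitize_faiss_error` and the regular expression are hypothetical, and the message shape it assumes ("Error in <signature> at <file>:<line>: Error: '<condition>' failed: <detail>") is inferred only from the single traceback quoted in this issue. Anything that does not match falls back to a generic message.

```python
import re

# Assumed message shape, inferred from the traceback in this issue:
#   "Error in <signature> at <file>:<line>: Error: '<condition>' failed: <detail>"
_FAISS_ASSERT = re.compile(
    r"Error: '(?P<cond>[^']*)' failed(?:: (?P<detail>.*))?$",
    re.DOTALL,
)

GENERIC = "Failed to execute training."


def sanitize_faiss_error(message: str) -> str:
    """Keep the human-readable detail and drop the signature and file path."""
    match = _FAISS_ASSERT.search(message)
    if match is None:
        # Unknown shape: return nothing from the raw message.
        return GENERIC
    detail = match.group("detail")
    if detail:
        return f"{GENERIC} {detail.strip()}"
    # No detail text, so report only the failed condition.
    return f"{GENERIC} Faiss check '{match.group('cond')}' failed."
```

Applied to the raw Faiss error above, this keeps the sentence about the number of training points and clusters while dropping the `Clustering.cpp` path and the C++ function signature.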

richjames0 commented 1 year ago

Ah yes, that makes sense. That would be fantastic. In the meantime, do we have access to those logs on our side?

opensearch-project / k-NN

Error messages from Faiss swallowed #593
