patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License

Incorrect documentation for Hnswlib `similarity_search` and `similarity_search_by_vector` return format #668

Closed yash-learner closed 1 week ago

yash-learner commented 1 week ago

Describe the bug

The documentation for the `similarity_search` and `similarity_search_by_vector` methods in the `Langchain::Vectorsearch::Hnswlib` class incorrectly states that they return results in the format `[[id1, distance1], [id2, distance2]]`. The actual return format from the underlying hnswlib library is `[[id1, id2], [distance1, distance2]]`: IDs and distances come back as two separate, parallel arrays.

https://github.com/patterns-ai-core/langchainrb/blob/656ca14c6294bcb1e06e4e691bd0132781d22d67/lib/langchain/vectorsearch/hnswlib.rb#L80

https://github.com/patterns-ai-core/langchainrb/blob/656ca14c6294bcb1e06e4e691bd0132781d22d67/lib/langchain/vectorsearch/hnswlib.rb#L61

Return type documented in the hnswlib gem:

https://github.com/yoshoku/hnswlib.rb/blob/7b85e43f4542a60758d63027ed09dd6be80ea6f4/ext/hnswlib/dummy.rb#L222

    # @return [Array<Array<Integer>, Array<Float>>]

To Reproduce

  1. Configure a Langchain::Vectorsearch::Hnswlib instance.
  2. Populate the vector store with data.
  3. Execute a similarity search using the similarity_search method.
  4. Observe the returned result format:

    [[3, 4], [0.5338442325592041, 0.5602526664733887]]
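For callers who want the per-result pairs that the current documentation describes, the two parallel arrays can be combined by hand. The following is a minimal sketch that uses the literal result from the reproduction above rather than a live index; the variable names (`result`, `ids`, `distances`, `pairs`) are illustrative and do not come from langchainrb or hnswlib.

```ruby
# Result shape observed in the reproduction: [ids, distances].
# This literal stands in for the value returned by similarity_search.
result = [[3, 4], [0.5338442325592041, 0.5602526664733887]]

# Destructure the two parallel arrays.
ids, distances = result

# Pair each id with its distance: [[id1, distance1], [id2, distance2]].
pairs = ids.zip(distances)

pairs.each do |id, distance|
  puts "id=#{id} distance=#{distance}"
end
```

Since both inner arrays have the same length, `result.transpose` produces the same pairs as `ids.zip(distances)`.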

Expected behavior

The documentation should describe the return format as `[[id1, id2], [distance1, distance2]]`, matching the actual behavior of the hnswlib library's `search_knn` method.