Closed RakshitKhajuria closed 4 months ago
Am I doing this correctly?
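import bm25s

# json_data is the list of records already loaded from the JSON file.
# Tokenize and index only the 'chunk' field of each record.
corpus_tokens = bm25s.tokenize([item['chunk'] for item in json_data])
retriever = bm25s.BM25()
retriever.index(corpus_tokens)

query = "mountain cycling"
query_tokens = bm25s.tokenize(query)

# Perform the retrieval (results holds indices into json_data)
results, scores = retriever.retrieve(query_tokens, k=100)
print("Results:", results)
print("Scores:", scores)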
I was able to do it, so I'm closing this.
To answer your original question, bm25s does not provide a utility for indexing JSON files. However, Python's built-in json library should be good for what you have in mind.
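For anyone landing here later, here is a minimal sketch of that approach: parse the JSON with the standard library, index only the 'chunk' field, and map the returned indices back to the full records. The sample records and the helper names load_records and search_records are made up for illustration and are not part of bm25s. It also assumes that retrieve() returns document indices when no corpus is passed.

```python
import json


def load_records(json_text, field="chunk"):
    """Parse a JSON array of objects and pull out the text field to index."""
    records = json.loads(json_text)
    texts = [record[field] for record in records]
    return records, texts


def search_records(records, texts, query, k=2):
    """Return (record, score) pairs for the top-k matches of `query`."""
    # Third-party dependency; imported here so load_records works without it.
    import bm25s

    retriever = bm25s.BM25()
    retriever.index(bm25s.tokenize(texts))

    # Asking for more results than there are documents is avoided here.
    k = min(k, len(records))
    doc_ids, scores = retriever.retrieve(bm25s.tokenize(query), k=k)

    # Row 0 corresponds to the single query; each id indexes into `records`.
    return [(records[int(i)], float(s)) for i, s in zip(doc_ids[0], scores[0])]


# Hypothetical sample data standing in for the real JSON file.
sample = """[
  {"id": 1, "chunk": "Mountain cycling trails in the Alps"},
  {"id": 2, "chunk": "A beginner's guide to baking bread"},
  {"id": 3, "chunk": "Road cycling gear for long rides"}
]"""
records, texts = load_records(sample)
```

Calling search_records(records, texts, "mountain cycling") should then give back whole JSON records paired with their scores, rather than bare indices.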
Thank you for replying. I was able to get the results. 😊
Hi, I am considering using the BM25 library for a project where I need to efficiently retrieve JSON records based on textual content matches. My data is a collection of JSON records, each with several fields.
Use Case
When I input a query, such as "mountain cycling", I want to retrieve the top K JSON records that best match this query based on the content of the 'chunk' field.
Example of JSON
Questions
Does the BM25 library support indexing and retrieving directly from JSON structures like the ones provided above, particularly focusing on a specific field for text matching?
Setup Advice: If direct JSON handling is supported, could you provide guidance or documentation on how to set up the library for this specific use case?