Open pandu-k opened 1 year ago
I think a unique set of values from a field would be useful too. For example:
doc1 tags: [red, blue]
doc2 tags: [blue]
doc3 tags: [yellow, blue]
mq.index.docs.tags().unique() -> [red, yellow, blue]
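Until something like this exists in Marqo, the same result can be computed client-side. The following is a minimal sketch, assuming the documents have already been fetched into a list of Python dicts (how they are fetched is out of scope here); `unique_values` is a hypothetical helper, not part of the Marqo client.

```python
# Sample documents mirroring the example above (assumed already fetched).
docs = [
    {"_id": "doc1", "tags": ["red", "blue"]},
    {"_id": "doc2", "tags": ["blue"]},
    {"_id": "doc3", "tags": ["yellow", "blue"]},
]


def unique_values(documents, field):
    """Return the distinct values of a list-valued field, in first-seen order."""
    seen = {}  # dict keys preserve insertion order
    for doc in documents:
        for value in doc.get(field, []):
            seen.setdefault(value, None)
    return list(seen)


unique_tags = unique_values(docs, "tags")
print(unique_tags)
```

The values come back in first-seen order; wrap the result in `sorted(...)` or `set(...)` if order does not matter.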
I am bumping into this requirement again and think I am going to have to start putting a special metadata/aggregation record into each of the Marqo indexes as a workaround. Probably going to need to introduce another persistence layer altogether, now that I think about it.
It's a little more complex than the above example because I need to group by a field first, e.g.:
source_pdf1 -> docs -> tags: [red, blue]
source_pdf2 -> docs -> tags: [blue]
source_pdf3 -> docs -> tags: [yellow, blue]
The goal is to count the number of PDFs that have docs with certain tags. The PDFs don't exist as separate objects anymore; they are just another piece of metadata on the docs, but I hope the use case is clear.
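To make the use case concrete, here is a rough client-side sketch of that grouped count. It assumes each doc carries its source PDF in a metadata field called `source_pdf` (a made-up field name) and that the docs are already available as Python dicts; `pdf_count_per_tag` is a hypothetical helper.

```python
from collections import defaultdict

# Several docs can come from the same PDF; the PDF is only a metadata field.
docs = [
    {"source_pdf": "source_pdf1", "tags": ["red", "blue"]},
    {"source_pdf": "source_pdf2", "tags": ["blue"]},
    {"source_pdf": "source_pdf3", "tags": ["yellow", "blue"]},
    {"source_pdf": "source_pdf1", "tags": ["blue"]},
]


def pdf_count_per_tag(documents):
    """Count how many distinct PDFs have at least one doc with each tag."""
    pdfs_by_tag = defaultdict(set)
    for doc in documents:
        for tag in doc.get("tags", []):
            # A set ensures each PDF is counted once per tag.
            pdfs_by_tag[tag].add(doc["source_pdf"])
    return {tag: len(pdfs) for tag, pdfs in pdfs_by_tag.items()}


counts = pdf_count_per_tag(docs)
print(counts)
```

Note that `source_pdf1` contributes only once to the `blue` count even though two of its docs carry that tag.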
@pandu-k Can't we integrate a separate package, for example pandas or polars (for larger data), which could handle these aggregation calls? These tools are specifically designed for that, so we could send or stream the data from Marqo to them and perform the aggregations there. Is this feasible?
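As a rough illustration of what that could look like with pandas, the sketch below assumes the documents have already been exported from Marqo into a list of dicts; the export step itself, and the `source_pdf` and `pages` field names, are assumptions made for the example.

```python
import pandas as pd

# Documents assumed to have been exported from the index beforehand.
docs = [
    {"source_pdf": "source_pdf1", "tags": ["red", "blue"], "pages": 3},
    {"source_pdf": "source_pdf2", "tags": ["blue"], "pages": 5},
    {"source_pdf": "source_pdf3", "tags": ["yellow", "blue"], "pages": 4},
]

df = pd.DataFrame(docs)

# One row per (doc, tag) pair, so list-valued tags can be grouped on.
exploded = df.explode("tags")

# Number of distinct PDFs per tag.
pdfs_per_tag = exploded.groupby("tags")["source_pdf"].nunique()

# Simple numeric aggregations over the un-exploded frame.
total_pages = df["pages"].sum()
mean_pages = df["pages"].mean()

print(pdfs_per_tag.to_dict(), total_pages, mean_pages)
```

The trade-off is the one raised in the issue: every document has to leave the index before it can be aggregated, which adds a moving part to the application.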
Is your feature request related to a problem? Please describe.
There are limited aggregation options in Marqo.
Describe the solution you'd like
Min, max, sum, mean of a field. Count of unique values taken on by a field.
For example: the sum of a field across all docs in the index (perhaps with filtering).
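A minimal sketch of the requested semantics, written client-side in plain Python to show the intended behaviour rather than any existing Marqo API. The `aggregate` helper, the `price` and `in_stock` fields, and the predicate-style filter are all assumptions made for illustration.

```python
from statistics import mean

docs = [
    {"_id": "doc1", "price": 10.0, "in_stock": True},
    {"_id": "doc2", "price": 20.0, "in_stock": True},
    {"_id": "doc3", "price": 10.0, "in_stock": False},
    {"_id": "doc4", "price": 30.0, "in_stock": True},
]


def aggregate(documents, field, predicate=lambda doc: True):
    """Compute min, max, sum, mean and unique count of `field` over matching docs."""
    values = [doc[field] for doc in documents if field in doc and predicate(doc)]
    if not values:
        return None  # nothing matched, so there is nothing to aggregate
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": mean(values),
        "unique_count": len(set(values)),
    }


all_stats = aggregate(docs, "price")
in_stock_stats = aggregate(docs, "price", lambda doc: doc["in_stock"])
print(all_stats)
print(in_stock_stats)
```

Here the second call restricts the aggregation to in-stock documents, which corresponds to the "perhaps with filtering" case.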
Describe alternatives you've considered
Doing the analysis in a different database. The downside is that this increases application complexity.