huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

Search endpoint doesn't allow filtering by common properties #7008

Closed: mfacar closed this 2 months ago

mfacar commented 2 months ago

Currently, the filter parameter in /api/search is only parsed against the metadata properties. If a common property is specified as a filter, it is not included in the final query to Elasticsearch.
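To make the gap concrete, here is a minimal sketch of a filter translator that routes each filter key either to a metadata field or to a top-level common property. It is hypothetical throughout: the filter shape, the names `COMMON_PROPERTIES` and `buildFilterClauses`, and the assumption that metadata values are indexed under `metadata.<name>.value` while common properties such as `creationDate` are top-level fields are illustrative, not Uwazi's actual API.

```typescript
// Hypothetical filter shape: an inclusive range, e.g. epoch milliseconds.
type RangeFilter = { from?: number; to?: number };

// Assumed set of common (non-metadata) properties; illustrative only.
const COMMON_PROPERTIES = new Set(["creationDate", "editDate"]);

// Builds Elasticsearch "range" clauses for use in a bool filter.
// Assumption: metadata properties live under metadata.<name>.value,
// while common properties are top-level fields of the indexed entity.
function buildFilterClauses(filters: Record<string, RangeFilter>): object[] {
  return Object.entries(filters).map(([name, { from, to }]) => {
    const field = COMMON_PROPERTIES.has(name) ? name : `metadata.${name}.value`;
    const range: { gte?: number; lte?: number } = {};
    if (from !== undefined) range.gte = from;
    if (to !== undefined) range.lte = to;
    return { range: { [field]: range } };
  });
}
```

The resulting clauses could then be placed in a request body such as `{ query: { bool: { filter: buildFilterClauses(filters) } } }`; the point is only that common properties need their own branch instead of being dropped.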

This functionality is needed to filter data by the Date Added property on the Víctimas Directas page, as part of #150, with a due date of 26/07/2024.

We also need to analyze whether there are other filtering needs, such as filtering by filename.

cc @aphilop, @konzz, @txau, @salvalacruz

salvalacruz commented 2 months ago

Thank you @mfacar, please let me know if we find a straightforward solution; otherwise, we will use a workaround to solve https://github.com/huridocs/Internal-Issues/issues/150

RafaPolit commented 2 months ago

@mfacar actually, this is supported through query strings.

As I understand it, if you type the following into the library search box, you will get what you need: creationDate:(>=19192837129387 AND <=19192837140000)

For this approach, you just need a way to convert dates to timestamps, and that should be it.
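A minimal sketch of that date-to-timestamp step, assuming `creationDate` is stored as epoch milliseconds in UTC and that dates arrive as DD/MM/YYYY strings (both are assumptions, as are the helper names `dayStartMs` and `creationDateQuery`). It produces the parenthesized range form shown above, covering whole days inclusively.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Converts a "DD/MM/YYYY" string to epoch milliseconds at 00:00 UTC.
// Note: Date.UTC does not reject impossible dates such as 31/02; it rolls them over.
function dayStartMs(ddmmyyyy: string): number {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(ddmmyyyy);
  if (!match) throw new Error(`Expected DD/MM/YYYY, got "${ddmmyyyy}"`);
  const [, dd, mm, yyyy] = match;
  // Date.UTC months are zero-based, hence mm - 1.
  return Date.UTC(Number(yyyy), Number(mm) - 1, Number(dd));
}

// Builds a query string matching entities created from the start of fromDay
// through the last millisecond of toDay, using the parenthesized range syntax.
function creationDateQuery(fromDay: string, toDay: string): string {
  const from = dayStartMs(fromDay);
  const to = dayStartMs(toDay) + DAY_MS - 1;
  return `creationDate:(>=${from} AND <=${to})`;
}

console.log(creationDateQuery("26/07/2024", "26/07/2024"));
// prints creationDate:(>=1721952000000 AND <=1722038399999)
```

Day boundaries here are UTC; if the dates users pick refer to a local time zone, the bounds would need to be shifted by that zone's offset.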

Here's the ES documentation for these patterns: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

I believe the range patterns that use square brackets do not necessarily work, but the ones using parentheses do.

(@daneryl is the one that found out all this, I'm just writing down the reply)

txau commented 2 months ago

@RafaPolit I think we used this in the past, but it is not well documented anywhere. What other query strings are supported? For example, I would like to be able to search for entities that have a particular filename as their main file or as an attachment.

RafaPolit commented 2 months ago

That is a much harder search, as files are not part of the entity, nor are they denormalized in ES (I would need to check this, but I'm 80% sure). So it would involve actually denormalizing that data into the entities (which makes sense to do). Ideally, everything else should rely more on Search V2, but aggregations were never implemented in the Search V2 pipeline. This particular request uses aggregations, and that is why they are relying on V1.
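As a rough illustration of what denormalizing files into entities means, the sketch below copies each file's name onto its owning entity before indexing, so a single indexed entity can match a filename without a join at query time. The shapes `FileRecord`, `Entity`, `IndexedEntity` and the function `denormalizeFiles` are hypothetical simplifications, not Uwazi's models; the thread goes on to note that main documents are in fact already indexed under `documents`.

```typescript
// Hypothetical, simplified shapes; real models have many more fields.
interface FileRecord {
  entity: string; // sharedId of the owning entity
  originalname: string;
  type: "document" | "attachment";
}

interface Entity {
  sharedId: string;
  title: string;
}

interface IndexedEntity extends Entity {
  documents: { originalname: string }[];
  attachments: { originalname: string }[];
}

// Groups files by owning entity, then embeds their names in each entity.
function denormalizeFiles(entities: Entity[], files: FileRecord[]): IndexedEntity[] {
  const byEntity = new Map<string, FileRecord[]>();
  for (const file of files) {
    const list = byEntity.get(file.entity) ?? [];
    list.push(file);
    byEntity.set(file.entity, list);
  }
  return entities.map((entity) => {
    const own = byEntity.get(entity.sharedId) ?? [];
    // Keeps only the files of one type, reduced to the searchable name.
    const names = (type: FileRecord["type"]) =>
      own.filter((f) => f.type === type).map((f) => ({ originalname: f.originalname }));
    return { ...entity, documents: names("document"), attachments: names("attachment") };
  });
}
```

The trade-off is the usual one for denormalization: queries get simpler, but the entity must be reindexed whenever one of its files is added, renamed, or removed.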

In theory, you can pass any query string as the search terms and it will work. We do escape some characters, so some particular queries may fail. The search tips give a small insight into this, but there are certainly many more possibilities with search queries that are not documented. We should definitely create better documentation for the search.
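For reference, the Elasticsearch query_string documentation lists `+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /` as reserved, and states that `<` and `>` cannot be escaped at all. The sketch below, with the hypothetical name `escapeQueryStringValue`, escapes a user-supplied value so it matches literally; it is not Uwazi's actual escaping, which is not described here. It also shows the tension involved: escaping every reserved character would neutralize intentional syntax like `creationDate:(>=... AND <=...)`, so an application has to decide which parts of the input to escape.

```typescript
// Reserved query_string characters; & and | are escaped one at a time,
// which also covers && and ||.
const RESERVED = /[+\-=&|!(){}\[\]^"~*?:\\\/]/g;

// Escapes a value so query_string treats it as literal text.
// "<" and ">" cannot be escaped in query_string, so they are removed.
function escapeQueryStringValue(value: string): string {
  return value.replace(/[<>]/g, "").replace(RESERVED, (ch) => `\\${ch}`);
}

console.log(escapeQueryStringValue("a:b (1)"));
// prints a\:b \(1\)
```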

RafaPolit commented 2 months ago

I stand corrected. We are indeed indexing the documents, so something like this should work: documents.originalname:"some name"

... but it doesn't. I can search most other properties (those that are indexed; not all of them are), but originalname fails to produce matches. I'm not sure what I'm doing wrong, or whether we are escaping quotes or there is some other problem, but it's almost working.
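When debugging this kind of query, it can help to send it to Elasticsearch directly, bypassing any escaping done by the application. A minimal sketch of such a request body follows; only the field name `documents.originalname` comes from this thread, and the helper names `quotePhrase` and `filenameSearchBody` are hypothetical. Inside a quoted phrase, only backslashes and double quotes need escaping.

```typescript
// Wraps a value as a quoted query_string phrase.
function quotePhrase(value: string): string {
  const escaped = value.replace(/[\\"]/g, (ch) => `\\${ch}`);
  return `"${escaped}"`;
}

// Builds a search body with a query_string query on the filename field.
function filenameSearchBody(filename: string) {
  return {
    query: {
      query_string: {
        query: `documents.originalname:${quotePhrase(filename)}`,
      },
    },
  };
}

console.log(JSON.stringify(filenameSearchBody("some name")));
```

If the direct query also returns nothing, the field mapping is worth checking with the Get field mapping API (`GET <index>/_mapping/field/documents.originalname`): a field indexed as `keyword` only matches the exact full value, including the extension, and a field with indexing disabled cannot be searched at all.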

mfacar commented 2 months ago

It worked perfectly for IDHUCA, thanks @RafaPolit @daneryl!

cc. @salvalacruz

txau commented 2 months ago

@mfacar @RafaPolit my request is secondary to what @mfacar was asking for. Since this seems to be working, I am closing this for now.