plone / guillotina

Python AsyncIO data API to manage billions of resources
https://guillotina.readthedocs.io/en/latest/
Other
187 stars 50 forks source link

Searching by "SearchableText" is broken #1151

Closed frapell closed 2 years ago

frapell commented 2 years ago

I've been trying to figure out how to fix this, and I can't quite make it... Searching by an attribute, like the title, works just fine

$ curl -i -X GET "http://127.0.0.1:8080/db/guillotina/@search?title__in=News" --user root:root
HTTP/1.1 200 OK
date: Tue, 12 Oct 2021 20:37:59 GMT
server: uvicorn
Content-Type: application/json
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: *
Server: Guillotina/6.3.16.dev0
Transfer-Encoding: chunked

{"items":[{"id":"news","tid":null,"path":"/news","tags":null,"uuid":"6d2|78e1d0427f064c21abcdf63b9b7f0b92","depth":2,"title":"News","creators":["root"],"type_name":"Item","parent_uuid":"6d2923902b3d41e2af9487a336e4c705","access_roles":["guillotina.Reader","guillotina.Reviewer","guillotina.Owner","guillotina.Editor","guillotina.ContainerAdmin"],"access_users":["root"],"container_id":"guillotina","contributors":["root"],"creation_date":"2021-10-12T20:35:05.537940+00:00","modification_date":"2021-10-12T20:35:05.537940+00:00","@name":"news","@uid":"6d2|78e1d0427f064c21abcdf63b9b7f0b92","@id":"http://127.0.0.1:8080/db/guillotina/news"}],"items_total":1}

However searching by SearchableText fails

$ curl -i -X GET "http://127.0.0.1:8080/db/guillotina/@search?SearchableText=News" --user root:root
HTTP/1.1 500 Internal Server Error
date: Tue, 12 Oct 2021 20:38:07 GMT
server: uvicorn
Content-Type: application/json
Transfer-Encoding: chunked

With the following Traceback

Traceback (most recent call last):
  File "/trabajo/plone/guillotina/guillotina/middlewares/errors.py", line 30, in __call__
    resp = await self.next_app(scope, receive, _send)
  File "/trabajo/plone/guillotina/guillotina/asgi.py", line 167, in __call__
    return await self.request_handler(request)
  File "/trabajo/plone/guillotina/guillotina/asgi.py", line 190, in request_handler
    resp = await route.handler(request)
  File "/trabajo/plone/guillotina/guillotina/traversal.py", line 265, in handler
    view_result = await self.view()
  File "/trabajo/plone/guillotina/guillotina/api/service.py", line 191, in _call_validate
    return await self._call_original()
  File "/trabajo/plone/guillotina/guillotina/api/service.py", line 197, in _call_original_func
    return await self.__original__(self.context, self.request)
  File "/trabajo/plone/guillotina/guillotina/api/search.py", line 98, in search_get
    return await search.search(context, dict(request.query))
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/utility.py", line 291, in search
    parsed_query = parse_query(context, query, self)
  File "/trabajo/plone/guillotina/guillotina/catalog/utils.py", line 100, in parse_query
    return parser(query)
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/parser.py", line 149, in __call__
    result = self.process_queried_field(field, value)
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/parser.py", line 51, in process_queried_field
    return self.process_compound_field(field, value, " OR ")
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/parser.py", line 31, in process_compound_field
    parsed_value = urllib.parse.parse_qsl(urllib.parse.unquote(value))
  File "/usr/lib/python3.8/urllib/parse.py", line 637, in unquote
    string.split
AttributeError: 'dict' object has no attribute 'split'

I believe this is related to this PR https://github.com/plone/guillotina/pull/1132 however trying guillotina from before that change also fails, but with a different Traceback

Traceback (most recent call last):
  File "/trabajo/plone/guillotina/guillotina/middlewares/errors.py", line 30, in __call__
    resp = await self.next_app(scope, receive, _send)
  File "/trabajo/plone/guillotina/guillotina/asgi.py", line 167, in __call__
    return await self.request_handler(request)
  File "/trabajo/plone/guillotina/guillotina/asgi.py", line 190, in request_handler
    resp = await route.handler(request)
  File "/trabajo/plone/guillotina/guillotina/traversal.py", line 265, in handler
    view_result = await self.view()
  File "/trabajo/plone/guillotina/guillotina/api/service.py", line 191, in _call_validate
    return await self._call_original()
  File "/trabajo/plone/guillotina/guillotina/api/service.py", line 197, in _call_original_func
    return await self.__original__(self.context, self.request)
  File "/trabajo/plone/guillotina/guillotina/api/search.py", line 98, in search_get
    return await search.search(context, dict(request.query))
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/utility.py", line 284, in search
    parsed_query = parse_query(context, query, self)
  File "/trabajo/plone/guillotina/guillotina/catalog/utils.py", line 100, in parse_query
    return parser(query)
  File "/trabajo/plone/guillotina/guillotina/contrib/catalog/pg/parser.py", line 141, in __call__
    sql, values, select, field = result
ValueError: not enough values to unpack (expected 4, got 3)

If I place a breakpoint in https://github.com/plone/guillotina/blob/master/guillotina/contrib/catalog/pg/parser.py#L50 I see

(Pdb) field,value
('title__in', 'News')

When searching using the title, however, when trying with SearchableText I see

(Pdb) field,value
('searchabletext__or', {'title__in': 'News'})

When using https://github.com/plone/guillotina-volto this looks like this

(Pdb) field,value
('searchabletext__or', {'text__in': 'News', 'title__in': 'News', 'url__in': 'News'})

And value being a dict happens in https://github.com/plone/guillotina/blob/master/guillotina/catalog/parser.py#L25

As you might have guessed, I am trying to fix the Search when using Volto + guillotina-volto, which uses SearchableText.

Any advice on where to fix this? I believe improving the process_compound_field method should be the way to go, what do you think? https://github.com/plone/guillotina/blob/fb562308fb62e991560a8e6527353664ae3070ac/guillotina/contrib/catalog/pg/parser.py#L30-L44

frapell commented 2 years ago

In trying to understand the changes of PR #1132 I don't think that's how the __or is intended to be used? According to the test, it seems the query is expected to be /db/guillotina/@search?__or=title%3DFirst item%26title%3DSecond item However, I think it should be /db/guillotina/@search?title__or=First item,Second item @bloodbare what do you think?

masipcat commented 2 years ago

Any advice on where to fix this? I believe improving the process_compound_field method should be the way to go, what do you think?

I think this is the right place. Can you try if changing the L31-33 with this code solves the problem?

if isinstance(value, dict):
    parsed_value = value.items()
else:
    parsed_value = urllib.parse.parse_qsl(urllib.parse.unquote(value))
    if not isinstance(parsed_value, list):
        return None
frapell commented 2 years ago

@masipcat Yup, that works like a charm!

masipcat commented 2 years ago

@masipcat Yup, that works like a charm!

Nice :) do you want to open a PR with this changes and if you don't mind, add a test for this case?

frapell commented 2 years ago

@masipcat Sure... How can I test against postgres? should I edit the TESTING_SETTINGS manually?

masipcat commented 2 years ago

Just set the env var DATABAES=postgres:

$ DATABASE=postgres pytest guillotina/tests
masipcat commented 2 years ago

you need to have docker installed and running. A pytest fixture will create the postgres container automatically