okfn-brasil / querido-diario-api

This is Querido Diário's API. It provides everything the frontend does and even more!
https://queridodiario.ok.org.br/api/docs
MIT License

Error when using fragments parameters #30

Status: Closed (jvanz closed this issue 3 years ago)

jvanz commented 3 years ago

After the changes from PR #25, we can see the following error when performing a search:

May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]: INFO:     161.35.112.113:34550 - "GET /api/gazettes/?since=2021-01-01&size=10&fragment_size=0&number_of_fragments=0&pre_tags=&post_tags= HTTP/1.0" 500 Internal Server Error         
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]: ERROR:    Exception in ASGI application                                                                                                                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]: Traceback (most recent call last):                                                                                                                                                   
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi                                                                      
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     result = await app(self.scope, self.receive, self.send)                                                                                                                          
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__                                                                            
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     return await self.app(scope, receive, send)
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 179, in __call__                                                                                       
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await super().__call__(scope, receive, send)                                                                                                                                     
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 111, in __call__                                                                                     
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await self.middleware_stack(scope, receive, send)                                                                                                                                
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__                                                                                
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     raise exc from None                                                                                                                                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__                                                                                
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await self.app(scope, receive, _send)                                                                                                                                            
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__                                                                                        
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     raise exc from None                                                                                                                                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__                                                                                        
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await self.app(scope, receive, sender)                                                                                                                                           
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 566, in __call__                                                                                          
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await route.handle(scope, receive, send)                                                                                                                                         
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 227, in handle                                                                                            
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     await self.app(scope, receive, send)                                                                                                                                             
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 41, in app                                                                                                
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     response = await func(request)                                                                                                                                                   
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 182, in app                                                                                                 
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     raw_response = await run_endpoint_function(                                                                                                                                      
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 133, in run_endpoint_function                                                                               
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     return await dependant.call(**values)                                                                                                                                            
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/mnt/code/api/api.py", line 122, in get_gazettes                                                                                                                             
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     return trigger_gazettes_search(None, since, until, keywords, offset, size, fragment_size, number_of_fragments, pre_tags, post_tags)                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/mnt/code/api/api.py", line 45, in trigger_gazettes_search                                                                                                                   
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     gazettes_count, gazettes = app.gazettes.get_gazettes(                                                                                                                            
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/mnt/code/gazettes/gazette_access.py", line 89, in get_gazettes                                                                                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     total_number_gazettes, gazettes = self._data_gateway.get_gazettes(                                                                                                               
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/mnt/code/database/elasticsearch.py", line 140, in get_gazettes                                                                                                              
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     gazettes = self._es.search(body=query, index=self._index)                                                                                                                        
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped                                                                                 
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     return func(*args, params=params, headers=headers, **kwargs)                                                                                                                     
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/client/__init__.py", line 1612, in search
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     return self.transport.perform_request(
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/transport.py", line 392, in perform_request
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     raise e
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/transport.py", line 358, in perform_request
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     status, headers_response, data = connection.perform_request(
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 269, in perform_request
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     self._raise_error(response.status, raw_data)
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:   File "/usr/local/lib/python3.8/site-packages/elasticsearch/connection/base.py", line 300, in _raise_error
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]:     raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
May 02 21:29:33 staging.jarbas.serenata.ai docker[7581]: elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'The length of [source_text] field of [5f01bea37b7279f34d951a912115c1dd] doc of [queridodiario] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!')

Please check the Elasticsearch documentation for a workaround. We need something that avoids the error, even if that means not returning the fragments to the user. We can improve the index later for a better solution; for now, let's just fix the error.
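As a rough illustration of one possible stopgap, the error message itself points at the `index.highlight.max_analyzed_offset` index-level setting. The sketch below only builds a settings body that raises that limit. The helper name `build_highlight_limit_settings` and the example value are hypothetical, and this is not necessarily the fix that was applied to this index.

```python
# Limit quoted in the RequestError above.
DEFAULT_MAX_ANALYZED_OFFSET = 1_000_000


def build_highlight_limit_settings(max_analyzed_offset: int) -> dict:
    """Build an index settings body that raises the highlight analysis limit.

    Hypothetical helper for illustration; the setting name comes from the
    error message, the function name and structure are assumptions.
    """
    if max_analyzed_offset <= DEFAULT_MAX_ANALYZED_OFFSET:
        raise ValueError(
            "new limit must be larger than the current limit of "
            f"{DEFAULT_MAX_ANALYZED_OFFSET}"
        )
    return {"index.highlight.max_analyzed_offset": max_analyzed_offset}


# Example value only; the right limit depends on the largest gazette text.
settings_body = build_highlight_limit_settings(5_000_000)
print(settings_body)
```

With the Python client shown in the traceback, a body like this could presumably be sent with something like `es.indices.put_settings(body=settings_body, index="queridodiario")`. Raising the limit makes highlighting analyze more text per request, so it trades the error for extra work on large documents.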

jvanz commented 3 years ago

I think I already know how to fix this. No change to the API code is necessary; we just need to change the index configuration. I'll do it soon (tomorrow, I hope).

jvanz commented 3 years ago

No code change was necessary to fix this issue. Over the past few days I've reindexed the documents with a different index_options setting so that the highlight API works.
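For readers who land here with the same error, here is a minimal sketch of what such a mapping might look like. The field name `source_text` comes from the error message; the value `"offsets"` and the `"text"` field type are assumptions based on the error's advice about "indexing with offsets", since the exact index_options value used is not stated in this thread. The helper name `build_gazette_mapping` is hypothetical.

```python
def build_gazette_mapping(field_name: str = "source_text") -> dict:
    """Build an index body whose text field stores offsets in the postings.

    Hypothetical helper: "offsets" is an assumed value, chosen because the
    error message recommends indexing with offsets for large texts.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    # Store character offsets at index time so the highlighter
                    # does not need to re-analyze the whole field per request.
                    "index_options": "offsets",
                }
            }
        }
    }


mapping_body = build_gazette_mapping()
print(mapping_body["mappings"]["properties"]["source_text"])
```

A body like this would be used when creating a new index, with the existing documents then reindexed into it, which matches the reindexing described above.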