khoj-ai / khoj

Your AI second brain. Get answers to your questions, whether they be online or in your own notes. Use online AI models (e.g. gpt4) or private, local LLMs (e.g. llama3). Self-host locally or use our cloud instance. Access from Obsidian, Emacs, Desktop app, Web or WhatsApp.
https://khoj.dev
GNU Affero General Public License v3.0

[FIX] Khoj failed to parse online search result. #809

Closed driverCzn closed 3 months ago

driverCzn commented 3 months ago

Describe the bug

Khoj seems to fail to parse online search results. The issue exists in both the locally installed Python package (khoj-assistant==1.13.0) and the official Docker image. The main error is logged by khoj.processor.tools.online_search at online_search.py:145: "You cannot call this from an async context - use a thread or sync_to_async." The call to extract_relevant_info in read_webpage_and_extract_content seems to fail.
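For readers unfamiliar with this error message: it is the text Django raises (as SynchronousOnlyOperation) when blocking code runs on the event loop thread. Below is a minimal, stdlib-only sketch of that failure mode and of the "use a thread" remedy the message suggests. It is not Khoj's code. The exception class is a stand-in for Django's, and load_chat_model_name, extract_broken and extract_fixed are hypothetical names made up for illustration.

```python
import asyncio


class SynchronousOnlyOperation(Exception):
    """Stand-in for django.core.exceptions.SynchronousOnlyOperation."""


def load_chat_model_name() -> str:
    """Hypothetical blocking lookup that refuses to run on the event loop thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so blocking here is safe.
        return "dolphin-llama3:latest"
    raise SynchronousOnlyOperation(
        "You cannot call this from an async context - use a thread or sync_to_async."
    )


async def extract_broken() -> str:
    # Calls the blocking helper directly on the event loop thread: raises.
    return load_chat_model_name()


async def extract_fixed() -> str:
    # Hands the blocking helper to a worker thread and awaits its result.
    return await asyncio.to_thread(load_chat_model_name)


async def main() -> tuple[str, str]:
    try:
        await extract_broken()
        broken = "no error"
    except SynchronousOnlyOperation as exc:
        broken = str(exc)
    return broken, await extract_fixed()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a Django project, asgiref.sync.sync_to_async would typically take the place of asyncio.to_thread here. Whether that is how the actual fix was done is not stated in this thread.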

To Reproduce

Steps to reproduce the behavior:

  1. Run Khoj with a local Ollama setup
  2. Send /online what's the time in shanghai now in chat (any query after /online triggers the error)

Screenshots

Snipaste_2024-06-10_21-21-25 Snipaste_2024-06-10_21-29-51

Related debug logs from docker:

2024-06-10 21:16:32 [13:16:32.301981] DEBUG    uvicorn.error: < TEXT "/online       protocol.py:1172
2024-06-10 21:16:32                            what's the time in shanghai now" [39                 
2024-06-10 21:16:32                            bytes]                                               
2024-06-10 21:16:32 [13:16:32.309841] DEBUG    uvicorn.error: > TEXT '{"type":      protocol.py:1178
2024-06-10 21:16:32                            "status", "message":                                 
2024-06-10 21:16:32                            "**\\ud83d\\udc40...e":                              
2024-06-10 21:16:32                            "application/json"}' [146 bytes]                     
2024-06-10 21:16:32 [13:16:32.311283] INFO     khoj.processor.tools.online_sear online_search.py:118
2024-06-10 21:16:32                            ch: Inferring web pages to read                      
2024-06-10 21:16:32 [13:16:32.312276] DEBUG    uvicorn.error: > TEXT '{"type":      protocol.py:1178
2024-06-10 21:16:32                            "status", "message":                                 
2024-06-10 21:16:32                            "**\\ud83e\\uddd0...e":                              
2024-06-10 21:16:32                            "application/json"}' [113 bytes]                     
2024-06-10 21:16:32 [13:16:32.315934] WARNING  khoj.processor.conversation.utils:       utils.py:217
2024-06-10 21:16:32                            Fallback to default chat model                       
2024-06-10 21:16:32                            tokenizer: None.                                     
2024-06-10 21:16:32                            Configure tokenizer for unsupported                  
2024-06-10 21:16:32                            model: dolphin-llama3:latest in Khoj                 
2024-06-10 21:16:32                            settings to improve context stuffing.                
2024-06-10 21:16:36 [13:16:36.682990] DEBUG    khoj.routers.helpers: Chat actor:      helpers.py:173
2024-06-10 21:16:36                            Infer webpage urls to read: 4.370                    
2024-06-10 21:16:36                            seconds                                              
2024-06-10 21:16:36 [13:16:36.683871] INFO     khoj.processor.tools.online_sear online_search.py:123
2024-06-10 21:16:36                            ch: Reading web pages at:                            
2024-06-10 21:16:36                            ['https://www.timeanddate.com/wo                     
2024-06-10 21:16:36                            rldclock/China/Shanghai',                            
2024-06-10 21:16:36                            'https://time.is/UTC+8']                             
2024-06-10 21:16:36 [13:16:36.684662] DEBUG    uvicorn.error: > TEXT '{"type":      protocol.py:1178
2024-06-10 21:16:36                            "status", "message":                                 
2024-06-10 21:16:36                            "**\\ud83d\\udcd6...e":                              
2024-06-10 21:16:36                            "application/json"}' [187 bytes]                     
2024-06-10 21:16:37 [13:16:37.304303] DEBUG    khoj.processor.tools.online_search:    helpers.py:173
2024-06-10 21:16:37                            Reading web page at                                  
2024-06-10 21:16:37                            'https://time.is/UTC+8' took: 0.619                  
2024-06-10 21:16:37                            seconds                                              
2024-06-10 21:16:37 [13:16:37.307071] DEBUG    khoj.routers.helpers: Chat actor:      helpers.py:173
2024-06-10 21:16:37                            Extract relevant information from                    
2024-06-10 21:16:37                            data: 0.000 seconds                                  
2024-06-10 21:16:37 [13:16:37.307758] DEBUG    khoj.processor.tools.online_search:    helpers.py:173
2024-06-10 21:16:37                            Extracting relevant information from                 
2024-06-10 21:16:37                            web page at 'https://time.is/UTC+8'                  
2024-06-10 21:16:37                            took: 0.003 seconds                                  
2024-06-10 21:16:37 [13:16:37.308456] ERROR    khoj.processor.tools.online_sear online_search.py:145
2024-06-10 21:16:37                            ch: Failed to read web page at                       
2024-06-10 21:16:37                            'https://time.is/UTC+8' with You                     
2024-06-10 21:16:37                            cannot call this from an async                       
2024-06-10 21:16:37                            context - use a thread or                            
2024-06-10 21:16:37                            sync_to_async.                                       
2024-06-10 21:16:37 [13:16:37.581144] DEBUG    khoj.processor.tools.online_search:    helpers.py:173
2024-06-10 21:16:37                            Reading web page at                                  
2024-06-10 21:16:37                            'https://www.timeanddate.com/worldcloc               
2024-06-10 21:16:37                            k/China/Shanghai' took: 0.896 seconds                
2024-06-10 21:16:37 [13:16:37.583837] DEBUG    khoj.routers.helpers: Chat actor:      helpers.py:173
2024-06-10 21:16:37                            Extract relevant information from                    
2024-06-10 21:16:37                            data: 0.000 seconds                                  
2024-06-10 21:16:37 [13:16:37.584527] DEBUG    khoj.processor.tools.online_search:    helpers.py:173
2024-06-10 21:16:37                            Extracting relevant information from                 
2024-06-10 21:16:37                            web page at                                          
2024-06-10 21:16:37                            'https://www.timeanddate.com/worldcloc               
2024-06-10 21:16:37                            k/China/Shanghai' took: 0.002 seconds                
2024-06-10 21:16:37 [13:16:37.585210] ERROR    khoj.processor.tools.online_sear online_search.py:145
2024-06-10 21:16:37                            ch: Failed to read web page at                       
2024-06-10 21:16:37                            'https://www.timeanddate.com/wor                     
2024-06-10 21:16:37                            ldclock/China/Shanghai' with You                     
2024-06-10 21:16:37                            cannot call this from an async                       
2024-06-10 21:16:37                            context - use a thread or                            
2024-06-10 21:16:37                            sync_to_async.                                       
2024-06-10 21:16:37 [13:16:37.586005] DEBUG    uvicorn.error: > TEXT '{"type":      protocol.py:1178
2024-06-10 21:16:37                            "status", "message":                                 
2024-06-10 21:16:37                            "**\\ud83d\\udcda...e":                              
2024-06-10 21:16:37                            "application/json"}' [104 bytes]                     
2024-06-10 21:16:37 [13:16:37.586734] DEBUG    uvicorn.error: > TEXT '{"type":      protocol.py:1178
2024-06-10 21:16:37                            "status", "message":                                 
2024-06-10 21:16:37                            "**\\ud83d\\udcad...e":                              
2024-06-10 21:16:37                            "application/json"}' [121 bytes]                     
2024-06-10 21:16:37 [13:16:37.587563] DEBUG    khoj.routers.helpers: Conversation     helpers.py:611
2024-06-10 21:16:37                            Types: [<ConversationCommand.Webpage:                
2024-06-10 21:16:37                            'webpage'>]                                          
2024-06-10 21:16:37 [13:16:37.591748] WARNING  khoj.processor.conversation.utils:       utils.py:217
2024-06-10 21:16:37                            Fallback to default chat model                       
2024-06-10 21:16:37                            tokenizer: None.                                     
2024-06-10 21:16:37                            Configure tokenizer for unsupported                  
2024-06-10 21:16:37                            model: llama3 in Khoj settings to                    
2024-06-10 21:16:37                            improve context stuffing.                            
2024-06-10 21:16:37 [13:16:37.593112] DEBUG    khoj.processor.conversation.openai.gpt:    gpt.py:174
2024-06-10 21:16:37                            Conversation Context for GPT: what's the             
2024-06-10 21:16:37                            time in shanghai now                                 
2024-06-10 21:16:37                            ...                                                  
2024-06-10 21:16:37                            Use this up-to-date information from the             
2024-06-10 21:16:37                            internet to inform your respo...                     
2024-06-10 21:16:37                            According to my knowledge, Shanghai is               
2024-06-10 21:16:37                            currently in China Standard Tim...                   
2024-06-10 21:16:37                            You are Khoj, a smart, inquisitive and               
2024-06-10 21:16:37                            helpful personal assistant.                          
2024-06-10 21:16:37                            Use...

Platform

If self-hosted

Additional Context

The official cloud version does not seem to have this issue. It would be great if you could fix this, as online search currently does not work on a self-hosted setup.

driverCzn commented 3 months ago

Docker images info:

REPOSITORY             TAG       IMAGE ID       CREATED        SIZE
ghcr.io/khoj-ai/khoj   latest    651009d56280   6 days ago     6.67GB
ankane/pgvector        latest    f2c967e41f72   8 months ago   440MB

sabaimran commented 3 months ago

Ah, @MythicalCow, this is because of that prefetch_related bug in this method that you're fixing. It should be covered once your PR is merged!

sabaimran commented 3 months ago

Thanks for the detailed issue, @driverCzn!

MythicalCow commented 3 months ago

Yup, this should be fixed soon!

sabaimran commented 3 months ago

Fix is merged, will be included in the next release!

cryptocj commented 2 months ago
image

I am running version 0.14.0 and got this error when generating images. Is this issue related? @sabaimran