khoj-ai / khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (e.g. gpt, claude, gemini, llama, qwen, mistral).
https://khoj.dev
GNU Affero General Public License v3.0
14.15k stars 704 forks

[IDEA] Improve support for GitHub integration #688

Closed (sabaimran closed 7 months ago)

sabaimran commented 7 months ago

Describe the feature you'd like

I've got a question and a suggestion for integrations in general. Can we get a checkmark or something on the repos to indicate they were successfully indexed? The only way I know to tell is to look at the server's output when running locally, so I don't know how to tell at all from the UI. I know it says "save successful", but I'm not sure whether that means the indexing succeeded or just that the repo info was saved.

And for the question, does the GH integration reindex when changes are made to the branch?

The ability to add multiple PATs would be nice as well. Maybe name them, so the repo config can specify which one to use.
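
As a rough illustration of the named-token idea, here is a minimal Python sketch. Every name in it (`GithubToken`, `RepoConfig`, `token_for`) is hypothetical and not part of Khoj; it only shows how a repo entry might refer to a token by its label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GithubToken:
    name: str   # user-chosen label, e.g. "personal" or "work" (hypothetical)
    value: str  # the personal access token itself


@dataclass(frozen=True)
class RepoConfig:
    owner: str
    name: str
    branch: str
    token_name: str  # label of the token this repo should be indexed with


def token_for(repo: RepoConfig, tokens: list[GithubToken]) -> GithubToken:
    """Return the token a repo entry refers to, or raise if no such label exists."""
    by_name = {token.name: token for token in tokens}
    if repo.token_name not in by_name:
        raise KeyError(f"No token named {repo.token_name!r} for {repo.owner}/{repo.name}")
    return by_name[repo.token_name]
```

With this shape, removing one named token would only affect the repo entries that reference that label, which is the behaviour the suggestion is after.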

From this Discord discussion.

letto4135 commented 7 months ago

Hey @sabaimran, continuing from discord.

Just for context, I'm running locally on an Apple silicon Mac, installed with pip.

I'm surprised you said there isn't much usage of this integration. I wonder if most people just index the repos locally? I'm not entirely sure that indexing local folders works, though, so I don't like that option myself. If I add a folder from the desktop app, nothing happens in the server logs when I click save, so I've got no way of knowing whether it is actually indexing anything I asked it to. And checking "Files" in settings on the web UI only shows documents from Obsidian, which makes me think it didn't index correctly.

I have two GitHub accounts, one personal and one business, so I'd like to be able to set up PATs for both. I'm not sure what would happen if I removed one PAT in order to index repos in the other account, and whether that would at some point affect the repos indexed with the PAT I removed.

The GH integration seems to only allow adding one repo at a time. If you try to add multiple and then save, it throws errors in the server logs but still acts like it saved them all in the UI with the "save successful" message. That's why I suggested a checkmark or something to say yes, they've been indexed, because you really can't tell from the UI that it failed.

KeyError: 'tree'
[11:38:05.870103] ERROR    🚨 Failed to update server via API: Failed to update content index                                                    api.py:188
                           ╭──────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────╮
                           │ /opt/homebrew/lib/python3.11/site-packages/khoj/routers/api.py:185 in update                                      │
                           │                                                                                                                   │
                           │   182 │   │   logger.warning(error_msg)                                                                           │
                           │   183 │   │   raise HTTPException(status_code=500, detail=error_msg)                                              │
                           │   184 │   try:                                                                                                    │
                           │ ❱ 185 │   │   initialize_content(regenerate=force, search_type=t, init=False, user=user)                          │
                           │   186 │   except Exception as e:                                                                                  │
                           │   187 │   │   error_msg = f"🚨 Failed to update server via API: {e}"                                              │
                           │   188 │   │   logger.error(error_msg, exc_info=True)                                                              │
                           │                                                                                                                   │
                           │ /opt/homebrew/lib/python3.11/site-packages/khoj/configure.py:265 in initialize_content                            │
                           │                                                                                                                   │
                           │   262 │   │   │   │   if not status:                                                                              │
                           │   263 │   │   │   │   │   raise RuntimeError("Failed to update content index")                                    │
                           │   264 │   │   except Exception as e:                                                                              │
                           │ ❱ 265 │   │   │   raise e                                                                                         │
                           │   266                                                                                                             │
                           │   267                                                                                                             │
                           │   268 def configure_routes(app):                                                                                  │
                           │                                                                                                                   │
                           │ /opt/homebrew/lib/python3.11/site-packages/khoj/configure.py:263 in initialize_content                            │
                           │                                                                                                                   │
                           │   260 │   │   │   │   │   user=user,                                                                              │
                           │   261 │   │   │   │   )                                                                                           │
                           │   262 │   │   │   │   if not status:                                                                              │
                           │ ❱ 263 │   │   │   │   │   raise RuntimeError("Failed to update content index")                                    │
                           │   264 │   │   except Exception as e:                                                                              │
                           │   265 │   │   │   raise e                                                                                         │
                           │   266                                                                                                             │
                           ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                           RuntimeError: Failed to update content index
[11:38:05.897153] INFO     127.0.0.1:62902 - "GET /api/update?t=github HTTP/1.1" 500
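
The `KeyError: 'tree'` at the top of the log suggests that some code looked up a `tree` key in a response that did not contain one. The root cause is not confirmed in this thread. The Python sketch below only illustrates, under that assumption, how a missing key could be turned into a per-repo error message that a UI could display. `extract_tree` and its arguments are hypothetical, not Khoj code.

```python
def extract_tree(response_json: dict, repo: str) -> list[dict]:
    """Return the file entries from a tree-style response, or raise a readable error."""
    if "tree" not in response_json:
        # GitHub API error payloads typically carry a "message" field;
        # fall back to a generic note when it is absent.
        reason = response_json.get("message", "response had no 'tree' key")
        raise ValueError(f"Could not index {repo}: {reason}")
    return response_json["tree"]
```

Raising a message that names the repo, instead of a bare `KeyError`, is what would let the settings page report which entry failed rather than showing "save successful".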

And on that, being able to index entire orgs or entire users all at once would be nice. You can find all the repos and the default branch of each easily enough with the gh CLI. It could potentially create an entry for every repo it finds under a user/org with the info filled in, so you can adjust it if needed.
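
A minimal Python sketch of that idea, assuming an installed and authenticated gh CLI. It calls `gh repo list` with the `name` and `defaultBranchRef` JSON fields and turns the output into one entry per repo. The helper names and the `owner`/`name`/`branch` entry shape are assumptions for illustration, not Khoj's actual config format.

```python
import json
import subprocess


def list_repos(owner: str, limit: int = 1000) -> str:
    """Run `gh repo list` for a user or org and return its JSON output as text."""
    result = subprocess.run(
        ["gh", "repo", "list", owner, "--limit", str(limit),
         "--json", "name,defaultBranchRef"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def to_repo_entries(owner: str, gh_json: str) -> list[dict]:
    """Convert the JSON from `list_repos` into pre-filled, editable repo entries."""
    entries = []
    for repo in json.loads(gh_json):
        # An empty repo may have no default branch; leave the field blank then.
        branch_ref = repo.get("defaultBranchRef") or {}
        entries.append({
            "owner": owner,
            "name": repo["name"],
            "branch": branch_ref.get("name", ""),
        })
    return entries
```

Usage would look like `to_repo_entries("khoj-ai", list_repos("khoj-ai"))`. Keeping the parsing separate from the CLI call means the entries can be reviewed or adjusted before anything is indexed.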

A "Reindex" button, or auto-reindex on a schedule, or both, would be nice as well. I don't have that many repos, so telling it to reindex once a week or so would be fine, but someone with a lot of repos might not want that and could opt to reindex manually when needed.
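
The scheduling half of that suggestion could be as small as the check below. This is a sketch under assumed names (`is_due_for_reindex`, a per-repo `interval` setting); nothing here reflects how Khoj schedules work. Setting the interval to `None` models the manual-only case.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_due_for_reindex(
    last_indexed: Optional[datetime],
    now: datetime,
    interval: Optional[timedelta],
) -> bool:
    """Decide whether a repo should be reindexed automatically."""
    if interval is None:
        return False  # manual-only: the user reindexes with the button
    if last_indexed is None:
        return True  # never indexed yet, so index it now
    return now - last_indexed >= interval
```

A weekly schedule would pass `interval=timedelta(weeks=1)`, while users with many repos could leave it unset and reindex by hand.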

sabaimran commented 7 months ago

@letto4135 what type of files are you trying to index? It's worth mentioning that currently, only plaintext files and PDFs are supported from the desktop application. So, your .txt, .md, etc. files should be picked up.

Could you describe what sort of chatting/interactions you're hoping to do? Would you want to chat with documentation, or the underlying code itself?

Let me try out the integration again and let you know if I find bugs/repro. If the repo you're indexing is public, send me a link and I'll go ahead and try that directly?

letto4135 commented 7 months ago

I'm indexing the code locally instead of using the GH integration. The thought was that if I couldn't get the GH integration to work easily, I would index the folder where the code is instead. So for me I've got:

ls ~/gh
- ~/gh/<repo1>
- ~/gh/<repo2>
- etc...

So maybe it just doesn't index because it doesn't go into subfolders to look for things to index? I suppose it wouldn't index much anyway, just the READMEs from what you're saying, but it's not doing that either.

> Could you describe what sort of chatting/interactions you're hoping to do? Would you want to chat with documentation, or the underlying code itself?

I'd like it to index the code so it has context over everything when I ask a question. Kind of like Copilot and JetBrains AI, but potentially better, because it can have context over multiple repos that work together in a system instead of only the one currently open.

debanjum commented 7 months ago

Hey @letto4135, once #692 is merged, Khoj will be able to index all text files, including your code files. This should work when indexing both local folders and GitHub repositories.

> So maybe it just doesn't index because it doesn't go into subfolders to look for things to index?

The desktop app should index recursively, so it should include subfolders as well if you index ~/gh.
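
To sanity-check locally which files a recursive walk would find, a short Python sketch like the following can help. It is not the desktop app's code, and the suffix set is an assumption based on the plaintext and PDF support mentioned earlier in the thread.

```python
from pathlib import Path

# Assumed for illustration: the file types mentioned earlier in the thread.
INDEXABLE_SUFFIXES = {".txt", ".md", ".pdf"}


def find_indexable_files(root: Path, suffixes: set[str] = INDEXABLE_SUFFIXES) -> list[Path]:
    """Recursively collect files under `root` whose suffix is in `suffixes`."""
    return sorted(
        path
        for path in root.rglob("*")  # rglob descends into every subfolder
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

Running `find_indexable_files(Path.home() / "gh")` and comparing the result with the "Files" list in settings would show whether nested READMEs are being picked up.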

letto4135 commented 7 months ago

@debanjum Gorgeous! 🙇