sturdy-dev / codereview.gpt

Reviews your Pull/Merge Requests using ChatGPT
MIT License
550 stars, 68 forks

AMAZING! can we get more than just Pull Requests in here ??? #2

Open fire17 opened 1 year ago

fire17 commented 1 year ago

Hi there! Just found your extension! Looks incredible! I was wondering if you could share how you managed to pull off gathering data from the pull request and feeding it into ChatGPT.

I wish to do a similar thing but for an entire repo, or just any webpage. I assume that somewhere you have a pre-prompt which says something along the lines of: "The following are the changes from a pull request, please review it:"

So for general sites you would have something like: "The following is the contents of a webpage; get all the information you can from it, as I will be asking you questions about it shortly."

For a repo I would like: "Based on the Readme of this repo, write a full project that answers the author's wishes. Pick the best technology for the use case. Give me all the files I will need (and their contents) to fill this repo. The repo should be easily deployable. Generate all the necessary tests," etc.

Would love to know what you think. Thanks a lot and have an amazing day!

krlvi commented 1 year ago

On your question about getting code changes from a PR, check this code: https://github.com/sturdy-dev/codereview.gpt/blob/a2acd3b7b17933c3ac553e54c4deb3db709201b0/src/popup.js#L140 You can also see the prompt here: https://github.com/sturdy-dev/codereview.gpt/blob/a2acd3b7b17933c3ac553e54c4deb3db709201b0/src/popup.js#L142
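For illustration: the plugin itself is JavaScript and reads the diff from the page, but the same data is reachable outside the browser, since GitHub serves a plain-text unified diff for any public PR when you append `.diff` to its URL. A minimal Python sketch (the prompt wording below is paraphrased, not the exact one in popup.js):

```python
import urllib.request

def fetch_pr_diff(owner: str, repo: str, number: int) -> str:
    """Download the unified diff for a public pull request.

    GitHub serves a plain-text diff when .diff is appended
    to the PR's web URL; no authentication is needed."""
    url = f"https://github.com/{owner}/{repo}/pull/{number}.diff"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def build_review_prompt(diff: str) -> str:
    """Wrap the diff in a review pre-prompt, along the lines the
    thread describes (see popup.js for the actual wording)."""
    return ("The following are the changes from a pull request, "
            "please review them:\n\n" + diff)
```

The prompt builder is the interesting part for fire17's question: everything the model sees is just the diff text wrapped in an instruction.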

This plugin was inspired by https://github.com/clmnin/summarize.site https://github.com/sturdy-dev/codereview.gpt/blob/a2acd3b7b17933c3ac553e54c4deb3db709201b0/README.md?plain=1#L112 that @clmnin made, which does something similar to what you described.

Hope this helps

fire17 commented 1 year ago

Thanks a lot for the suggestions! I'd like to keep asking, hope that's OK. Can you @krlvi (or @clmnin) look into helping me do this for an entire repo's worth of content?

If I understand you correctly, you're getting the contents of the diff between the PR and the branch and running it through ChatGPT with some sort of pre/post prompt.

I wonder how to squeeze an entire repo into a file, preferably straight from/on the web.

My current setup: I've made a tool called Repo-Gist which walks a folder tree, gathers the contents of all its files, and puts everything into one big gist (an .md file). I give headers and proper separation between each file, and also include a `tree` output. I run Repo-Gist and copy this big text into ChatGPT (the pre-prompt is already in the tool). Now I can ask ChatGPT questions based on the repo (it needs more experimentation, but it's one of the holy grails for me).

I wish to be able to do this on just any repo link (without needing to clone and use this tool). Do you have any idea how to get a repo and squeeze it in a better way? Maybe take all of the commits and patch them? That sounds like too much; hope you have better ideas.

Let me know what you think. Thank you very much and have a good one!

krlvi commented 1 year ago

Yeah, you can make this work. For example, you can get the files in a repo like this: https://api.github.com/repos/fire17/repo-gist/git/trees/main?recursive=1 and get file contents like this: https://api.github.com/repos/fire17/repo-gist/contents/repoGist.py (base64 encoded).

In practice it's likely that this is more text than the GPT prompt size limit. Some clever things one can try:

  • Get GPT to summarize files one by one in multiple prompts
  • Extract only interfaces from the code (e.g. public function definitions)

There are some clever techniques shown in this post: https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html
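The two API endpoints above can be combined into a small script. A hedged sketch (unauthenticated requests are rate-limited by GitHub, and the default branch is assumed to be `main`):

```python
import base64
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}"

def get_json(url: str):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def list_files(owner: str, repo: str, branch: str = "main") -> list:
    """List every file path via the git trees API with recursive=1."""
    base = API.format(owner=owner, repo=repo)
    tree = get_json(f"{base}/git/trees/{branch}?recursive=1")
    return [e["path"] for e in tree["tree"] if e["type"] == "blob"]

def get_file(owner: str, repo: str, path: str) -> str:
    """The contents API returns the file base64-encoded."""
    base = API.format(owner=owner, repo=repo)
    blob = get_json(f"{base}/contents/{path}")
    return base64.b64decode(blob["content"]).decode("utf-8")
```

`list_files` plus `get_file` gives you the whole repo without cloning, which is exactly what fire17 asked for; the remaining problem is the prompt size limit discussed next.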

fire17 commented 1 year ago

Thanks a lot for all the suggestions and materials! I've got some research to do...

Regarding:

> yeah, you can make this work. For example, you can get the files in a repo like this: https://api.github.com/repos/fire17/repo-gist/git/trees/main?recursive=1 and get file contents like this https://api.github.com/repos/fire17/repo-gist/contents/repoGist.py (base64 encoded)

The recursive=1 link looks really useful, and just now I understood how the second one contains the contents. Thanks a lot!

> There are some clever techniques shown in this post https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html

+1 !

>   • Extract only interfaces from the code (e.g. public function definitions)

I feel like for some very high-level requests, you would need to have the actual contents.

For example, a golden request of mine is: "My codebase is a complete mess. Restructure my classes, functions, thread calls, and the flow of my code in general, so that it will be much more organized, simpler to debug, have clear dependencies with no overlaps, and just be more professional overall. Only fragmentize; do not change any of the actual implementations. You are allowed to rename things. ~~Push all the necessary changes to a new branch, explain all the changes in detail in the commit message~~ Show me all the changes that I will need to make, and the new folder tree structure of the project. Explain what was before and what was changed, how, and why."

But you can maybe get other really cool stuff with just the interfaces and params. For example: "Give me full docs and an API reference based on the current state of my project. Make sure the docs are a library, with different pages for every section. Include an intro, how to install, how to use, frequent Q&A, use cases and examples, and everything else proper, professional project docs should have. First give me the tree list of all the pages (nested into sections), and then all the pages one by one. Use MD files, with a clear and beautiful modern style and aesthetics."
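For Python sources, the "extract only interfaces" idea from earlier in the thread could be sketched with the stdlib `ast` module (an illustration, not part of the plugin; other languages would need their own parsers):

```python
import ast

def extract_interfaces(source: str) -> list:
    """Return just the signatures of functions, methods and classes,
    dropping the bodies, to shrink a file before prompting."""
    sigs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
    return sigs
```

Feeding only these signatures to the model is usually enough for documentation-style requests like the one above, while the refactoring request would still need the full bodies.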

> In practice it's likely that this is more text than the gpt prompt size limit. Some clever things one can try:
>
>   • Get gpt to summarize files one by one in multiple prompts
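The file-by-file summarization idea could be structured like this, where `summarize` is a hypothetical placeholder for whatever model call you use (the thread doesn't settle on a specific API):

```python
def summarize_repo(files: dict, summarize, max_chars: int = 8000) -> str:
    """Summarize a repo file by file, then summarize the summaries.

    `files` maps path -> contents; `summarize` is any callable that
    takes a prompt string and returns a summary string. Files longer
    than max_chars are split into chunks first so each prompt stays
    under the model's input limit."""
    per_file = []
    for path, text in files.items():
        chunks = [text[i:i + max_chars]
                  for i in range(0, len(text), max_chars)]
        chunk_summaries = [
            summarize(f"Summarize this part of {path}:\n{chunk}")
            for chunk in chunks
        ]
        per_file.append(f"{path}: " + " ".join(chunk_summaries))
    # Final pass: compress the per-file summaries into one overview.
    return summarize("Combine these file summaries:\n" + "\n".join(per_file))
```

This map-then-reduce shape is the standard workaround for input limits: many small prompts, then one combining prompt.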

PS, more questions @krlvi: do you have any recommendations on how to use the Playground or API of GPT-3.5 and have it produce the same top quality of response as ChatGPT? ChatGPT feels more optimized than just throwing the Playground/API version into a chat interface.

Can you give me your thoughts on this? Honestly, I need to play with the API or Playground more and verify; this assumption was passed to me by a friend. He tried to generate a lot of output which exceeded the output limit (I told him to say "continue"), but he mentioned that the Playground's limits are much higher, only the model isn't as good.

I was really enjoying ChatGPT, so I didn't need to go anywhere else, but now that I want to process an entire (potentially huge) repo, maybe it will be more practical there, since with higher limits the batch count / calls / time to resolve / new repo output will be much better. Maybe I just have to find the magic prompt that turns GPT-3 into ChatGPT, haha? Or maybe the models are different.

Looking for others who are interested; would love it if you can tag anyone you think is relevant @dougmercer <3