ubiquity-os / plugins-wishlist

0 stars 2 forks source link

Pull Precheck #45

Open 0x4007 opened 1 week ago

0x4007 commented 1 week ago

We have waves of contributors that open pulls sometimes keeping our review team busy.

We can save valuable man hours by having the bot preemptively check if the pull achieves the specification.

Pull Flow

  1. Assuming that the contributor follows directions, the pull must be initially opened up as a draft.
  2. When it is ready for review, they should turn it into a finalized pull request.
  3. The bot should consume the issue specification and then the pull diff into its context.
  4. The bot should return actionable feedback for what is missing from the specification. It should leave the review state as requested changes. It should also convert the pull back into a draft. However if it passes, it should leave as just "commented" (not approval).
  5. If a collaborator (reviewer on the team) converts the pull back from a draft into a finalized pull, then the bot should back off and stop intervening. Basically the inspection should only occur on pull.created and when it's converted from draft to finalized by the pull author.

Bonus but maybe can be handled in other tasks:

  1. Ensure CI is passing. I didn't include this here because sometimes there are problems out of the control of the pull author.
  2. Limits: some novice contributors might abuse the code review feature and burn out credits by continuously requesting reviews every time they make a tiny change that doesn't achieve the specification. We have seen folks do hundreds of commits for tasks that required a small amount of lines of code. We can limit the reviews to one per day for ChatGPT.

We should use o1 but if it's not available in the API, either 4o or 3.5 sonnet should do.

ubadineke commented 6 days ago

/start

ubiquity-os[bot] commented 6 days ago
! Please set your wallet address with the /wallet command first and try again.
ubiquity-os[bot] commented 6 days ago
! No wallet address found
ubadineke commented 6 days ago

/wallet 0x3Ea855E4D6440D937117c776501e7653a770b759

ubiquity-os[bot] commented 6 days ago

+ Successfully registered wallet address
ubadineke commented 6 days ago

/start

ubiquity-os[bot] commented 6 days ago
DeadlineWed, Oct 2, 9:41 PM UTC
Beneficiary 0x3Ea855E4D6440D937117c776501e7653a770b759
Tips:
<ul>
<li>Use <code>/wallet 0x0000...0000</code> if you want to update your registered payment wallet address.</li>
<li>Be sure to open a draft pull request as soon as possible to communicate updates on your progress.</li>
<li>Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.</li>
<ul>
ubadineke commented 5 days ago

@Keyrxng should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

Keyrxng commented 5 days ago

@0x4007 I had gpt re-rewrite for brevity but it removed what I'd call context but this still makes sense.

Discussion:

Spec consideration > Embedding effectiveness > Bot intent & current model

https://chatgpt.com/share/66ec086f-7d70-8000-a914-991b77a819b0

Conclusion:
The current model (embedding-based) struggles to provide detailed, code-specific feedback. While embeddings are useful for high-level semantic comparisons, they lack the precision needed for effective PR reviews. You'll need additional tools to meet your goal.


Task Understanding:

This is very similar to /review, and separately, /gpt currently has the ability to process full PR diffs, spec, and conversation per query across multiple issues/repos. However, embeddings aren't ideal for this. Raw text analysis (task spec + diff) is more effective since we need precise feedback, not high-level similarity or semantic meaning.

The plugin's effectiveness must be high to avoid becoming annoying or useless. Therefore, it makes sense to re-spec or close this idea since embeddings aren’t suitable.


To be clear, the intention of this spec was to automate the initial review process and is feasible but better done without embeddings in my opinion. If the /review approach is being replaced by this then I think the following workflow works best:

Suggested Workflow:

Additionally the task is way overpriced, https://github.com/ubiquity/ubiquibot/issues/746 is the /review command I authored and was priced at $150 and the /gpt command is priced at $200 lmaoo, and I spent days/weeks on these. $400 max, we are simply feeding two bodies of text to GPT with a custom prompt to have it perform a review with a couple of conditionals afterwards to add a comment and update a review status.


should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

@ubadineke I'm unsure personally at this point as I did not write this task specification. I have left comments and once @0x4007 replies I'll have a much better idea of how this task should be implemented.

  1. Yes this feature will span multiple new files, try to keep things clean and tidy.
  2. Personally I'd account for this and just run a review on pull_request.created which will allow it through or kick it back into draft just like 0x4007 said. Future reviews would be tied to pull_request.ready_for_review distanced by at least 24hrs between each using the timeline or review comments.
ubadineke commented 5 days ago

@Keyrxng @0x4007 What if the PR is created and the related issue is not immediately mentioned in the body. We just move it back to draft right and tell the user to edit the PR and specify issue?

The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made? If the issue must be specified from the payload then the first paragraph in my comment is redundant.

Keyrxng commented 5 days ago

I guess that's a good idea as we like to strongly enforce the linked issue although it's not a guarantee each PR has or needs this such as quick fixes for example.

I'd say don't do anything, our pull request template enforces this format and legit cases exist where it is not used.


The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made? If the issue must be specified from the payload then the first paragraph in my comment is redundant.

I'm not sure I understand but issues and PRs are both considered as an issue by the GitHub api, there is no PR number only issue.number. You can get linked issues via the GraphQL API most reliably.

If a PR does not have an associated task specification then without having codebase embeddings to compare the diff against we don't have a lot of context other than what they have changed via the diff. Which can still be reviewed by GPT on it's own for logic errors and optimizations it'll just be more general review without the guard rail of the spec.

0x4007 commented 5 days ago

I think it would be more appropriate for you to extend your /gpt command logic. You can handle it in a separate pull. Over on your old pull I said to merge and let's test in production. From there you can extend the logic as part of this task.

This task makes it more seamlessly integrated (but looks like your previous work set a lot of relevant ground work)

This requires more deeper integration into GitHub review workflow such as by setting review states and switching between draft and non draft etc.

I have extra time because the prompt engineering alone I'm sure will take a couple days to get the results we want.

Yes let's keep this plaintext no embeddings needed.

ubiquity-os[bot] commented 5 days ago

@Keyrxng the deadline is at Fri, Oct 4, 2:47 AM UTC

ubadineke commented 5 days ago

/start

ubiquity-os[bot] commented 5 days ago
! This issue is already assigned. Please choose another unassigned task.
ubadineke commented 3 days ago

Hey @0x4007,

I noticed Keyrxng might be busy, so I went ahead and took on the task. I’ve put together an initial version of the implementation, which you can check out here.

Right now, it only triggers on pull_request.created, and I still need to add the logic for pull_request.ready_for_review. I also extended the GPT logic like you mentioned before.

I’d really like to see this through to the end! If you could create a repo for this plugin, I can make a PR and keep working on it and then we can make adjustments to the prompting.

Thanks!

ubiquity-os[bot] commented 3 days ago

@ubadineke, @Keyrxng the deadline is at Sat, Sep 28, 8:00 AM UTC

0x4007 commented 3 days ago

@Keyrxng you should be able to make the repo I'm not on computer

Keyrxng commented 3 days ago

I think it would be more appropriate for you to extend your /gpt command logic

If that's the case then no new repo is required.

I noticed Keyrxng might be busy, so I went ahead and took on the task.

Not typically how things go around here, if you are unassigned from a task you shouldn't technically be allowed to work it again.

I’ve put together an initial version of the implementation, which you can check out here.

Your implementation seems to be more or less a copy/paste of my super early /review command which if we are extending /gpt which seems like a better idea as it has much more reach with context although it will require guidance with context in terms of review etc.

I think it's best that I resolve this task solo today by refactoring/upgrading /gpt, apologizes @ubadineke I appreciate your enthusiasm tho.


@0x4007 you said it should reply to @ubiquityos instead of using the /gpt or any kind of slash command invocation?

Does that mean that you would invoke a review or review should be automate as discussed in this spec? My /gpt QA showed me asking for it to perform a review but instead this new upgraded functionality should be as per this spec, reviews built-in and triggered on events.

Ideally we keep within /gpt but have it now respond to a direct mention. We keep it within because of the context handling that it has, we can easily customize it to allow only x-depth of referenced context through instead of the unlimited depth that /gpt was made to have. As opposed to rebuilding another custom context collection system etc.

0x4007 commented 3 days ago

I think unlimited depth is interesting to experiment with. I always side on providing more context to llms

Let's not use that slash command and instead replace it to respond to its @UbiquityOS tag

reviews built-in and triggered on events.

Sounds good