ubiquity / ubiquibot

Putting the 'A' in 'DAO'
https://github.com/marketplace/ubiquibot
MIT License

Natural Language Configuration Modifications: E2E Tests. #889

Open 0x4007 opened 7 months ago

0x4007 commented 7 months ago

Write appropriate end-to-end (e2e) tests using Jest to ensure reliability.

etc

Keyrxng commented 7 months ago

Is there any sort of docs/whitepaper or anything along those lines that details specifics as to how element scoring works for example?

Can additional elements be added to the comment elements at the partner's discretion?

What are the parameters for what can be changed and what shouldn't be changed?

Should the changes be pushed directly into the `default_branch`, or opened as a PR to avoid any mishaps? E2E should cover most cases, but committing directly to the working branch seems risky. If there is an issue, someone needs to get eyes on it anyway, so having a reviewer approve a PR seems like better UX.

I asked:

> /config I want to refactor the incentive elements scoring between 0 and 5. I want comments which are crafted with care, time and effort to be rewarded for well-formatted and fully-featured responses.

and it added a whole bunch of new elements. Is it meant to be a lot more restrictive than that?

Keyrxng commented 7 months ago

What is the preferred structure for tests? I'm assuming we just write them into `/tests` as individual files for each PR, covering invocation to execution?

I see that you have your tests for `issue` in its own directory. Should all handlers that need it be put into their own directories with their respective tests and mocks?

Keyrxng commented 7 months ago

I'm struggling to conceptualize effective tests at the moment.

My thoughts are:

```yaml
"issueCreatorMultiplier": 3,
"maxPermitPrice": 1000
```

P.S.: When it comes to NFT permits, are they using `maxPermitPrice = 1`, or do they get their own config object?
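To make the "invocation to execution" idea concrete, here is one shape a unit-level check could take. This is a minimal sketch: `validateIncentives` and the specific bounds are hypothetical stand-ins for the bot's real validation path, not actual ubiquibot code.

```typescript
// Hypothetical validator sketch mirroring the kind of bounds a test could assert.
// Key names match the snippet above; the bounds themselves are illustrative.
interface IncentiveConfig {
  issueCreatorMultiplier: number;
  maxPermitPrice: number;
}

function validateIncentives(config: IncentiveConfig): string[] {
  const errors: string[] = [];
  if (config.issueCreatorMultiplier < 0) {
    errors.push("issueCreatorMultiplier must be non-negative");
  }
  // An upper bound guards against MAX_SAFE_INTEGER-sized permits.
  if (config.maxPermitPrice < 1 || config.maxPermitPrice > 1_000_000) {
    errors.push("maxPermitPrice must be between 1 and 1,000,000");
  }
  return errors;
}

// The values from the snippet above pass; an absurd permit price fails.
console.log(validateIncentives({ issueCreatorMultiplier: 3, maxPermitPrice: 1000 }));
console.log(validateIncentives({ issueCreatorMultiplier: 3, maxPermitPrice: Number.MAX_SAFE_INTEGER }));
```

A Jest suite could then assert on the returned error list for each boundary case instead of only checking types.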

Logs

COMMAND:

> /config I want to refactor the incentive elements scoring between 0 and 5. I want comments which are crafted with care, time and effort to be rewarded for well formatted and fully featured responses.

It typically enters the inferred key:value incorrectly, and then, after it reads the validation errors (it can chain up to 3 or 4 times depending on the prompt), it easily resolves them.

```yaml
"incentive_elements_scoring": "0-5",
"reward_for_well_formatted_responses": "false",
"reward_for_fully_featured_responses": "false"
```

```json
{
  "instancePath": "",
  "schemaPath": "#/additionalProperties",
  "keyword": "additionalProperties",
  "params": {
    "additionalProperty": "reward_for_fully_featured_responses"
  },
  "message": "must NOT have additional properties"
}
```
0x4007 commented 7 months ago

> Is there any sort of docs/whitepaper or anything along those lines that details specifics as to how element scoring works for example?

I can share the philosophy behind this.

The idea is that partners can credit comments that are crafted with care. The configuration technically makes this possible to process every comment with granular precision (down to the tag level as you're aware) but it is up to the partner's discretion as to exactly how they are processed and credited. I imagine that we will experiment within Ubiquity and recommend default settings to our partners based on our internal results.

I've noticed that comments written with lists are generally higher quality (i.e. more informative and expressive) than those without. Comments with links as context/evidence, and with images, are also generally significantly more informative and valuable than comments with little-to-no formatting. This is based on anecdotal evidence.

That is the inspiration behind this technology. As for how it works: you set a price that is credited every time the HTML tag appears in the comment. You can also choose to ignore crediting of specific HTML tags (e.g. blockquotes: why would you be credited for somebody else's contribution?).
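The crediting model described above can be sketched as follows. The tag names and prices here are purely illustrative; the bot's real schema and HTML parsing will differ.

```typescript
// Illustrative per-tag pricing: each occurrence of a tag credits its price.
// A price of 0 (or an absent entry) means the tag earns nothing, e.g. blockquotes.
const tagPrices: Record<string, number> = {
  li: 0.5,
  a: 1,
  img: 5,
  blockquote: 0, // quoting someone else's words is not credited
};

// Count opening tags in the rendered comment HTML and sum their prices.
// (A real implementation would use an HTML parser, not a regex.)
function scoreComment(html: string): number {
  let total = 0;
  for (const [tag, price] of Object.entries(tagPrices)) {
    const matches = html.match(new RegExp(`<${tag}(\\s|>)`, "g")) ?? [];
    total += matches.length * price;
  }
  return total;
}

// Two list items and one image: 2 * 0.5 + 5 = 6
console.log(scoreComment("<ul><li>point one</li><li>point two</li></ul><img src='x'>"));
```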

> Can additional elements be added in to the comment elements as per the partner's discretion?

Yes, it is designed to be fully configurable, with support for every HTML element.

> What are the parameters for what can be changed and what shouldn't be changed?

We can start simple with some of the major ones. I am unsure offhand, but it probably makes sense to focus on things that are likely to be changed frequently, or that are less ambiguous about what sensible values look like. I wouldn't know 100% without spending time on the code and experimenting.

> pushed directly into the `default_branch`

If it is a stable functionality (runtime tests can help determine this) then it should push to the default branch.

> I asked /config I want to refactor the incentive elements scoring between 0 and 5. I want comments which are crafted with care, time and effort to be rewarded for well formatted and fully featured responses.

I think you overestimated the LLM's abilities without the context/anecdotal evidence I have of reviewing comments over the years on GitHub. You'll need to somehow provide the LLM with that context in order to produce good results for this type of query.

> I see that you have your tests for `issue` in its own directory. Should all handlers that need it be put into their own directories with their respective tests and mocks?

We should have the tests next to the code that is being tested. That's why, as I understand it, Jest and similar tools use globbing to find files with `.test.` in the name instead of attempting to run all TypeScript from a specific directory.

> Since AJV is being used for validation we have type safety against the schema, am I to test that it emits the correct errors for invalid types?

> MAX_SAFE_INT is allowed for permits (a minimum should be defined here that is reasonable, $1 at least)

Using AJV to test the results is very valuable. We should try to define all constraints using AJV. It is a concise and unambiguous way to define the expected "correct" results, both for manually changing the configuration and for ChatGPT doing so.
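For instance, the `maxPermitPrice` floor mentioned above could be encoded directly in the JSON Schema that AJV validates. A sketch, with property names mirroring the snippet earlier in the thread rather than the bot's actual schema:

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "issueCreatorMultiplier": { "type": "number", "minimum": 0 },
    "maxPermitPrice": { "type": "number", "minimum": 1 }
  }
}
```

With `additionalProperties: false`, an LLM-invented key fails validation with exactly the kind of `must NOT have additional properties` error shown in the logs above, which the bot can feed back into the next prompt.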

> I can make educated guesses (probs less effectively than GPT lmao) as to what everything should be, but specifics like comment scoring and additional elements are daunting only because I don't know the ins-and-outs or complete vision for it.

I think a more effective query would be the following:

> /config credit LI $1 each, all header tags (H1-H6) $1 each, and images $5. Everything else should be ignored.
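A query like that might resolve to something along these lines. The key names here are illustrative guesses, not necessarily the bot's real configuration schema:

```yaml
comment-element-pricing:
  li: 1
  h1: 1
  h2: 1
  h3: 1
  h4: 1
  h5: 1
  h6: 1
  img: 5
  # tags not listed are ignored (credited $0)
```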