ubiquibot / comment-incentives


Qualitative Analysis v2 #9

Open 0x4007 opened 5 months ago

0x4007 commented 5 months ago

Overview

Our current implementation for qualitative analysis is really janky. I implemented it myself: I ask ChatGPT to determine how "on topic" the comments are and to reply only with an array of floats. It tends to send back an unexpected number of elements in the array, when it is supposed to return exactly one element per GitHub issue comment. I have a "brute force" approach where I tell it to keep retrying until the count matches, but it seems to break on large comments anyway.

This obviously is not the correct way to measure text similarity, but it was a quick and dirty prototype that kind of works, so I shipped it.

This research is essential because, in the future, it will let the bot truly understand all of the conversations and tasks happening within our network.

R&D Tasks

Appendix

Useful Capabilities Derived From Embeddings

Why is generating vector embeddings important for our strategic priorities?

Task Matchmaking

One killer feature is that we can do talent matchmaking for new tasks posted to our network. For example, when a new task is posted, we can suggest the contributors whose previously completed tasks are most similar to it.

Reporting

Another useful capability is being able to query the bot in natural language (perhaps on Telegram) for up-to-date knowledge about any organization's live operations/products using our system. This could be very useful for investor updates or developer onboarding.

Assistive Time Estimates

We can track exactly how long it took contributors to turn around the deliverable for similar tasks. We can measure from the assign time to their last commit time (instead of when it's merged, because that delay is mostly due to the review team's lag). We can also add more confidence/weight based on how similar the task description is and how much XP the contributor has on the repository (if they have any XP it is assumed that they are already onboarded to the repository, as time estimates are designed to not include onboarding time).
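A minimal sketch of how those signals could combine, assuming we already have the spec similarity, the contributor's XP, and the observed assign-to-last-commit durations for past tasks. The weighting scheme and the onboarding multiplier are illustrative assumptions, not a spec:

```typescript
interface HistoricalTask {
  similarity: number;      // 0..1 similarity of the past task's spec to the new one
  contributorXp: number;   // XP the assignee had on the repository at the time
  turnaroundHours: number; // assign time -> last commit time
}

// Similarity-weighted average of past turnaround times, with extra weight
// when the contributor was already onboarded (had XP on the repository).
export function estimateTurnaroundHours(history: HistoricalTask[]): number | null {
  const weighted = history.map((task) => {
    const onboardedWeight = task.contributorXp > 0 ? 1.5 : 1; // assumed multiplier
    return { weight: task.similarity * onboardedWeight, hours: task.turnaroundHours };
  });
  const totalWeight = weighted.reduce((sum, entry) => sum + entry.weight, 0);
  if (totalWeight === 0) return null; // nothing similar enough to estimate from
  return weighted.reduce((sum, entry) => sum + entry.weight * entry.hours, 0) / totalWeight;
}
```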

Assistive Priority Estimates

This may have lower usability compared to assistive time estimates, because organizations may have different strategic priorities, but we can crowdsource how high a priority similar tasks are.

Combining All Of The Above

If we can pull this off well, then we can create a magical experience: a team posts a new issue, and the bot fully handles the pricing based on crowdsourced time and priority estimates.

Then it matches the optimal talent to execute the task.

It can be a truly "hands off" management experience for getting things done at DAOs.

Cloudflare Embeddings

I've never worked with vector embeddings, but I understand the general concept. We should use Cloudflare's free service to generate and store embeddings of every GitHub comment when the comment incentives are being calculated.
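As a rough sketch of the generation side, Cloudflare Workers AI exposes text embedding models (e.g. `@cf/baai/bge-base-en-v1.5`) over a REST endpoint; the model choice and the response shape below are assumptions to double-check against their docs:

```typescript
const ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID;
const API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;
const MODEL = "@cf/baai/bge-base-en-v1.5"; // assumed embedding model (768 dimensions)

// Generate an embedding for one GitHub comment via Cloudflare Workers AI.
export async function embedComment(commentBody: string): Promise<number[]> {
  const response = await fetch(`https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text: [commentBody] }),
  });
  if (!response.ok) throw new Error(`Workers AI request failed: ${response.status}`);
  const json = await response.json();
  // Assumed response shape: { result: { shape: number[], data: number[][] }, success: boolean }
  return json.result.data[0];
}
```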

I think that we can make a public.vector-embeddings table, and the ID can be the GitHub comment ID. Then we can store an embedding in the next column over.
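Assuming the database behind that is Postgres (e.g. Supabase) with the pgvector extension, a sketch of the table plus the write path could look like this; the 768 dimension matches the model assumed above, and the table/column names are just placeholders:

```typescript
import { createClient } from "@supabase/supabase-js";

// Migration sketch for the proposed table: comment ID as the primary key,
// embedding in the next column over.
export const createVectorEmbeddingsTable = /* sql */ `
  create extension if not exists vector;

  create table if not exists public."vector-embeddings" (
    id bigint primary key,          -- GitHub comment ID
    embedding vector(768) not null  -- embedding of the comment body
  );
`;

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Store (or refresh) the embedding keyed by the GitHub comment ID.
export async function saveEmbedding(commentId: number, embedding: number[]) {
  const { error } = await supabase.from("vector-embeddings").upsert({ id: commentId, embedding });
  if (error) throw error;
}
```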

Finally, I understand that we should be able to compute how "on topic" a comment is, based on the GitHub issue specification that the comment was posted on.
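One straightforward way to do that (a sketch, not a settled design) is cosine similarity between the comment's embedding and the embedding of the issue specification:

```typescript
// Cosine similarity between two embeddings of the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "On topic" score for a comment relative to the issue specification,
// clamped to [0, 1] so it can slot into the existing scoring math.
export function relevanceScore(commentEmbedding: number[], specEmbedding: number[]): number {
  return Math.min(1, Math.max(0, cosineSimilarity(commentEmbedding, specEmbedding)));
}
```

This would replace the current ChatGPT float-array round trip with a deterministic calculation per comment.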

0x4007 commented 4 months ago

@FernandVEYRIER if you want to work on this repository, perhaps you can work on this task instead, but package it as a standalone module. Then we can import it into the new version once we get that far with it.

gentlementlegen commented 4 months ago

@pavlovcik It feels like this repo should eventually be deprecated in favor of the newer version, shouldn't it?

0x4007 commented 4 months ago

Yes but the research will carry over. Also the deliverables can be packaged neatly into portable modules that we can use in the next version!

gentlementlegen commented 4 months ago

Sure thing. This should be broken down into smaller tasks to carry it forward, then.

0x4007 commented 4 months ago

It is. Start with #11 and then we will generate the rest based on the research results of #11