ubiquity-os-marketplace / generate-vector-embeddings

Fix Database Schema #8

Closed 0x4007 closed 1 week ago

0x4007 commented 2 weeks ago

This requires changes to the plugin that captures the data, as well as the database itself.

This must capture issue bodies.


Perhaps I wasn't explicit with my vision for the database schema, but this needs adjustments.

We only need:

id, created_at, modified_at, author_id, plaintext, embedding
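
As a rough illustration of the six columns listed above, a row could be typed as follows. This is only a sketch: `EmbeddingRow`, `CommentInput`, and `toEmbeddingRow` are hypothetical names, using a string node ID for `id` is an assumption, and `plaintext` is nullable because the discussion below settles on null for private repositories.

```typescript
// Hypothetical row shape for the proposed columns (names are illustrative).
interface EmbeddingRow {
  id: string; // assumption: a GitHub node ID
  created_at: string; // ISO 8601 timestamp
  modified_at: string; // ISO 8601 timestamp
  author_id: number;
  plaintext: string | null; // null when the repository is private
  embedding: number[];
}

// Hypothetical input: the subset of comment data needed to build a row.
interface CommentInput {
  nodeId: string;
  createdAt: string;
  updatedAt: string;
  authorId: number;
  body: string;
}

// Builds a row; the text is withheld (null) for private repositories.
function toEmbeddingRow(
  comment: CommentInput,
  embedding: number[],
  isPrivate: boolean
): EmbeddingRow {
  return {
    id: comment.nodeId,
    created_at: comment.createdAt,
    modified_at: comment.updatedAt,
    author_id: comment.authorId,
    plaintext: isPrivate ? null : comment.body,
    embedding,
  };
}
```
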
sshivaditya2019 commented 2 weeks ago

@0x4007 would plaintext include issue body as well or would it just be comment body?

So, right now if there's a private repository then the plaintext would become 'CENSORED'. Is that fine?

0x4007 commented 2 weeks ago

Issue body as well. And no, leave it null.

sshivaditya2019 commented 2 weeks ago

So, would plaintext's value be issue body + comment body, or do we require a separate column? If the latter, how should the embedding be computed: from the comment body alone, or from comment body + issue body?

0x4007 commented 2 weeks ago

Why do you want to associate and/or combine the embedding of the issue body and the comment body?

Can you provide some examples of applications that would require this?

sshivaditya2019 commented 2 weeks ago

Case 1: Issue Body + Comment Body:

Case 2: Issue Body and Comment Body Separate columns:

If having two separate embeddings is acceptable, then that's fine. According to the issue's specification, the desired schema was id, created_at, modified_at, author_id, plaintext, embedding.

0x4007 commented 2 weeks ago

I don't understand your explanation.

Anyway, it would be great if we could build the following applications:

  1. Issue deduplication.
  2. An assistant that answers questions specifically related to our DAO and products, here on GitHub and on Telegram, in plain English.

As I understand it, these applications are possible with my proposed schema.

I wonder if it makes sense simply to capture every property from the issue and comment objects on GitHub, excluding the URL properties.

sshivaditya2019 commented 2 weeks ago

Just to be clear, what is plaintext in your schema?

0x4007 commented 2 weeks ago

The source code of the markdown of the comments.

I hope that the markdown syntax doesn't negatively affect the LLM's ability to understand.

I know that I always must clarify with ChatGPT that the right angle bracket (>) means it's a block quote from another comment.

I suppose we'll need to compare the performance of raw markdown source against something preprocessed, closer to how we perceive it.

sshivaditya2019 commented 2 weeks ago

I'll give an example of plaintext; let me know if this is right:

## Issue #1: Sample Issue with Replies

This is the body of the first issue.

### Comment by user1:

This is the first comment on issue #1.

#### Reply by user2:

This is a reply to the first comment by user1.

#### Reply by user3:

This is another reply to the first comment by user1.

### Comment by user4:

This is a second comment on issue #1.

Is this right? If not, could you please give an example? I am not sure what you mean by source code of the markdown of the comments.

0x4007 commented 2 weeks ago

I am not an expert in working with embeddings, but as I understand it we need to generate a single embedding per corpus of text (comment).

Otherwise yes, you included the markdown syntax, which is what I said.

sshivaditya2019 commented 2 weeks ago

To clarify, the example I shared corresponds to a single embedding/vector. That is, a new text corpus (the markdown I shared above) would be created for each new issue, and additional comments would be appended to that corpus.

Currently, we create a new record for each comment and generate an embedding for each one. This approach won't work with the issue deduplication or assistant plans you have.

sshivaditya2019 commented 2 weeks ago

So, to clarify, should the embedding be created per comment while the plaintext contains the entire markdown syntax I mentioned earlier? Is that the correct approach?

I don't think this will work as intended. We need either the comments with their embeddings or the text corpus (markdown) with its embeddings. Please let me know if I've misunderstood anything.

0x4007 commented 2 weeks ago

So if we are clobbering everything into a single embedding to represent an entire issue (or pull request), won't the costs increase exponentially? Can't we feed multiple embeddings to an LLM to work with?

It doesn't seem like a good approach to clobber

sshivaditya2019 commented 2 weeks ago

An alternative approach would be to store each piece of text from issues/comments as separate entities and associate them with their respective node IDs. This would create a large text corpus containing all the content from the organization, which could be identified by a global node ID or by using a type column in the schema.

sshivaditya2019 commented 2 weeks ago

To clarify, we don’t provide embeddings directly to the LLM. Instead, we perform a vector or similarity search, apply a ranking technique, and then extract text values that are most relevant to the search subject. This extracted text is then used as context for the LLM.
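
The retrieval flow described above (similarity search, ranking, then passing the matching text as context) might look roughly like this. It is a minimal sketch, assuming embeddings are plain number arrays held in memory and that cosine similarity is the ranking measure; `StoredText`, `cosineSimilarity`, and `retrieveContext` are hypothetical names, and a real deployment would more likely run the search inside the database.

```typescript
// Hypothetical stored record: the text plus its embedding vector.
interface StoredText {
  id: string;
  plaintext: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks rows by similarity to the query embedding and returns the text of
// the top k rows, which would then be supplied to the LLM as context.
function retrieveContext(
  query: number[],
  rows: StoredText[],
  k: number
): string[] {
  return rows
    .map((row) => ({ row, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ row }) => row.plaintext);
}
```

Only the text reaches the LLM in this sketch; the embeddings are used solely to decide which text is relevant.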

0x4007 commented 2 weeks ago

@gentlementlegen I'm assuming storing the node ID won't be sufficient for us to determine which embeddings are associated with each other. I'm assuming we also need to store the issue ID.

Anyway, at this point I'm starting to think we should just store everything besides the _url properties and optimize later, since we don't seem to know for sure what to do now.
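
One way to "store everything besides the _url properties" is to strip those keys recursively before serializing the payload. This is a sketch under assumptions: `Json` and `stripUrlProperties` are hypothetical names, and it treats a key as a URL property when it is exactly `url` or ends in `_url`.

```typescript
// Minimal JSON value type, so the helper works on any webhook payload.
type Json =
  | string
  | number
  | boolean
  | null
  | Json[]
  | { [key: string]: Json };

// Returns a copy of the value with every "url" / "*_url" key removed,
// at any nesting depth. The input is not mutated.
function stripUrlProperties(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => stripUrlProperties(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (key === "url" || key.endsWith("_url")) continue; // drop URL fields
      out[key] = stripUrlProperties(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```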

0x4007 commented 2 weeks ago

What are embeddings?

OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are commonly used for:

  1. Search (where results are ranked by relevance to a query string)
  2. Clustering (where text strings are grouped by similarity)
  3. Recommendations (where items with related text strings are recommended)
  4. Anomaly detection (where outliers with little relatedness are identified)
  5. Diversity measurement (where similarity distributions are analyzed)
  6. Classification (where text strings are classified by their most similar label)


ChatGPT: Here are additional use cases for embeddings:

  1. Sentiment Analysis: Embeddings can help map text to a sentiment space, allowing you to analyze the emotional tone of a text (positive, negative, neutral).
  2. Language Translation: Embeddings capture semantic meaning, making them useful for machine translation, where similar sentences in different languages are close in embedding space.
  3. Information Extraction: Identify and extract structured information from unstructured text by mapping words or phrases to relevant concepts.
  4. Text Summarization: Summarize large documents by comparing sentence embeddings and extracting the most important, representative sentences.
  5. Question Answering: Embeddings can improve the ability to match questions with the most relevant answers based on semantic similarity.
  6. Text Generation: Used in generative models like GPT, embeddings help maintain coherence in text generation by representing context.
  7. Content Moderation: Classify or filter harmful or inappropriate content by mapping texts to predefined categories of acceptable and non-acceptable behavior.
  8. Topic Modeling: Discover underlying topics within a large set of documents by clustering similar text embeddings.

Reviewing the leaderboard, I realize that STS (Semantic Textual Similarity) is what matters most for deduplication and task matchmaking. Check the STS tab. voyage-lite-02-instruct looks promising (the first-ranked model seems specifically designed for English-French translation?)

Seems significantly better than OpenAI's offering.

We may also be able to make a feature that allows contributors to ask product/strategic direction questions to the bot. It can link back to comments that answer their question.

Text summarization might be able to improve our relevance scoring of comments (verbose and concise comments may be able to receive similar "quantitative scoring" credit)

sshivaditya2019 commented 2 weeks ago

@0x4007

Hi, could you update me on the final database schema? I believe that the global node ID and type should be sufficient. Other parameters might not be relevant to the embedding text corpus and could end up being more of a log dump.

sshivaditya2019 commented 2 weeks ago

I can probably write a new adapter for Voyage AI, but I think their free tier is limited to 3 RPM (requests per minute), though they appear to offer 50M free tokens for new accounts.

0x4007 commented 2 weeks ago

Three requests per minute seems acceptable as long as we can have some type of buffer or queue. I believe we had plans to do this anyways with the OpenAI embeddings because it's significantly cheaper.

However, can we generate multiple embeddings per request? If not then I imagine this can get more complicated than a cron job.
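
To make the buffer/queue idea concrete, here is a small planning sketch. It assumes the embedding provider accepts several input texts in one request (worth confirming in the provider's API documentation) and only computes which texts go in which request and when; `PlannedBatch` and `planBatches` are hypothetical names, and the actual sending is left out.

```typescript
// A group of texts to send in one request, with the earliest send time
// expressed as an offset in milliseconds from the start of the run.
interface PlannedBatch {
  texts: string[];
  sendAtMs: number;
}

// Splits texts into batches of at most batchSize and spaces the batches so
// that no more than requestsPerMinute requests are sent per minute.
function planBatches(
  texts: string[],
  batchSize: number,
  requestsPerMinute: number
): PlannedBatch[] {
  if (batchSize < 1 || requestsPerMinute < 1) {
    throw new Error("batchSize and requestsPerMinute must be at least 1");
  }
  const intervalMs = Math.ceil(60000 / requestsPerMinute);
  const batches: PlannedBatch[] = [];
  for (let start = 0; start < texts.length; start += batchSize) {
    batches.push({
      texts: texts.slice(start, start + batchSize),
      sendAtMs: batches.length * intervalMs,
    });
  }
  return batches;
}
```

With batching, a 3 RPM limit restricts the number of requests rather than the number of embeddings, so a periodic job that drains a buffer of pending comments may be enough.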

As for schema, I think it's simpler to dump everything and optimize later. This is because we don't know what the next plugins will need and this gives us more flexibility for research. To me it's only a priority to optimize when we start hitting free tier limits.

sshivaditya2019 commented 2 weeks ago

So, if we add a payment method, the requests-per-minute limit can be increased. I have updated the plugin to support Voyage AI.

0x4007 commented 1 week ago

image

Any name and zip code should work

sshivaditya2019 commented 1 week ago

@0x4007

I've added my payment method. Please delete this information immediately just to be safe or share it through another platform, like Telegram.

0x4007 commented 1 week ago

It's merchant locked for up to $1

sshivaditya2019 commented 1 week ago

So, the final schema would be Id (GlobalNodeId), plaintext (text from the comment), createdAt, ModifiedAt, embedding, and this entire thing as a serialized object:

type Comment = {
        /**
         * AuthorAssociation
         * @description How the author is associated with the repository.
         * @enum {string}
         */
        author_association:
          | "COLLABORATOR"
          | "CONTRIBUTOR"
          | "FIRST_TIMER"
          | "FIRST_TIME_CONTRIBUTOR"
          | "MANNEQUIN"
          | "MEMBER"
          | "NONE"
          | "OWNER";
        /** @description Contents of the issue comment */
        body: string;
        /** Format: date-time */
        created_at: string;
        /** Format: uri */
        html_url: string;
        /**
         * Format: int64
         * @description Unique identifier of the issue comment
         */
        id: number;
        /** Format: uri */
        issue_url: string;
        node_id: string;
        performed_via_github_app: null | components["schemas"]["integration"];
        /** Reactions */
        reactions: {
          "+1": number;
          "-1": number;
          confused: number;
          eyes: number;
          heart: number;
          hooray: number;
          laugh: number;
          rocket: number;
          total_count: number;
          /** Format: uri */
          url: string;
        };
        /** Format: date-time */
        updated_at: string;
        /**
         * Format: uri
         * @description URL for the issue comment
         */
        url: string;
        /** User */
        user: {
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization";
          /** Format: uri */
          url?: string;
        } | null;
      };

Is this schema fine, or is there anything else to be added or removed?

0x4007 commented 1 week ago

Looks like there isn't a lot to save from there if we remove the URLs, nested objects like reactions and user details.

I'm surprised that's all that's there.

Performed via GitHub app should also be removed because we shouldn't be generating embeddings for bot comments.

So after all of that, looks like basically we might only be adding author association.
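
The reduction described above could be sketched as follows. This is an illustration only: `IssueComment` is a trimmed-down, hypothetical version of the comment type quoted earlier, `StoredComment` and `toStoredComment` are made-up names, and treating `user.type === "Bot"` as the bot test is an assumption.

```typescript
// Trimmed-down comment shape: only the fields this sketch reads.
interface IssueComment {
  author_association: string;
  body: string;
  created_at: string;
  updated_at: string;
  node_id: string;
  user: { id: number; login: string; type?: "Bot" | "User" | "Organization" } | null;
}

// What is left after dropping URLs, reactions, and nested user details.
interface StoredComment {
  id: string;
  plaintext: string;
  created_at: string;
  modified_at: string;
  author_id: number | null;
  author_association: string;
}

// Returns the reduced record, or null for bot comments, which should not
// receive embeddings.
function toStoredComment(comment: IssueComment): StoredComment | null {
  if (comment.user?.type === "Bot") return null;
  return {
    id: comment.node_id,
    plaintext: comment.body,
    created_at: comment.created_at,
    modified_at: comment.updated_at,
    author_id: comment.user ? comment.user.id : null,
    author_association: comment.author_association,
  };
}
```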

sshivaditya2019 commented 1 week ago

I took this from Octokit's type definitions. I think reactions should be kept, as they could be a metric of engagement for comments with abnormally low relevance.

Also, should the created_at in the schema come from the comment, or from when the actual record was created?

This is the entire dump of the webhook event

/** issue_comment created event */
    "webhook-issue-comment-created": {
      /** @enum {string} */
      action: "created";
      /**
       * issue comment
       * @description The [comment](https://docs.github.com/rest/issues/comments#get-an-issue-comment) itself.
       */
      comment: {
        /**
         * AuthorAssociation
         * @description How the author is associated with the repository.
         * @enum {string}
         */
        author_association:
          | "COLLABORATOR"
          | "CONTRIBUTOR"
          | "FIRST_TIMER"
          | "FIRST_TIME_CONTRIBUTOR"
          | "MANNEQUIN"
          | "MEMBER"
          | "NONE"
          | "OWNER";
        /** @description Contents of the issue comment */
        body: string;
        /** Format: date-time */
        created_at: string;
        /** Format: uri */
        html_url: string;
        /**
         * Format: int64
         * @description Unique identifier of the issue comment
         */
        id: number;
        /** Format: uri */
        issue_url: string;
        node_id: string;
        performed_via_github_app: null | components["schemas"]["integration"];
        /** Reactions */
        reactions: {
          "+1": number;
          "-1": number;
          confused: number;
          eyes: number;
          heart: number;
          hooray: number;
          laugh: number;
          rocket: number;
          total_count: number;
          /** Format: uri */
          url: string;
        };
        /** Format: date-time */
        updated_at: string;
        /**
         * Format: uri
         * @description URL for the issue comment
         */
        url: string;
        /** User */
        user: {
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization";
          /** Format: uri */
          url?: string;
        } | null;
      };
      installation?: components["schemas"]["simple-installation"];
      /** @description The [issue](https://docs.github.com/rest/issues/issues#get-an-issue) the comment belongs to. */
      issue: {
        /** @enum {string|null} */
        active_lock_reason:
          | "resolved"
          | "off-topic"
          | "too heated"
          | "spam"
          | null;
        /** User */
        assignee?: {
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization" | "Mannequin";
          /** Format: uri */
          url?: string;
        } | null;
        assignees: ({
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization" | "Mannequin";
          /** Format: uri */
          url?: string;
        } | null)[];
        /**
         * AuthorAssociation
         * @description How the author is associated with the repository.
         * @enum {string}
         */
        author_association:
          | "COLLABORATOR"
          | "CONTRIBUTOR"
          | "FIRST_TIMER"
          | "FIRST_TIME_CONTRIBUTOR"
          | "MANNEQUIN"
          | "MEMBER"
          | "NONE"
          | "OWNER";
        /** @description Contents of the issue */
        body: string | null;
        /** Format: date-time */
        closed_at: string | null;
        comments: number;
        /** Format: uri */
        comments_url: string;
        /** Format: date-time */
        created_at: string;
        draft?: boolean;
        /** Format: uri */
        events_url: string;
        /** Format: uri */
        html_url: string;
        /** Format: int64 */
        id: number;
        labels?: {
          /** @description 6-character hex code, without the leading #, identifying the color */
          color: string;
          default: boolean;
          description: string | null;
          id: number;
          /** @description The name of the label. */
          name: string;
          node_id: string;
          /**
           * Format: uri
           * @description URL for the label
           */
          url: string;
        }[];
        /** Format: uri-template */
        labels_url: string;
        locked?: boolean;
        /**
         * Milestone
         * @description A collection of related issues and pull requests.
         */
        milestone: {
          /** Format: date-time */
          closed_at: string | null;
          closed_issues: number;
          /** Format: date-time */
          created_at: string;
          /** User */
          creator: {
            /** Format: uri */
            avatar_url?: string;
            deleted?: boolean;
            email?: string | null;
            /** Format: uri-template */
            events_url?: string;
            /** Format: uri */
            followers_url?: string;
            /** Format: uri-template */
            following_url?: string;
            /** Format: uri-template */
            gists_url?: string;
            gravatar_id?: string;
            /** Format: uri */
            html_url?: string;
            id: number;
            login: string;
            name?: string;
            node_id?: string;
            /** Format: uri */
            organizations_url?: string;
            /** Format: uri */
            received_events_url?: string;
            /** Format: uri */
            repos_url?: string;
            site_admin?: boolean;
            /** Format: uri-template */
            starred_url?: string;
            /** Format: uri */
            subscriptions_url?: string;
            /** @enum {string} */
            type?: "Bot" | "User" | "Organization" | "Mannequin";
            /** Format: uri */
            url?: string;
          } | null;
          description: string | null;
          /** Format: date-time */
          due_on: string | null;
          /** Format: uri */
          html_url: string;
          id: number;
          /** Format: uri */
          labels_url: string;
          node_id: string;
          /** @description The number of the milestone. */
          number: number;
          open_issues: number;
          /**
           * @description The state of the milestone.
           * @enum {string}
           */
          state: "open" | "closed";
          /** @description The title of the milestone. */
          title: string;
          /** Format: date-time */
          updated_at: string;
          /** Format: uri */
          url: string;
        } | null;
        node_id: string;
        number: number;
        /**
         * App
         * @description GitHub apps are a new way to extend GitHub. They can be installed directly on organizations and user accounts and granted access to specific repositories. They come with granular permissions and built-in webhooks. GitHub apps are first class actors within GitHub.
         */
        performed_via_github_app?: {
          /** Format: date-time */
          created_at: string | null;
          description: string | null;
          /** @description The list of events for the GitHub app */
          events?: (
            | "branch_protection_rule"
            | "check_run"
            | "check_suite"
            | "code_scanning_alert"
            | "commit_comment"
            | "content_reference"
            | "create"
            | "delete"
            | "deployment"
            | "deployment_review"
            | "deployment_status"
            | "deploy_key"
            | "discussion"
            | "discussion_comment"
            | "fork"
            | "gollum"
            | "issues"
            | "issue_comment"
            | "label"
            | "member"
            | "membership"
            | "milestone"
            | "organization"
            | "org_block"
            | "page_build"
            | "project"
            | "project_card"
            | "project_column"
            | "public"
            | "pull_request"
            | "pull_request_review"
            | "pull_request_review_comment"
            | "push"
            | "registry_package"
            | "release"
            | "repository"
            | "repository_dispatch"
            | "secret_scanning_alert"
            | "star"
            | "status"
            | "team"
            | "team_add"
            | "watch"
            | "workflow_dispatch"
            | "workflow_run"
            | "reminder"
            | "pull_request_review_thread"
          )[];
          /** Format: uri */
          external_url: string | null;
          /** Format: uri */
          html_url: string;
          /** @description Unique identifier of the GitHub app */
          id: number | null;
          /** @description The name of the GitHub app */
          name: string;
          node_id: string;
          /** User */
          owner: {
            /** Format: uri */
            avatar_url?: string;
            deleted?: boolean;
            email?: string | null;
            /** Format: uri-template */
            events_url?: string;
            /** Format: uri */
            followers_url?: string;
            /** Format: uri-template */
            following_url?: string;
            /** Format: uri-template */
            gists_url?: string;
            gravatar_id?: string;
            /** Format: uri */
            html_url?: string;
            id: number;
            login: string;
            name?: string;
            node_id?: string;
            /** Format: uri */
            organizations_url?: string;
            /** Format: uri */
            received_events_url?: string;
            /** Format: uri */
            repos_url?: string;
            site_admin?: boolean;
            /** Format: uri-template */
            starred_url?: string;
            /** Format: uri */
            subscriptions_url?: string;
            /** @enum {string} */
            type?: "Bot" | "User" | "Organization";
            /** Format: uri */
            url?: string;
          } | null;
          /** @description The set of permissions for the GitHub app */
          permissions?: {
            /** @enum {string} */
            actions?: "read" | "write";
            /** @enum {string} */
            administration?: "read" | "write";
            /** @enum {string} */
            checks?: "read" | "write";
            /** @enum {string} */
            content_references?: "read" | "write";
            /** @enum {string} */
            contents?: "read" | "write";
            /** @enum {string} */
            deployments?: "read" | "write";
            /** @enum {string} */
            discussions?: "read" | "write";
            /** @enum {string} */
            emails?: "read" | "write";
            /** @enum {string} */
            environments?: "read" | "write";
            /** @enum {string} */
            issues?: "read" | "write";
            /** @enum {string} */
            keys?: "read" | "write";
            /** @enum {string} */
            members?: "read" | "write";
            /** @enum {string} */
            metadata?: "read" | "write";
            /** @enum {string} */
            organization_administration?: "read" | "write";
            /** @enum {string} */
            organization_hooks?: "read" | "write";
            /** @enum {string} */
            organization_packages?: "read" | "write";
            /** @enum {string} */
            organization_plan?: "read" | "write";
            /** @enum {string} */
            organization_projects?: "read" | "write" | "admin";
            /** @enum {string} */
            organization_secrets?: "read" | "write";
            /** @enum {string} */
            organization_self_hosted_runners?: "read" | "write";
            /** @enum {string} */
            organization_user_blocking?: "read" | "write";
            /** @enum {string} */
            packages?: "read" | "write";
            /** @enum {string} */
            pages?: "read" | "write";
            /** @enum {string} */
            pull_requests?: "read" | "write";
            /** @enum {string} */
            repository_hooks?: "read" | "write";
            /** @enum {string} */
            repository_projects?: "read" | "write" | "admin";
            /** @enum {string} */
            secret_scanning_alerts?: "read" | "write";
            /** @enum {string} */
            secrets?: "read" | "write";
            /** @enum {string} */
            security_events?: "read" | "write";
            /** @enum {string} */
            security_scanning_alert?: "read" | "write";
            /** @enum {string} */
            single_file?: "read" | "write";
            /** @enum {string} */
            statuses?: "read" | "write";
            /** @enum {string} */
            team_discussions?: "read" | "write";
            /** @enum {string} */
            vulnerability_alerts?: "read" | "write";
            /** @enum {string} */
            workflows?: "read" | "write";
          };
          /** @description The slug name of the GitHub app */
          slug?: string;
          /** Format: date-time */
          updated_at: string | null;
        } | null;
        pull_request?: {
          /** Format: uri */
          diff_url?: string;
          /** Format: uri */
          html_url?: string;
          /** Format: date-time */
          merged_at?: string | null;
          /** Format: uri */
          patch_url?: string;
          /** Format: uri */
          url?: string;
        };
        /** Reactions */
        reactions: {
          "+1": number;
          "-1": number;
          confused: number;
          eyes: number;
          heart: number;
          hooray: number;
          laugh: number;
          rocket: number;
          total_count: number;
          /** Format: uri */
          url: string;
        };
        /** Format: uri */
        repository_url: string;
        /**
         * @description State of the issue; either 'open' or 'closed'
         * @enum {string}
         */
        state?: "open" | "closed";
        state_reason?: string | null;
        /** Format: uri */
        timeline_url?: string;
        /** @description Title of the issue */
        title: string;
        /** Format: date-time */
        updated_at: string;
        /**
         * Format: uri
         * @description URL for the issue
         */
        url: string;
        /** User */
        user: {
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization" | "Mannequin";
          /** Format: uri */
          url?: string;
        } | null;
      } & {
        active_lock_reason?: string | null;
        /** User */
        assignee: {
          /** Format: uri */
          avatar_url?: string;
          deleted?: boolean;
          email?: string | null;
          /** Format: uri-template */
          events_url?: string;
          /** Format: uri */
          followers_url?: string;
          /** Format: uri-template */
          following_url?: string;
          /** Format: uri-template */
          gists_url?: string;
          gravatar_id?: string;
          /** Format: uri */
          html_url?: string;
          id: number;
          login: string;
          name?: string;
          node_id?: string;
          /** Format: uri */
          organizations_url?: string;
          /** Format: uri */
          received_events_url?: string;
          /** Format: uri */
          repos_url?: string;
          site_admin?: boolean;
          /** Format: uri-template */
          starred_url?: string;
          /** Format: uri */
          subscriptions_url?: string;
          /** @enum {string} */
          type?: "Bot" | "User" | "Organization" | "Mannequin";
          /** Format: uri */
          url?: string;
        } | null;
        assignees?: (Record<string, never> | null)[];
        author_association?: string;
        body?: string | null;
        closed_at?: string | null;
        comments?: number;
        comments_url?: string;
        created_at?: string;
        events_url?: string;
        html_url?: string;
        id?: number;
        labels: {
          /** @description 6-character hex code, without the leading #, identifying the color */
          color: string;
          default: boolean;
          description: string | null;
          id: number;
          /** @description The name of the label. */
          name: string;
          node_id: string;
          /**
           * Format: uri
           * @description URL for the label
           */
          url: string;
        }[];
        labels_url?: string;
        locked: boolean;
        milestone?: Record<string, never> | null;
        node_id?: string;
        number?: number;
        performed_via_github_app?: Record<string, never> | null;
        reactions?: {
          "+1"?: number;
          "-1"?: number;
          confused?: number;
          eyes?: number;
          heart?: number;
          hooray?: number;
          laugh?: number;
          rocket?: number;
          total_count?: number;
          url?: string;
        };
        repository_url?: string;
        /**
         * @description State of the issue; either 'open' or 'closed'
         * @enum {string}
         */
        state: "open" | "closed";
        timeline_url?: string;
        title?: string;
        updated_at?: string;
        url?: string;
        user?: {
          avatar_url?: string;
          events_url?: string;
          followers_url?: string;
          following_url?: string;
          gists_url?: string;
          gravatar_id?: string;
          html_url?: string;
          id?: number;
          login?: string;
          node_id?: string;
          organizations_url?: string;
          received_events_url?: string;
          repos_url?: string;
          site_admin?: boolean;
          starred_url?: string;
          subscriptions_url?: string;
          type?: string;
          url?: string;
        };
      };
      organization?: components["schemas"]["organization-simple-webhooks"];
      repository: components["schemas"]["repository-webhooks"];
      sender: components["schemas"]["simple-user-webhooks"];
    };
0x4007 commented 1 week ago

My concern is that I feel we should make everything into its own column. However, for simplicity, we could just save the full reaction object in a single column.
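
A minimal sketch of the single-column option, assuming a hypothetical `CommentRow` shape and `toCommentRow` helper (neither name comes from the plugin's code). The reaction counts from the webhook payload type above are kept together as one JSON value rather than spread across one column per emoji:

```typescript
// Reaction counts as they appear in the webhook payload type quoted above.
interface Reactions {
  "+1": number;
  "-1": number;
  confused: number;
  eyes: number;
  heart: number;
  hooray: number;
  laugh: number;
  rocket: number;
  total_count: number;
  url: string;
}

// Hypothetical row shape: one column holds the whole reactions object.
interface CommentRow {
  id: string;
  author_id: number;
  plaintext: string | null;
  reactions: Reactions; // a single JSON value, not one column per emoji
}

// Hypothetical helper: copies the reactions object as-is instead of flattening it.
function toCommentRow(comment: {
  node_id: string;
  user: { id: number };
  body: string | null;
  reactions: Reactions;
}): CommentRow {
  return {
    id: comment.node_id,
    author_id: comment.user.id,
    plaintext: comment.body,
    reactions: { ...comment.reactions },
  };
}
```

If a reaction count ever needs to be queried directly, it could be promoted to its own column later without changing how rows are captured.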

sshivaditya2019 commented 1 week ago

My concern is that I feel we should make everything into its own column. However, for simplicity, we could just save the full reaction object in a single column.

If we are not going to use relations and other SQL features, could we try MongoDB or another document-based database?

0x4007 commented 1 week ago

I think Supabase is probably fine because there are no planned features for reactions. We are unlikely to need the query performance for a long time. Even if or when we do, we can consider adding those columns later.

To be honest, I'm not a database expert, but it seems like Supabase should handle our needs. It has a nice UI, which is convenient for debugging.

sshivaditya2019 commented 1 week ago

I think Supabase is probably fine because there are no planned features for reactions. We are unlikely to need the query performance for a long time. Even if or when we do, we can consider adding those columns later.

To be honest, I'm not a database expert, but it seems like Supabase should handle our needs. It has a nice UI, which is convenient for debugging.

It should be better. I am skeptical about storing serialized comment objects in a Postgres database, though it should not affect performance.

0x4007 commented 1 week ago

I asked ChatGPT for a new feature: onboarding new developers, via plaintext q&a. It said we should generate embeddings of an entire repository. How does this affect our schema? Perhaps we should generalize further? Or we can make a new table?

Perhaps a new table per GitHub object type is the most manageable.
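
One way to picture a table per GitHub object type, as a rough sketch only: the shared columns follow the schema requested at the top of this issue, while the object kinds, the table names, and the `routeRow` helper are all hypothetical.

```typescript
// Columns shared by every table, following the schema requested in the issue.
interface BaseRow {
  id: string;
  created_at: string;
  modified_at: string;
  author_id: number;
  plaintext: string | null;
  embedding: number[] | null;
}

// Hypothetical object kinds; "repository_file" stands in for repository content.
type ObjectKind = "issue" | "issue_comment" | "repository_file";

// Hypothetical table names, one per GitHub object type.
const TABLE_FOR_KIND: Record<ObjectKind, string> = {
  issue: "issues",
  issue_comment: "issue_comments",
  repository_file: "repository_files",
};

// Pairs a row with the table it belongs to.
function routeRow(kind: ObjectKind, row: BaseRow): { table: string; row: BaseRow } {
  return { table: TABLE_FOR_KIND[kind], row };
}
```

Because every table shares the same base columns, supporting a new object type would mostly mean adding one entry to the mapping.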

sshivaditya2019 commented 1 week ago

I asked ChatGPT for a new feature: onboarding new developers, via plaintext q&a. It said we should generate embeddings of an entire repository. How does this affect our schema? Perhaps we should generalize further? Or we can make a new table?

Perhaps a new table per GitHub object type is the most manageable.

That would depend on the onboarding. If onboarding involves explaining the current tickets, comments, and the work in progress, this schema would be enough.

If we want to explain the code and other things, we do not need embeddings for those. For general Q&A this should be enough. We could expand this further by adding OpenAI functions to query the repo based on the serialized object retrieved using vector search.
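
The retrieval step described here can be sketched as follows. This is an in-memory stand-in under stated assumptions, not the plugin's implementation: `StoredRecord`, `cosineSimilarity`, and `topK` are hypothetical names, and in practice the ranking would more likely be done by the database.

```typescript
// Hypothetical stored record: an embedding plus the serialized GitHub object.
interface StoredRecord {
  id: string;
  embedding: number[];
  payload: Record<string, unknown>;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  const dot = a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (normA * normB);
}

// Returns the k records most similar to the query embedding, best match first.
function topK(query: number[], records: StoredRecord[], k: number): StoredRecord[] {
  return records
    .map((record) => ({ record, score: cosineSimilarity(query, record.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.record);
}
```

The `payload` of each returned record is the serialized object that could then be handed to an LLM, or used to decide which follow-up repository query to run.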

sshivaditya2019 commented 1 week ago

/start

ubiquity-os[bot] commented 1 week ago

Deadline: Thu, Sep 12, 5:33 PM UTC
Beneficiary: 0xDAba6e01D15Db560b88C8F426b016801f79e1F69

Tips:
- Use `/wallet 0x0000...0000` if you want to update your registered payment wallet address.
- Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
- Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

ubiquity-os[bot] commented 1 week ago

[ 200 WXDAI ]

@sshivaditya2019
Contributions Overview
| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Task | 1 | 200 |
| Issue | Comment | 18 | 0 |
| Review | Comment | 20 | 0 |
Conversation Incentives
Comment Formatting Relevance Reward
@0x4007 would plaintext include issue body as well or would it j…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 33
        multiplier: 0
    score: 1
multiplier: 0
0.7 -
So, plaintext's value would issue body + comment body, or do we …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 41
        multiplier: 0
    score: 1
multiplier: 0
0.8 -
Case 1: Issue Body + Comment Body: - Only one vector index for …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 106
        multiplier: 0
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 35
        multiplier: 0
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0
    score: 1
multiplier: 0
0.8 -
Just to be clear, what is plaintext in your schema ?
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 10
        multiplier: 0
    score: 1
multiplier: 0
0.5 -
I'll give an example of plaintext let me know if this right ? &…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 27
        multiplier: 0
    score: 1
  pre:
    symbols:
      \b\w+\b:
        count: 62
        multiplier: 0
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0
    score: 1
multiplier: 0
0.7 -
To clarify, the example I shared involves a single embedding/vec…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 68
        multiplier: 0
    score: 1
multiplier: 0
0.6 -
So, to clarify, should the embedding be created per comment whil…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 60
        multiplier: 0
    score: 1
multiplier: 0
0.8 -
An alternative approach would be to store each piece of text fro…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 57
        multiplier: 0
    score: 1
multiplier: 0
0.9 -
To clarify, we don’t provide embeddings directly to the LLM. Ins…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 47
        multiplier: 0
    score: 1
multiplier: 0
0.8 -
@0x4007 Hi, could you update me on the final database schema? …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 44
        multiplier: 0
    score: 1
multiplier: 0
0.7 -
I can probably write a new adapter for them, but I think their f…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 32
        multiplier: 0
    score: 1
multiplier: 0
0.3 -
So, if we add a payment method, the requests per minute can be i…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 23
        multiplier: 0
    score: 1
multiplier: 0
0.4 -
@0x4007 I've added my payment method. Please delete this infor…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 24
        multiplier: 0
    score: 1
multiplier: 0
0.2 -
So, the final schema would be `Id (GlobalNodeId), plaintext(…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 14
        multiplier: 0
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 179
        multiplier: 0
    score: 1
  pre:
    symbols:
      \b\w+\b:
        count: 179
        multiplier: 0
    score: 0
multiplier: 0
0.9 -
I took this from octokit's type definition. I think reactions sh…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 56
        multiplier: 0
    score: 1
  pre:
    symbols:
      \b\w+\b:
        count: 1530
        multiplier: 0
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 1530
        multiplier: 0
    score: 1
multiplier: 0
0.8 -
If we are not going to be using relations and other sql things, …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 24
        multiplier: 0
    score: 1
multiplier: 0
0.6 -
It should be better. I am skeptical about storing serialized com…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 22
        multiplier: 0
    score: 1
multiplier: 0
0.5 -
that would depend on the onboarding, if onboarding involves expl…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 74
        multiplier: 0
    score: 1
multiplier: 0
0.7 -
Resolves [#8](https://github.com/ubiquibot/issue-comment-embeddi…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0
    score: 1
  ul:
    symbols:
      \b\w+\b:
        count: 40
        multiplier: 0
    score: 1
  li:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0
    score: 1
multiplier: 0
1 -
This is according to the schema mentioned in the issue spec, cou…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 23
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
I’ve retained it in case it can be expanded to support OpenAI's …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 30
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
This is the maximum possible length. Even if we switch providers…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 33
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
I wanted to keep it consistent with the DB schema. This passed t…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 24
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Its there to ignore, bot comments, and chore issues created. So,…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 38
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
I don't think its safe to give codeblock to jsdom. This works fi…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 31
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Replace it with `Markdown-it`. This library is being mai…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 15
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
If a text is empty, instead of calling embedding api on it I am …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 56
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
This can be used for issue deduplication and stuff. I think this…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 16
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Fixed
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
@0x4007 Updated the schema, comments use `voyageai` for …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 10
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
@0x4007 Could you please check the updated changes ? Have remove…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 23
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
I have removed the changes for the other task. CI Should be pass…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 25
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Right now issue body is in the payload object, this is jsonb, wh…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 18
        multiplier: 0.2
    score: 1
  img:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 0
multiplier: 0
1 -
So to sum up, you require two tables one for `comments` …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 44
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
@0x4007 Have Added Two Separate Tables as per the schema mention…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 26
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Yes, it includes a payload. I've retained the type column to dis…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 38
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
Removed the `type` from schema. Payload is stored for bo…
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 24
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -
@0x4007 I have added `markdown` and `plaintext` …
0
content:
  p:
    symbols:
      \b\w+\b:
        count: 39
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
multiplier: 0
1 -

[ 114.801 WXDAI ]

@0x4007
Contributions Overview
| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Specification | 1 | 16.82 |
| Issue | Comment | 15 | 59.741 |
| Review | Comment | 27 | 38.24 |
Conversation Incentives
Comment Formatting Relevance Reward
This requires changes to the plugin that captures the data, as w…
16.82
content:
  h2:
    symbols:
      \b\w+\b:
        count: 21
        multiplier: 0.1
    score: 1
  img:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 5
  p:
    symbols:
      \b\w+\b:
        count: 19
        multiplier: 0.1
    score: 1
  pre:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 5
  ul:
    symbols:
      \b\w+\b:
        count: 192
        multiplier: 0.1
    score: 0
  li:
    symbols:
      \b\w+\b:
        count: 35
        multiplier: 0.1
    score: 1
multiplier: 3
1 16.82
Issue body as well and no leave null
1.17
content:
  p:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.2
    score: 1
multiplier: 1
0.5 0.585
Why do you want to associate and/or combine the embedding of the…
3.6
content:
  p:
    symbols:
      \b\w+\b:
        count: 30
        multiplier: 0.2
    score: 1
multiplier: 1
0.7 2.52
I don't understand your explanation. Anyways would be great if …
8.07
content:
  p:
    symbols:
      \b\w+\b:
        count: 17
        multiplier: 0.2
    score: 1
  ol:
    symbols:
      \b\w+\b:
        count: 55
        multiplier: 0.2
    score: 0
  li:
    symbols:
      \b\w+\b:
        count: 53
        multiplier: 0.2
    score: 1
multiplier: 1
0.8 6.456
The source code of the markdown of the comments. I hope that th…
7.67
content:
  p:
    symbols:
      \b\w+\b:
        count: 73
        multiplier: 0.2
    score: 1
multiplier: 1
0.4 3.068
I am not an expert working with embeddings but as I understand w…
4.21
content:
  p:
    symbols:
      \b\w+\b:
        count: 36
        multiplier: 0.2
    score: 1
multiplier: 1
0.6 2.526
So then if we are clobbering a single embedding to represent an …
4.99
content:
  p:
    symbols:
      \b\w+\b:
        count: 44
        multiplier: 0.2
    score: 1
multiplier: 1
0.7 3.493
@gentlementlegen im assuming storing node id won't be sufficient…
7.04
content:
  p:
    symbols:
      \b\w+\b:
        count: 66
        multiplier: 0.2
    score: 1
multiplier: 1
0.8 5.632
OpenAI’s text embeddings measure the relatedness of text strings…
12.3
content:
  h2:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.2
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 103
        multiplier: 0.2
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.2
    score: 1
multiplier: 1
0.9 11.07
Three requests per minute seems acceptable as long as we can hav…
10.79
content:
  p:
    symbols:
      \b\w+\b:
        count: 109
        multiplier: 0.2
    score: 1
multiplier: 1
0.8 8.632
![image](https://github.com/user-attachments/assets/eaf5c732-5c8…
1.05
content:
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.2
    score: 1
  img:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 0
multiplier: 1
0.1 0.105
It's merchant locked for up to $1
1.17
content:
  p:
    symbols:
      \b\w+\b:
        count: 8
        multiplier: 0.2
    score: 1
multiplier: 1
0.1 0.117
Looks like there isn't a lot to save from there if we remove the…
6.95
content:
  p:
    symbols:
      \b\w+\b:
        count: 65
        multiplier: 0.2
    score: 1
  br:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.2
    score: 0
multiplier: 1
0.7 4.865
My concern is that I feel we should make everything into its own…
3.5
content:
  p:
    symbols:
      \b\w+\b:
        count: 29
        multiplier: 0.2
    score: 1
multiplier: 1
0.5 1.75
I think Supabase is probably fine because there are no planned f…
6.95
content:
  p:
    symbols:
      \b\w+\b:
        count: 65
        multiplier: 0.2
    score: 1
multiplier: 1
0.6 4.17
I asked ChatGPT for a new feature: onboarding new developers, vi…
5.94
content:
  p:
    symbols:
      \b\w+\b:
        count: 54
        multiplier: 0.2
    score: 1
multiplier: 1
0.8 4.752
Seems generally okay. Let me see how your database looks
0.71
content:
  p:
    symbols:
      \b\w+\b:
        count: 10
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.71
```suggestion const authorId = payload.comment.use…
0.52
content:
  pre:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.52
Sync with the GitHub metadata of the comment or issue
0.71
content:
  p:
    symbols:
      \b\w+\b:
        count: 10
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.71
```suggestion ```
0.1
content:
  pre:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.1
```suggestion plaintext = null; ``&#…
0.64
content:
  pre:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.64
```suggestion plaintext = null; ```
0.18
content:
  pre:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.18
Should this file be deleted or something?
0.52
content:
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.52
Why does it have to be this exact length?
0.65
content:
  p:
    symbols:
      \b\w+\b:
        count: 9
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.65
```suggestion commentObject?: Record<string, un…
1.5
content:
  pre:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 17
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.5
Seems unnecessary. Just delete the trailing zeros.
0.52
content:
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.52
We can restore from git history if needed. Its not in use so it …
1.11
content:
  p:
    symbols:
      \b\w+\b:
        count: 17
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.11
If you defined it, then its wrong. If its from another database …
1.54
content:
  p:
    symbols:
      \b\w+\b:
        count: 25
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.54
Why is the default issue on the comments table? Why even add the…
1.38
content:
  p:
    symbols:
      \b\w+\b:
        count: 22
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.38
This seems very error prone. Why dont you use some virtual DOM (…
1.46
content:
  p:
    symbols:
      \b\w+\b:
        count: 20
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 2
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.46
```suggestion model: "voyage-large-2-instru…
1.02
content:
  pre:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 7
        multiplier: 0.1
    score: 1
  a:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.02
```suggestion const markdown = payload.issue.body…
1.92
content:
  pre:
    symbols:
      \b\w+\b:
        count: 9
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 9
        multiplier: 0.1
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 20
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.92
```suggestion logger.debug(`Exiting addIssue&…
1.71
content:
  pre:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.1
    score: 0
  code:
    symbols:
      \b\w+\b:
        count: 4
        multiplier: 0.1
    score: 1
  p:
    symbols:
      \b\w+\b:
        count: 22
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.71
This seems out of scope?
0.39
content:
  p:
    symbols:
      \b\w+\b:
        count: 5
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.39
Still confused about this fill stuff
0.46
content:
  p:
    symbols:
      \b\w+\b:
        count: 6
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.46
Why did you switch to main from development that doesn't seem ri…
0.88
content:
  p:
    symbols:
      \b\w+\b:
        count: 13
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.88
You'll need to cherry pick changes (easy to do with a git UI) an…
4.58
content:
  p:
    symbols:
      \b\w+\b:
        count: 90
        multiplier: 0.1
    score: 1
multiplier: 1
1 4.58
Where's the issue body? You should probably make another table a…
1.75
content:
  p:
    symbols:
      \b\w+\b:
        count: 29
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.75
I'm bad at deciding this sort of thing. Let's go with your sugge…
1.7
content:
  p:
    symbols:
      \b\w+\b:
        count: 28
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.7
Seems mostly good but I didn't see all the headers on the first …
2.69
content:
  p:
    symbols:
      \b\w+\b:
        count: 48
        multiplier: 0.1
    score: 1
multiplier: 1
1 2.69
Yes I think its unnecessary if they are separated by type on dif…
0.94
content:
  p:
    symbols:
      \b\w+\b:
        count: 14
        multiplier: 0.1
    score: 1
multiplier: 1
1 0.94
Thanks for the thorough QA. You don't need to make a new video o…
7.49
content:
  p:
    symbols:
      \b\w+\b:
        count: 158
        multiplier: 0.1
    score: 1
  code:
    symbols:
      \b\w+\b:
        count: 1
        multiplier: 0.1
    score: 1
multiplier: 1
1 7.49
I think that only tier5 subscribers can use right now via API. I…
1.17
content:
  p:
    symbols:
      \b\w+\b:
        count: 18
        multiplier: 0.1
    score: 1
multiplier: 1
1 1.17
0x4007 commented 1 week ago

I'll need to top up the wallet soon.