ubiquity-os-marketplace / generate-vector-embeddings

feat: Issue Deduplication #11

Closed sshivaditya2019 closed 1 week ago

sshivaditya2019 commented 1 week ago

Resolves #6

github-actions[bot] commented 1 week ago

Unused types (1)

Filename: src/adapters/supabase/helpers/issues.ts
Types: IssueType

sshivaditya2019 commented 1 week ago

@0x4007 I have put together a few examples. Let me know if you have more or want to test more.

https://github.com/user-attachments/assets/ac8c81dd-f70e-4a77-a86a-3cb752bad000

sshivaditya2019 commented 1 week ago
  • Adding labels is out of scope. Don't do that. Close it as unplanned, don't add any labels.
  • Add a match percentage as well when any are listed.

Added. The cosine similarity is now displayed as a percentage after each issue in the list.

  • How did you generate the test cases and determine their percentage similarity?

I created the test cases manually: I generated embeddings for the issues and calculated the cosine similarity between them.
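For readers following along, here is a minimal sketch of how a cosine similarity between two embeddings can be computed and shown as a percentage. The function names `cosineSimilarity` and `toPercent` are hypothetical and are not taken from the plugin's source. Real embeddings would come from an embedding model; the short vectors in the usage note are made up for illustration.

```typescript
// Hypothetical helpers for illustration; not the plugin's actual code.

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Format a similarity in [0, 1] as a whole-number percentage string.
function toPercent(similarity: number): string {
  return `${Math.round(similarity * 100)}%`;
}
```

For example, `toPercent(cosineSimilarity([1, 0], [1, 0]))` gives `"100%"`, and two orthogonal vectors such as `[1, 0]` and `[0, 1]` give `"0%"`.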

0x4007 commented 1 week ago

Can you link your issue where you tested so we can see the results?

sshivaditya2019 commented 1 week ago

Can you link your issue where you tested so we can see the results?

95%:

50%:

I have deployed the plugin at Plugin Link, if you wish to try it. The test issue values are at Link.

0x4007 commented 1 week ago

Okay it seems like you aren't following the spec again.

Needs to list the similar results on every scenario.

Do 75% and 95% as a default.

sshivaditya2019 commented 1 week ago

Okay it seems like you aren't following the spec again.

Needs to list the similar results on every scenario.

Fixed that, it now returns the similar issue in both MATCH and WARNING cases.

Do 75% and 95% as a default.

Warning Threshold is 75% now.
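A rough sketch of the behaviour described here, assuming a 95% match threshold and a 75% warning threshold, with similar issues listed in both cases. All names (`classify`, `buildComment`, `DEFAULT_THRESHOLDS`) and the comment wording are hypothetical. Treating the thresholds as inclusive is also an assumption, since the thread does not say.

```typescript
// Hypothetical sketch; names and comment format are not from the plugin.
type DedupOutcome = "MATCH" | "WARNING" | "NONE";

interface Thresholds {
  match: number;   // at or above this: treated as a duplicate
  warning: number; // at or above this: flagged as possibly related
}

interface SimilarIssue {
  url: string;
  similarity: number; // cosine similarity in [0, 1]
}

const DEFAULT_THRESHOLDS: Thresholds = { match: 0.95, warning: 0.75 };

function classify(similarity: number, t: Thresholds = DEFAULT_THRESHOLDS): DedupOutcome {
  if (similarity >= t.match) return "MATCH";
  if (similarity >= t.warning) return "WARNING";
  return "NONE";
}

// Lists every issue at or above the warning threshold, highest first,
// so the similar issues appear in both the MATCH and WARNING cases.
function buildComment(similar: SimilarIssue[], t: Thresholds = DEFAULT_THRESHOLDS): string | null {
  const listed = similar
    .filter((s) => s.similarity >= t.warning)
    .sort((a, b) => b.similarity - a.similarity);
  if (listed.length === 0) return null; // nothing similar enough: no comment
  const outcome = classify(listed[0].similarity, t);
  const lines = listed.map((s) => `- ${s.url} (${Math.round(s.similarity * 100)}%)`);
  return [`${outcome}: similar issues found`, ...lines].join("\n");
}
```

With these defaults, a best similarity of 0.96 yields a MATCH comment and 0.80 yields a WARNING comment, each followed by the list of similar issues with their percentages.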

95%:

75%:

0x4007 commented 1 week ago

Fixed that, it now returns the similar issue in both MATCH

Doesn't look like it in the first one

sshivaditya2019 commented 1 week ago
  • First Comment

That's the first issue of that type, so it's expected not to have similar issues. The second should be the first time a similar issue is found with similarity above 95%.

So the first issue would not satisfy any of the match conditions. The third issue does not have any similar issues, so it wouldn't get any message.
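The ordering argument above can be illustrated with a small simulation, assuming each new issue is compared only against issues created before it. This is a sketch, not the plugin's implementation: `processInOrder` is a hypothetical name, and the similarity function is passed in so that any metric can be plugged in.

```typescript
// Hypothetical simulation; assumes each issue is checked only against earlier ones.
type Outcome = "MATCH" | "WARNING" | "NONE";

function processInOrder<T>(
  issues: T[],
  similarity: (a: T, b: T) => number,
  matchThreshold = 0.95,
  warningThreshold = 0.75
): Outcome[] {
  const seen: T[] = [];
  const outcomes: Outcome[] = [];
  for (const issue of issues) {
    // Best similarity against everything created earlier; 0 if nothing exists yet.
    let best = 0;
    for (const earlier of seen) {
      best = Math.max(best, similarity(issue, earlier));
    }
    if (best >= matchThreshold) outcomes.push("MATCH");
    else if (best >= warningThreshold) outcomes.push("WARNING");
    else outcomes.push("NONE");
    seen.push(issue);
  }
  return outcomes;
}

// Made-up similarity: the two "login" issues are near-duplicates, everything else is unrelated.
const demoSimilarity = (a: string, b: string): number =>
  a.startsWith("login") && b.startsWith("login") ? 0.97 : 0.1;

const demoOutcomes = processInOrder(
  ["login fails on submit", "login fails when submitting", "typo in docs"],
  demoSimilarity
);
```

Here `demoOutcomes` is `["NONE", "MATCH", "NONE"]`: the first issue has nothing earlier to compare against, the second matches the first, and the third is unrelated to both.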