microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

GitHub Issue Duplicate Detector #59923

Closed: 01x4-dev closed this issue 1 month ago

01x4-dev commented 1 month ago

πŸ” Search Terms

"duplicate"

✅ Viability Checklist

⭐ Suggestion

With so many open issues (5k+ at the time of writing), the likelihood of the community inadvertently opening duplicates is fairly high.

The only way for contributors to mitigate this risk is to follow the recommended guidelines and rigorously (and manually) search for similar issues before opening a new one.

Needless to say, this is error-prone, as it depends on the quality of the search itself.
The result: while everybody agrees on the rulebook, life is too short, so let's just submit a new issue right away... 🤷🏻‍♂️

The new GitHub Action ghidd aims to give repo maintainers a seamless experience and simplify triage by automatically spotting and tagging duplicate issues.

📑 Read more at https://github.com/marketplace/actions/github-issue-duplicate-detector

Contributions are highly welcome 🙏🏻

📃 Motivating Example

This feature is purely for repo maintainers, but it translates into a smoother community experience (e.g., less friction when contributing).

💻 Use Cases

  1. What do you want to use this for? To let repo maintainers simplify their triage tasks by automatically spotting and tagging duplicate issues.
  2. What shortcomings exist with current approaches? They are more time-consuming and error-prone, and they make for a more tedious contributor experience. Less accurate searches mean more missed duplicates, so tech debt and the issue backlog grow rapidly at scale.
  3. What workarounds are you using in the meantime? Relying solely on contributors to religiously follow your guidelines and do their research before submitting a new issue... good luck with that 😉

RyanCavanaugh commented 1 month ago

I read the code of the repo and this is not production ready. I don't think you've tried it on something with 5,000 open issues, let alone the 30,000 closed ones (duping to closed issues happens all the time). It's not a scalable solution to live-fetch every single issue and re-ping GPT on each one hoping to find a duplicate.
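The scalability concern can be made concrete: assuming one model call per existing issue examined, checking a single new issue against 5,000 open plus 30,000 closed issues would cost roughly 35,000 calls, and that cost grows linearly with the backlog. A common alternative (not necessarily how ghidd works) is to rank candidates cheaply and locally first, and send only the top few to a model for confirmation. The sketch below assumes a plain bag-of-words cosine similarity as that ranking step; `Issue`, `vectorize`, `cosine`, `buildIndex` and `topCandidates` are hypothetical names for illustration, and a real tool would more likely use precomputed embeddings.

```typescript
// Hypothetical pre-filter (not ghidd's code): score existing issues against a new
// one with bag-of-words cosine similarity, computed locally, so that only the
// top-k candidates would ever be sent to a language model for confirmation.

interface Issue {
  number: number;
  title: string;
  body: string;
}

// Term -> count. Created with a null prototype so keys like "constructor" are safe.
type TermVector = Record<string, number>;
type IndexEntry = { number: number; vec: TermVector };

function vectorize(text: string): TermVector {
  const vec: TermVector = Object.create(null);
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token.length < 2) continue; // drop empty strings and one-character noise
    vec[token] = (vec[token] || 0) + 1;
  }
  return vec;
}

// Cosine similarity of two term-count vectors; 0 if either vector is empty.
function cosine(a: TermVector, b: TermVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of Object.keys(a)) {
    normA += a[term] * a[term];
    if (term in b) dot += a[term] * b[term];
  }
  for (const term of Object.keys(b)) normB += b[term] * b[term];
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Vectorize every existing issue once; this index is what could be cached between runs.
function buildIndex(issues: Issue[]): IndexEntry[] {
  return issues.map((issue) => ({
    number: issue.number,
    vec: vectorize(issue.title + " " + issue.body),
  }));
}

// Return up to k issue numbers with non-zero similarity, highest score first.
function topCandidates(newIssue: Issue, index: IndexEntry[], k: number): number[] {
  const query = vectorize(newIssue.title + " " + newIssue.body);
  return index
    .map((entry) => ({ number: entry.number, score: cosine(query, entry.vec) }))
    .filter((c) => c.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.number);
}

// Usage with made-up issues:
const existing: Issue[] = [
  { number: 1, title: "Type narrowing fails in switch statement", body: "narrowing does not work with switch" },
  { number: 2, title: "Slow compile times with large unions", body: "performance regression in unions" },
  { number: 3, title: "Docs typo in handbook", body: "spelling mistake" },
];
const index = buildIndex(existing);
const incoming: Issue = { number: 4, title: "Switch statement narrowing broken", body: "type narrowing fails" };
const candidates = topCandidates(incoming, index, 2); // → [1]
```

With this shape, the per-issue model cost is bounded by `k` rather than by the size of the backlog, and closed issues can be indexed the same way as open ones.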

01x4-dev commented 1 month ago

Thanks for the helpful comments, @RyanCavanaugh, duly noted ✌🏻 Closed issues should be out of scope, as the GitHub API call used only returns open issues. Nonetheless, I agree the live-fetch should be improved. ghidd is indeed still at an early stage, but I'm already working on caching during the first "all" pass, to learn the existing base of open issues and avoid re-pinging GPT in "new" mode. This should help speed up traversing the list of open issues while saving tokens/cost.
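The caching idea can be sketched as a cache keyed by issue number that stores each issue's analysis together with its `updated_at` timestamp, so the expensive model call runs only for issues that are new or have been edited since the last pass. This is a hypothetical sketch, not ghidd's code: `IssueAnalysisCache` and `expensiveAnalyze` are illustrative names, and the model call is replaced by a stub. It keeps the cache in memory; in a GitHub Action it would also need to persist between workflow runs (for example in a file restored with the `actions/cache` action). On the closed-issue point, GitHub's REST "List repository issues" endpoint returns only open issues by default, but its `state` parameter also accepts `closed` and `all`.

```typescript
// Hypothetical cache for an "all" pass followed by "new"-mode runs.
// `expensiveAnalyze` stands in for a model call (e.g. summarizing or embedding an issue).

interface IssueRecord {
  number: number;
  updatedAt: string; // ISO 8601 timestamp, e.g. taken from the REST API's updated_at field
  title: string;
  body: string;
}

interface CacheEntry {
  updatedAt: string;
  analysis: string;
}

class IssueAnalysisCache {
  private entries: Record<number, CacheEntry> = {};
  private readonly expensiveAnalyze: (issue: IssueRecord) => string;
  calls = 0; // number of expensive calls actually made

  constructor(expensiveAnalyze: (issue: IssueRecord) => string) {
    this.expensiveAnalyze = expensiveAnalyze;
  }

  // Return the cached analysis; recompute only if the issue is new or was edited.
  get(issue: IssueRecord): string {
    const hit = this.entries[issue.number];
    if (hit !== undefined && hit.updatedAt === issue.updatedAt) return hit.analysis;
    this.calls++;
    const analysis = this.expensiveAnalyze(issue);
    this.entries[issue.number] = { updatedAt: issue.updatedAt, analysis: analysis };
    return analysis;
  }
}

// Usage with a stub in place of the model call:
const analyze = (issue: IssueRecord): string => issue.title.toLowerCase();
const cache = new IssueAnalysisCache(analyze);

const openIssues: IssueRecord[] = [
  { number: 10, updatedAt: "2024-09-01T00:00:00Z", title: "A", body: "" },
  { number: 11, updatedAt: "2024-09-02T00:00:00Z", title: "B", body: "" },
];

// First "all" pass: every open issue is analyzed once.
openIssues.forEach((issue) => cache.get(issue));

// Later "new"-mode run: existing issues hit the cache; only the newcomer costs a call.
const newcomer: IssueRecord = { number: 12, updatedAt: "2024-09-03T00:00:00Z", title: "C", body: "" };
openIssues.concat([newcomer]).forEach((issue) => cache.get(issue));
// cache.calls is now 3: two for the "all" pass, one for the newcomer.
```

Keying on `updatedAt` as well as the issue number means an edited issue is re-analyzed instead of serving a stale result.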