pingcap / ossinsight

Analysis, Comparison, Trends, Rankings of Open Source Software, you can also get insight from more than 7 billion with natural language (powered by OpenAI). Follow us on Twitter: https://twitter.com/ossinsight
https://ossinsight.io/
Apache License 2.0
1.75k stars 331 forks

🚀 We are going to redesign the trending algorithm #778

Closed Icemap closed 1 year ago

Icemap commented 2 years ago

The developers of OSS Insight are loyal users of GitHub Trending. When we heard that GitHub was deprecating its Trending page, we decided to optimize OSS Insight trending to become a GitHub Trending alternative.

As we all know, most of the repos appearing on GitHub Trending are worthy of attention, but a few repos get onto the page by gaming the trending algorithm. That makes the algorithm itself very important. We are going to design a new algorithm that finds the most popular repos while also preventing projects from getting onto the trending page through cheating.

Currently, we can provide these metrics, including GitHub interface interaction metrics like:

and code collaboration metrics like:

How should we set the weights of these metrics? Does anybody have any ideas? You are welcome to join the discussion!

Icemap commented 2 years ago

I have a preliminary idea: build a "time sink" (time-decay) algorithm using the number of stars and the number of forks. That is, set upper and lower score limits; the further an operation (a star or a fork) is from the current time, the lower its score, until it reaches the lower limit. With this algorithm, add up the scores of all repos within a given time period and rank them. That gives the trending repos for the period.
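The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the query OSS Insight actually runs: the linear decay shape, the `MAX_SCORE` and `MIN_SCORE` values, the equal `WEIGHTS` for stars and forks, and the `Event`, `event_score`, and `trending` names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed values: the thread gives no concrete limits or weights.
MAX_SCORE = 5.0  # upper limit: score of an event that happens right now
MIN_SCORE = 1.0  # lower limit: score of the oldest event still in the window
WEIGHTS = {"star": 1.0, "fork": 1.0}  # hypothetical per-event-type weights


@dataclass
class Event:
    repo: str
    kind: str  # "star" or "fork"
    at: datetime


def event_score(event: Event, now: datetime, window: timedelta) -> float:
    """Score one event; older events score lower, down to MIN_SCORE."""
    age = now - event.at
    if age < timedelta(0) or age > window:
        return 0.0  # outside the time period: not counted at all
    fraction = age / window  # 0.0 (just now) .. 1.0 (edge of the window)
    decayed = MAX_SCORE - (MAX_SCORE - MIN_SCORE) * fraction
    return WEIGHTS.get(event.kind, 0.0) * max(decayed, MIN_SCORE)


def trending(events, now: datetime, window: timedelta):
    """Sum event scores per repo and rank repos from highest to lowest."""
    totals = defaultdict(float)
    for event in events:
        score = event_score(event, now, window)
        if score > 0:
            totals[event.repo] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling `trending(events, now, timedelta(days=7))` returns `(repo, score)` pairs for the past week. A linear decay is just the simplest shape that respects both limits; an exponential decay clamped at `MIN_SCORE` would fit the same description.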

guoqiangqi commented 1 year ago

Hi @Icemap, I'm very interested in the OSS Insight trending algorithm you mentioned, the one used by the ossinsight.io website. Could you share some details here, such as the formula or code? I'd really appreciate it!

Icemap commented 1 year ago

@guoqiangqi Sure, I'm very glad to help. We use just one SQL query to achieve it. It's quite simple in TiDB: because TiDB is an HTAP database, we can run the OLAP workload directly in SQL. And this is the SQL file. If you have any questions, please feel free to comment here.

guoqiangqi commented 1 year ago

@Icemap Got it, thank you so much.

zpointS commented 8 months ago

Hi @Icemap, I'm also interested in the design of the trending algorithm, and I've found that the aforementioned SQL file has been moved (or deprecated). Is there any other way you could share the formula or code? Again, I'd really appreciate it!

Icemap commented 8 months ago

Hi @zpointS. Thanks for the like. We moved this SQL file to here.