nelsonic / github-scraper

🕷 🕸 crawl GitHub web pages for insights we can't GET from the API ... 💡
425 stars, 96 forks

Parse Thousands ("1.1k") in Social Metrics #108

Closed. nelsonic closed this issue 4 years ago

nelsonic commented 4 years ago

Currently the stats for a repo are not parsed correctly once they reach a thousand, because GitHub displays them in abbreviated form (e.g. "1.1k"): https://github.com/nelsonic/github-scraper/blob/1d943809bb0565f1c84d416949d1c59ea7c58ebc/lib/repo.js#L23-L28

This is a really good problem to have. e.g: https://github.com/dwyl/start-here

SimonLab commented 4 years ago

I had a quick look at how to update the regex, but I think it might be complicated to do it in a single expression. I'm thinking of chaining a few replacements instead:

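var stringNumber = "4.3k";
var parsed = parseInt(
  stringNumber
    // one decimal digit: "4.3k" -> "4300" (with a single capture group, "$100" is read as $1 followed by "00")
    .replace(/\.(\d)k$/, "$100")
    // no decimal part: "4k" -> "4000"
    .replace(/k$/, "000")
    // strip anything that is not a digit, e.g. the comma in "1,234"
    .replace(/[^0-9]/g, ''),
  10);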

I think this should work, but I'm not sure it is the best or nicest solution. It also only handles "k"; are there any repositories with more than a million stars ("M")?

nelsonic commented 4 years ago

@SimonLab thanks for these lines! 👨‍💻

As far as I'm aware, the most popular repo on GitHub is currently 310k: https://github.com/freeCodeCamp/freeCodeCamp

Not expecting them to reach 1M any time soon. ⏳ But ... I've included a couple of test cases and lines in the parse_int/1 function to handle the case. 👍
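
For illustration only, here is a minimal sketch of how both the "k" and "M" suffixes could be handled in one place. It is not the code from the PR: the name parseCount is hypothetical, and it assumes the counts arrive as strings such as "1.1k", "310k", "1.2M" or "1,234".

```javascript
// Hypothetical helper (NOT the repo's actual parse_int function):
// converts an abbreviated count such as "1.1k" or "2M" into an integer.
function parseCount(text) {
  // drop surrounding whitespace and thousands separators ("1,234" -> "1234")
  var cleaned = String(text).trim().replace(/,/g, '');
  // digits, an optional decimal part, then an optional k/M suffix
  var match = cleaned.match(/^(\d+(?:\.\d+)?)([kKmM])?$/);
  if (!match) {
    return NaN; // unrecognised format
  }
  var multipliers = { k: 1000, m: 1000000 };
  var factor = match[2] ? multipliers[match[2].toLowerCase()] : 1;
  // Math.round removes floating point noise (1.1 * 1000 is not exactly 1100)
  return Math.round(parseFloat(match[1]) * factor);
}
```

With this sketch, parseCount("4.3k") gives 4300 and parseCount("1.2M") gives 1200000, while a string that does not match the expected shape gives NaN rather than a silently wrong number.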

nelsonic commented 4 years ago

This is included in PR #109 🚀