yegor256 / cam

Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories
http://cam.yegor256.com
MIT License

feat(#228): `files` in `steps/clone.sh` for percentile filter #352

Open h1alexbel opened 2 months ago

h1alexbel commented 2 months ago

@yegor256 take a look, please

In this pull request, I've introduced the `files_in_repo` method, which counts the files in a repository. This count is needed to filter out repositories by number of files using percentiles.

ref #228
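For illustration only, here is a minimal sketch of how a percentile cutoff over per-repository file counts could be computed in shell. The `percentile` function name, the nearest-rank definition, and the sample numbers are assumptions made for this example and are not taken from the pull request.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): prints the p-th percentile of the
# newline-separated integers read from stdin, using the nearest-rank method.
# Assumes p is an integer between 0 and 100.
percentile() {
  local p=$1
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no input, nothing to report
      r = int((p * NR + 99) / 100)        # ceil(p * N / 100) for integer p
      if (r < 1) r = 1
      print v[r]
    }'
}

# Made-up file counts for five repositories:
printf '%s\n' 10 200 30 4000 50 | percentile 80
```

A repository whose file count exceeds the printed threshold could then be skipped. Which percentile to use, and whether to cut the upper tail, the lower tail, or both, is left open here.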

yegor256 commented 2 months ago

@h1alexbel are you planning to count the files before cloning the repo?

h1alexbel commented 2 months ago

@yegor256 yes, I planned to count them in the discovery phase. However, counting the files after cloning the repository would be more reasonable, and it would remove the bottleneck with the GitHub API in `discovery-repos.rb`.
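As a rough sketch of the post-clone approach, the files could be counted locally with `find`, with no GitHub API calls at all. The `files_in_clone` function name is hypothetical, and skipping `.git` is an assumption about what should count as a repository file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): prints the number of regular files
# under the given directory, ignoring anything named .git.
# File names containing newlines would be miscounted by `wc -l`.
files_in_clone() {
  local dir=$1
  find "$dir" -name .git -prune -o -type f -print | wc -l | tr -d '[:space:]'
}
```

Usage would look like `files_in_clone path/to/clone`, where the path is whatever directory the clone step writes to.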