yegor256 / cam

Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories
http://cam.yegor256.com
MIT License

feat(#228): `files` in `steps/clone.sh` for percentile filter #352

Open h1alexbel opened 4 months ago

h1alexbel commented 4 months ago

@yegor256 take a look, please

In this pull request, I've introduced the `files_in_repo` method, which counts the files in a repository. This count is needed to filter out repositories by number of files using percentiles.

ref #228

yegor256 commented 4 months ago

@h1alexbel are you planning to count the files before cloning the repo?

h1alexbel commented 4 months ago

@yegor256 yes, I planned to count them in the discovery phase. However, counting the files after cloning the repository is more reasonable and removes the GitHub API bottleneck in discovery-repos.rb.
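
For illustration, counting files after cloning could look roughly like the sketch below. This is not the code from this pull request: the function name `count_files` and the choice to skip `.git` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's files_in_repo): prints the number of
# regular files under a cloned repository, skipping anything named .git.
# Assumption: file names contain no newlines, since lines are counted.
count_files() {
  local dir="${1:?usage: count_files <directory>}"
  local total
  # Prune .git entries so Git's own objects do not inflate the count.
  total="$(find "${dir}" -name .git -prune -o -type f -print | wc -l | tr -d '[:space:]')"
  echo "${total}"
}
```

Calling `count_files path/to/clone` right after the clone step prints a single integer, which a later percentile filter could consume without any extra GitHub API requests.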