yegor256 / cam

Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories
http://cam.yegor256.com
MIT License
23 stars 32 forks

feat(#227): samples-filter integration #306

Closed h1alexbel closed 4 months ago

h1alexbel commented 4 months ago

@yegor256, take a look, please

I've developed samples-filter, a command-line tool for filtering repositories.csv. It supports two models: an ML model based on the Random Forest algorithm and a Transformer model. We trained both on a dataset of descriptions and READMEs of public GitHub repositories. In this PR, I've integrated the tool using the Transformer model. Let's see how it performs when filtering by repository description.

closes #227

yegor256 commented 4 months ago

@h1alexbel excellent work!

yegor256 commented 4 months ago

@h1alexbel it would be nice to add a paragraph about it to tex/report.tex

h1alexbel commented 4 months ago

@yegor256 created #307 for this