I've developed samples-filter, a command-line tool for filtering repositories.csv. It supports two models: an ML model based on the Random Forest algorithm and a Transformer model. Both were trained on a dataset of descriptions and READMEs of public GitHub repositories. In this PR, I've integrated the tool using the Transformer model. Let's see how it performs when filtering by repository description.
@yegor256, take a look, please
closes #227