h1alexbel / sr-detection

Identifying GitHub "sample repositories" (SR), which mostly contain educational or demonstration materials meant to be copied rather than reused as a dependency
MIT License

list of filtered repos #168

Open h1alexbel opened 1 month ago

h1alexbel commented 1 month ago

Let's have collect.yml upload a new file, removed.txt, listing every repository that was removed during collection, one per line in the format $repo ($step). Consider this example:

foo/foo (filter)
foo/foo (maven)
foo/xyz (extract)
...

h1alexbel commented 1 week ago

Instead of the format above, let's just aggregate how many repositories were skipped at each step. We can capture this information directly from collect.log.
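
A minimal sketch of that aggregation, in case it helps the discussion. The layout of collect.log is not shown in this thread, so the sketch assumes the removal records can be reduced to lines in the $repo ($step) format from the first comment. The regex, the function name count_removed_by_step, and the extra bar/baz entry in the sample are all hypothetical.

```python
import re
from collections import Counter

# Assumed record format: "$repo ($step)", as in the example from the first comment.
# The real collect.log lines may look different; adjust the pattern accordingly.
RECORD = re.compile(r"^(?P<repo>\S+/\S+) \((?P<step>[^)]+)\)$")


def count_removed_by_step(lines):
    """Count how many repositories were removed at each step."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line.strip())
        if match:  # ignore lines that are not removal records
            counts[match.group("step")] += 1
    return counts


# Sample records; "bar/baz (filter)" is made up to show a step with two removals.
sample = [
    "foo/foo (filter)",
    "foo/foo (maven)",
    "foo/xyz (extract)",
    "bar/baz (filter)",
]

for step, total in sorted(count_removed_by_step(sample).items()):
    print(f"{step}: {total}")
```

Lines that do not match the assumed pattern are skipped silently, so the same function could be pointed at a raw log as long as the removal records follow this shape.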