src-d / datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

[feature request] PGA language statistics #110

Closed: EgorBu closed this issue 5 years ago

EgorBu commented 5 years ago

Hello,

It would be a useful feature to provide language statistics per repository, so that people have more advanced options for filtering repositories.

For example, downloading a repository with 1 line of JS code and thousands of lines in other languages may not be very useful for researchers who focus on JS.
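A minimal sketch of the kind of filtering being requested, assuming the index exposes, for each repository, a comma-separated list of languages and the matching per-language line counts. The column names `URL`, `LANGS` and `LANGS_LINES_COUNT`, the helper names and the sample URLs are all hypothetical, chosen for illustration; they are not taken from this thread and may not match the real index schema. The idea is to keep only repositories where the target language accounts for at least a given share of all lines.

```python
import csv
import io

# Hypothetical index layout (an assumption, not the documented schema):
# "LANGS" lists languages, "LANGS_LINES_COUNT" lists line counts in the same order.
SAMPLE = '''URL,LANGS,LANGS_LINES_COUNT
https://github.com/example/mostly-js,"JavaScript,CSS","9000,1000"
https://github.com/example/mostly-go,"Go,JavaScript","5000,1"
https://github.com/example/empty,,
'''


def language_share(row, language):
    """Fraction of a repository's lines written in `language` (0.0 if no lines)."""
    langs = row["LANGS"].split(",") if row["LANGS"] else []
    counts = [int(c) for c in row["LANGS_LINES_COUNT"].split(",")] if row["LANGS_LINES_COUNT"] else []
    total = sum(counts)
    if total == 0:
        return 0.0
    per_language = dict(zip(langs, counts))
    return per_language.get(language, 0) / total


def filter_repositories(rows, language, min_share):
    """Return URLs of repositories where `language` makes up at least `min_share` of lines."""
    return [row["URL"] for row in rows if language_share(row, language) >= min_share]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

if __name__ == "__main__":
    # Keeps only the repository that is mostly JavaScript.
    print(filter_repositories(rows, "JavaScript", 0.5))
```

A relative share is used here rather than an absolute line count because it directly captures the "1 line of JS next to thousands of lines of other code" case; an absolute minimum could be added as a second condition if needed.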

vmarkovtsev commented 5 years ago

@EgorBu We already have the line info in the index file. What do you mean?

EgorBu commented 5 years ago

I think it was not included at that time, but I don't remember all the details right now.