src-d / ml-backlog

Issues belonging to source{d}'s Machine Learning team which cannot be related to a specific repository.

Publish DockerHub dataset #93

Closed by vmarkovtsev 5 years ago

vmarkovtsev commented 5 years ago

I finished analyzing DockerHub images after @glimow left.

I had to rerun the extraction on the 100k images that were not extracted during the first pass. That yielded another 30k.

The packages directory needs to be compressed and packaged as a dataset.
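
For reference, a minimal sketch of what the compression step could look like, using Python's standard `tarfile` module. The function name `compress_directory`, the directory name `packages`, and the archive name are hypothetical, and the choice of xz compression is an assumption. The thread does not say which tool or format was actually used.

```python
import tarfile
from pathlib import Path


def compress_directory(src_dir: str, archive_path: str) -> int:
    """Pack src_dir into an xz-compressed tar archive.

    Returns the number of regular files added. Hypothetical helper,
    not the procedure actually used for the published dataset.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(src_dir)

    count = 0
    # "w:xz" selects LZMA compression; "w:gz" is faster but produces larger files.
    with tarfile.open(archive_path, "w:xz") as tar:
        # Sort the paths so the archive layout is deterministic between runs.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent directory so the
                # archive unpacks into a single top-level directory.
                tar.add(path, arcname=str(path.relative_to(src.parent)))
                count += 1
    return count
```

Calling `compress_directory("packages", "packages.tar.xz")` would write the archive to the current directory and report how many files went into it.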

vmarkovtsev commented 5 years ago

Link to the packages archive: https://drive.google.com/file/d/1IZ0CO-MqEWNWd3Ud3pDsX6t8tPslY0Xu

vmarkovtsev commented 5 years ago

https://github.com/src-d/datasets/pull/162

vmarkovtsev commented 5 years ago

Done