src-d / ml-backlog

Issues belonging to source{d}'s Machine Learning team which cannot be related to a specific repository.

Predict Docker Image Size given lib embedding #85

Closed: glimow closed this issue 5 years ago

glimow commented 5 years ago

I used a simple Keras MLP with two hidden layers of 100 neurons each, trained with a mean squared error loss.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
activation_1 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 100)               10100     
_________________________________________________________________
activation_2 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 101       
=================================================================
Total params: 40,301
Trainable params: 40,301
Non-trainable params: 0
_________________________________________________________________
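
As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is only a sketch: the input width of 300 is not stated in the thread, it is inferred from the 30,100 parameters of `dense_1` (300 weights per neuron plus one bias, times 100 neurons). The helper name `dense_params` is made up for illustration.

```python
def dense_params(n_in, n_out):
    """Parameter count of a fully connected layer: weights plus one bias per output."""
    return n_in * n_out + n_out


# Assumption: input width 300, inferred from dense_1's parameter count.
# The remaining widths (100, 100, 1) come straight from the summary.
layer_sizes = [300, 100, 100, 1]

# One count per Dense layer; Activation layers carry no parameters.
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)
```

If the inferred input width is right, the three counts match the `Param #` column and the total matches `Total params`.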

The neural network converges quickly and gives an estimator that is ~3 times better than a baseline that always predicts the mean size of Docker images. These results use a 5% subset of the final dataset.
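
A comparison against the mean baseline might look roughly like the sketch below. The thread does not say which metric "~3 times better" refers to; MSE is assumed here because it is the training loss. All numbers are invented placeholders, not values from the dataset, and `model_pred` stands in for the MLP's output.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline(y_train, n):
    """Predict the training-set mean for each of n test samples."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n


# Hypothetical image sizes in MB (placeholders, not real measurements).
y_train = [120.0, 450.0, 80.0, 900.0, 300.0]
y_test = [100.0, 500.0, 700.0]
model_pred = [150.0, 480.0, 640.0]  # stand-in for the model's predictions

baseline_mse = mse(y_test, mean_baseline(y_train, len(y_test)))
model_mse = mse(y_test, model_pred)

# A ratio above 1 means the model beats the mean baseline under this metric.
ratio = baseline_mse / model_mse
print(baseline_mse, model_mse, ratio)
```

The baseline mean is computed on the training split only, so the comparison does not leak test labels into the baseline.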

vmarkovtsev commented 5 years ago

Tristan's internship has ended. The artifact is https://github.com/src-d/docker-image-analysis, which is going to become public.