Embedding already has been used on software library or source code (Alon et al. 2019, Theeten et al. 2019). But we need to introduce two novel type of embeddings: docker image embedding and docker image libraries embedding. We used Swivel (Shazeer et al. 2016) as our embedding algorithm because of its performances running on GPU and the quality of the output vector space.
For now I only embedded the library space and not the image space, though I will probably do it before my internship ends to compare the two methods.
To represent the library space -which is of dimension 300-, I used the t-SNE method for dimmensionality reduction
The green dots are the node packages, the red dots are the native packages, the blue ones are python packages. The radius is proportional to the logarithm of used disk space.
Embedding already has been used on software library or source code (Alon et al. 2019, Theeten et al. 2019). But we need to introduce two novel type of embeddings: docker image embedding and docker image libraries embedding. We used Swivel (Shazeer et al. 2016) as our embedding algorithm because of its performances running on GPU and the quality of the output vector space.
For now I only embedded the library space and not the image space, though I will probably do it before my internship ends to compare the two methods. To represent the library space -which is of dimension 300-, I used the t-SNE method for dimmensionality reduction
The green dots are the node packages, the red dots are the native packages, the blue ones are python packages. The radius is proportional to the logarithm of used disk space.