openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Generate stack recommendations using a probabilistic approach with Edward #1802

Closed by rootAvish 6 years ago

rootAvish commented 6 years ago

User Story

As an OSIO/Fabric8-analytics IDE extension user, I should be able to get companion and outlier insights for my stack via the new approach.

Acceptance Criteria

A working POC with multiple probabilistic approaches implemented (using Edward) and tested. The results, requirements, and shortcomings of each approach for the NPM ecosystem should be documented at the end of this spike.

Description

The current approach uses a Bayesian network (hierarchical Bayesian inference) built on top of pomegranate. It relies on exact inference, and we want to try approximate inference methods in its place. Besides re-implementing the same model in Edward, we want to use the deep learning libraries that Edward is built on (TensorFlow, Keras) to try a neural-network-based approach that is currently state of the art (SOTA).
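To make the exact versus approximate distinction concrete, here is a minimal sketch in plain Python. It does not use pomegranate or Edward, and it is not the model from this issue: the three-variable network, the package names, and every probability are hypothetical values chosen for illustration. Exact inference sums over every assignment of the hidden variable, while the approximate version uses likelihood weighting, one simple sampling-based method.

```python
import random

# Hypothetical toy network (all numbers are made up for illustration):
#   web_app -> express, web_app -> body_parser
P_WEB_APP = 0.3                            # P(web_app = True)
P_EXPRESS = {True: 0.8, False: 0.1}        # P(express = True | web_app)
P_BODY_PARSER = {True: 0.7, False: 0.05}   # P(body_parser = True | web_app)


def joint(web_app, express, body_parser):
    """Joint probability of one complete assignment."""
    p = P_WEB_APP if web_app else 1.0 - P_WEB_APP
    p *= P_EXPRESS[web_app] if express else 1.0 - P_EXPRESS[web_app]
    p *= P_BODY_PARSER[web_app] if body_parser else 1.0 - P_BODY_PARSER[web_app]
    return p


def exact_posterior():
    """P(body_parser | express) by enumerating the hidden variable."""
    numerator = sum(joint(w, True, True) for w in (True, False))
    evidence = sum(joint(w, True, b) for w in (True, False) for b in (True, False))
    return numerator / evidence


def sampled_posterior(n_samples=100_000, seed=0):
    """Same query, estimated by likelihood weighting."""
    rng = random.Random(seed)
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        # Sample the hidden variable from its prior.
        web_app = rng.random() < P_WEB_APP
        # Weight the sample by how well it explains the evidence express=True.
        weight = P_EXPRESS[web_app]
        # Sample the query variable given its parent.
        body_parser = rng.random() < P_BODY_PARSER[web_app]
        total_weight += weight
        if body_parser:
            weighted_hits += weight
    return weighted_hits / total_weight


if __name__ == "__main__":
    print("exact:  ", exact_posterior())
    print("sampled:", sampled_posterior())
```

Enumeration cost grows exponentially with the number of hidden variables, whereas the sampling cost grows with the number of samples, which is the usual reason to consider approximate methods for larger networks.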

Task List

Using the existing data available for NPM:
Collection of "big data" and re-training of models:

rootAvish commented 6 years ago

It was clear in the initial stages of the POC that simply switching to a different library would not scale the approach to large ecosystems such as NPM. As a result, we explored several deep learning approaches and finally settled on two parallel paths for NPM: autoencoder-based approaches and a matrix factorization approach (HPF).
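As a rough illustration of the matrix factorization idea, the sketch below factorizes a tiny stack-by-package matrix and ranks packages that are missing from a stack as companion candidates. It is plain squared-error factorization trained with SGD in pure Python, not HPF and not the implementation described in the findings document. The stacks, package lists, function names, and hyperparameters are all hypothetical.

```python
import random

# Hypothetical toy data: each stack is the set of packages it declares.
STACKS = [
    {"express", "body-parser", "cors"},
    {"express", "body-parser", "morgan"},
    {"express", "cors", "morgan"},
    {"react", "react-dom", "redux"},
    {"react", "react-dom", "prop-types"},
    {"react", "redux", "prop-types"},
]
PACKAGES = sorted(set().union(*STACKS))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train(stacks, packages, rank=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn one latent vector per stack (U) and per package (V) with SGD."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in stacks]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in packages]
    for _ in range(epochs):
        for i, stack in enumerate(stacks):
            for j, pkg in enumerate(packages):
                target = 1.0 if pkg in stack else 0.0
                err = target - dot(U[i], V[j])
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * (err * v - reg * u)
                    V[j][k] += lr * (err * u - reg * v)
    return U, V


def squared_error(stacks, packages, U, V):
    """Total reconstruction error over the full stack-by-package matrix."""
    total = 0.0
    for i, stack in enumerate(stacks):
        for j, pkg in enumerate(packages):
            target = 1.0 if pkg in stack else 0.0
            total += (target - dot(U[i], V[j])) ** 2
    return total


def companions(stack_index, stacks, packages, U, V, top_n=2):
    """Rank packages absent from the stack by their predicted score."""
    scored = [
        (dot(U[stack_index], V[j]), pkg)
        for j, pkg in enumerate(packages)
        if pkg not in stacks[stack_index]
    ]
    scored.sort(reverse=True)
    return [pkg for _, pkg in scored[:top_n]]


if __name__ == "__main__":
    U, V = train(STACKS, PACKAGES)
    print(companions(0, STACKS, PACKAGES, U, V))
```

The cost of each training pass is linear in the number of matrix cells visited, which is part of why factorization-style models are attractive for an ecosystem the size of NPM. A real implementation would typically train on observed entries with a sparse representation rather than looping over the dense matrix as this sketch does.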

The findings are documented here: https://docs.google.com/document/d/1LTAJRq60lNs-fDGsZLkLS0r8E8j5zDxK_7zkXJqLRxM/edit

Based on the above document, we'll be creating new issues that outline the steps required to complete this POC, so I am closing this issue instead of moving it to the new sprint.

rootAvish commented 6 years ago

/cc @sara-02 @krishnapaparaju @sivaavkd