openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Implement Bayesian neural network / hierarchical inference using Edward to scale the approach for NPM #1737

Closed: sivaavkd closed this issue 6 years ago

sivaavkd commented 6 years ago

User Story

As a Fabric8-analytics IDE user, I should be able to get recommendations for the NPM ecosystem within the time frame committed in the SLA.

Acceptance Criteria

Confirmation of whether this approach scales and works for the Node (NPM) ecosystem, backed by a working POC.

Description

The current approach uses a Bayesian network (hierarchical Bayesian inference) built on top of pomegranate. However, this approach does not scale as-is and cannot support larger ecosystems such as NPM. The idea is to keep Bayesian inference in place but use an approximate Bayesian estimate instead of exact inference, while also taking advantage of TensorFlow and related deep learning libraries (Keras, etc.) that are currently state of the art (SOTA).
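To illustrate the trade-off between exact and approximate inference, here is a minimal sketch on a toy Bayesian network whose binary nodes mean "package X appears in the manifest". It is not the pomegranate model or the Edward POC from this issue: the network, its probabilities, and the names `NETWORK`, `exact_query`, and `likelihood_weighting` are all hypothetical. Exact inference is done by brute-force enumeration (the simplest exact method; pomegranate uses other algorithms), and likelihood weighting is swapped in as one simple approximate method. Edward itself focuses on variational inference and Monte Carlo methods.

```python
import itertools
import random

# Hypothetical toy network: node -> (parents, CPT).
# Each CPT maps a tuple of parent values to P(node = 1 | parents).
NETWORK = {
    "express": ((), {(): 0.6}),
    "body-parser": (("express",), {(0,): 0.05, (1,): 0.7}),
    "mongoose": (("express",), {(0,): 0.1, (1,): 0.4}),
    "cors": (("express", "body-parser"),
             {(0, 0): 0.02, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.6}),
}
ORDER = ["express", "body-parser", "mongoose", "cors"]  # topological order


def node_prob(node, value, assignment):
    """P(node = value | parents), reading parent values from assignment."""
    parents, cpt = NETWORK[node]
    p_one = cpt[tuple(assignment[p] for p in parents)]
    return p_one if value == 1 else 1.0 - p_one


def joint_prob(assignment):
    """Joint probability as the product of local conditionals."""
    prob = 1.0
    for node in ORDER:
        prob *= node_prob(node, assignment[node], assignment)
    return prob


def exact_query(target, evidence):
    """Exact P(target = 1 | evidence) by enumerating all 2^n assignments."""
    numerator = denominator = 0.0
    for values in itertools.product((0, 1), repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint_prob(assignment)
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator


def likelihood_weighting(target, evidence, n_samples=50_000, seed=0):
    """Approximate P(target = 1 | evidence); cost is O(nodes * samples)."""
    rng = random.Random(seed)
    numerator = denominator = 0.0
    for _ in range(n_samples):
        assignment, weight = {}, 1.0
        for node in ORDER:
            if node in evidence:
                # Clamp evidence nodes and weight the sample by their likelihood.
                assignment[node] = evidence[node]
                weight *= node_prob(node, evidence[node], assignment)
            else:
                # Sample free nodes from their conditional given the parents.
                p_one = node_prob(node, 1, assignment)
                assignment[node] = 1 if rng.random() < p_one else 0
        denominator += weight
        if assignment[target] == 1:
            numerator += weight
    return numerator / denominator


if __name__ == "__main__":
    evidence = {"cors": 1}
    print("exact: ", exact_query("express", evidence))
    print("approx:", likelihood_weighting("express", evidence))
```

Enumeration visits every joint assignment, so its cost doubles with each added package node. Exact inference in general Bayesian networks is NP-hard, so even smarter exact algorithms break down on large, densely connected graphs. Each approximate sample costs only one pass over the nodes, which trades a small, controllable error for cost that grows linearly. That trade-off is the motivation for moving to approximate inference for an ecosystem as large as NPM.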

Edward is a library for probabilistic modeling built on top of TensorFlow, and we are going to use it for a quick POC to see whether the Bayesian approach can scale.

Provisional Task List

Using the existing data available for NPM:
With the availability of "big" data:
Each task related to implementing the model consists of three phases:

krishnapaparaju commented 6 years ago

I am not sure that either the title or the objective covers the broader goals of this work. I will create an EPIC to cover these objectives.

rootAvish commented 6 years ago

Closing this, as it has been cloned to #1802. I created a new issue so there is no confusion about whom to contact with queries about the issue description (GitHub currently does not show that I wrote it if the issue was not created by me). Updated the same in #1738.