sivaavkd closed this issue 6 years ago
I am not sure that either the title or the objective covers the broader goals of this work. Will create an EPIC to cover these objectives.
Closing this as it is cloned by #1802. I created a new issue so there's no confusion about who to contact with queries about the issue description (GitHub currently does not indicate that I wrote it if the issue was not created by me). Updated the same in #1738.
User Story
As a Fabric8-analytics IDE user, I should be able to get recommendations for the NPM ecosystem within the time frame committed to in the SLA.
Acceptance Criteria
Confirmation of whether this approach is scalable and works for the Node ecosystem, along with a working POC.
Description
The current approach uses a Bayesian network (hierarchical Bayesian inference) built on top of pomegranate. However, the approach is not scalable as-is and would not support larger ecosystems such as NPM. The idea here is to keep the Bayesian inference in place but use approximate Bayesian estimation rather than exact inference, while also utilizing the power of TensorFlow and related deep learning libraries (Keras etc.) that are currently SOTA.
Edward is a library for probabilistic modeling built on top of TensorFlow, and we are going to leverage it for a quick POC to see whether we are able to scale the Bayesian approach.
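To make the exact-versus-approximate distinction in the description concrete, here is a minimal sketch using only the Python standard library. It is not the project's model and does not use pomegranate or Edward: the three-variable network, its probabilities, and the function names are all hypothetical, and the approximate method shown is likelihood weighting (a simple sampling scheme), swapped in for illustration in place of the variational or MCMC inference a library like Edward would provide.

```python
import random

# Hypothetical toy network (NOT the project's model): a hidden "topic" T
# influences whether two packages, P1 and P2, appear in a manifest.
P_T = 0.3                         # P(T = 1), assumed value
P_P1_GIVEN_T = {0: 0.1, 1: 0.8}   # P(P1 = 1 | T), assumed values
P_P2_GIVEN_T = {0: 0.2, 1: 0.7}   # P(P2 = 1 | T), assumed values


def exact_posterior():
    """P(P2 = 1 | P1 = 1), computed by enumerating every value of T."""
    num = den = 0.0
    for t in (0, 1):
        p_t = P_T if t == 1 else 1.0 - P_T
        evidence = p_t * P_P1_GIVEN_T[t]      # P(T = t, P1 = 1)
        den += evidence
        num += evidence * P_P2_GIVEN_T[t]     # P(T = t, P1 = 1, P2 = 1)
    return num / den


def approx_posterior(n_samples, seed=0):
    """Estimate P(P2 = 1 | P1 = 1) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_hits = total_weight = 0.0
    for _ in range(n_samples):
        # Sample the hidden variable from its prior.
        t = 1 if rng.random() < P_T else 0
        # Weight the sample by how well it explains the evidence P1 = 1.
        weight = P_P1_GIVEN_T[t]
        # Sample the query variable given the sampled parent.
        p2 = 1 if rng.random() < P_P2_GIVEN_T[t] else 0
        total_weight += weight
        weighted_hits += weight * p2
    return weighted_hits / total_weight


if __name__ == "__main__":
    print("exact :", exact_posterior())
    print("approx:", approx_posterior(20000))
```

With one hidden variable the enumeration is trivial, but its cost grows exponentially with the number of hidden variables, whereas the sampling loop grows linearly with the number of samples. That trade-off is the reason an approximate estimate is attractive for a larger ecosystem.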
Provisional Task List
Using the existing data available for NPM:
With the availability of "big" data:
Each of the tasks related to implementation of the model consists of three phases: