guess-js / guess

🔮 Libraries & tools for enabling Machine Learning driven user-experiences on the web
https://guess-js.github.io/
MIT License

Model for predictive analytics #54

Closed mgechev closed 5 years ago

mgechev commented 6 years ago

Current approach

At the moment, Guess.js uses a Markov chain to predict the next route the user will navigate to. We build the chain from a Google Analytics (GA) report that gives, for each page path, the previous page path. The model has several advantages, such as:

This approach has its own cons. We ignore a lot of potentially useful features such as:
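To make the current approach concrete, here is a minimal sketch of a first-order Markov chain built from transition counts. It is not the Guess.js implementation: the `Transition` row shape, the field names, and the `buildMarkovChain` and `predictNext` helpers are all assumptions made for illustration, loosely modeled on a report that pairs each page path with its previous page path.

```typescript
// Hypothetical row shape: one transition between two page paths,
// with the number of times it was observed.
interface Transition {
  previousPagePath: string;
  pagePath: string;
  pageviews: number;
}

interface Prediction {
  route: string;
  probability: number;
}

// For each route, the probability of every route that followed it.
type MarkovChain = Map<string, Map<string, number>>;

function buildMarkovChain(rows: Transition[]): MarkovChain {
  // Step 1: accumulate raw transition counts.
  const counts = new Map<string, Map<string, number>>();
  for (const { previousPagePath, pagePath, pageviews } of rows) {
    const next = counts.get(previousPagePath) ?? new Map<string, number>();
    next.set(pagePath, (next.get(pagePath) ?? 0) + pageviews);
    counts.set(previousPagePath, next);
  }

  // Step 2: normalize each row of counts into probabilities.
  const chain: MarkovChain = new Map();
  for (const [from, next] of counts) {
    let total = 0;
    for (const n of next.values()) total += n;
    if (total === 0) continue; // no usable observations for this route
    const probabilities = new Map<string, number>();
    for (const [to, n] of next) probabilities.set(to, n / total);
    chain.set(from, probabilities);
  }
  return chain;
}

// Candidate next routes for `current`, most likely first.
function predictNext(chain: MarkovChain, current: string): Prediction[] {
  const next = chain.get(current);
  if (!next) return [];
  return [...next]
    .map(([route, probability]) => ({ route, probability }))
    .sort((a, b) => b.probability - a.probability);
}
```

Calling `predictNext(chain, "/")` returns the routes that followed `/` in the report, ordered by estimated probability. The prediction depends only on the current route, which is the simplification this issue discusses.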

Improving accuracy

We're thinking of exploring a more advanced model using neural networks. We've been looking at an LSTM built with TensorFlow.js. Currently, there are a few unknowns we need to research further, such as:

Additional questions

The problem that we're solving looks quite similar to a recommender system and the path we've taken is collaborative filtering. Is it worth exploring content-based filtering or a mixture between the two?

felicitia commented 6 years ago

The idea is brilliant, but is your predictive model based only on the URLs of the sites? Some papers have pointed out that the performance bottleneck is actually sub-resource loading within a single request, such as images, JS files, etc.

Some works that might be relevant:

"Why are web browsers slow on smartphones?", 2011
"How far can client-only solutions go for mobile browser speed?", 2012
"Speeding up Web Page Loads with Shandian", 2016
"Polaris: Faster Page Loads Using Fine-grained Dependency Tracking", 2016
"Crom: Faster Web Browsing Using Speculative Execution", 2010

mgechev commented 6 years ago

Thanks for sharing all these resources!

The Google Analytics report mostly provides visits and transitions per URL. On top of it, we create a fine-grained mapping to individual resources by performing static analysis. Our first target is JavaScript, because it's expensive. At later stages we'll expand this to CSS, images, and other assets.
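As a rough illustration of that mapping step, the sketch below turns route-level predictions into a list of JavaScript bundles to prefetch. Everything in it is hypothetical: the `RoutePrediction` shape, the `bundlesToPrefetch` helper, the `minProbability` cutoff, and the route-to-bundle map, which is assumed to have been produced by static analysis and is simply passed in here.

```typescript
// Hypothetical shape of a route-level prediction.
interface RoutePrediction {
  route: string;
  probability: number;
}

// Assumed output of static analysis: route -> JavaScript bundles it needs.
type RouteBundleMap = Map<string, string[]>;

// Collect the bundles for sufficiently likely routes, most likely route first,
// without listing a shared bundle twice.
function bundlesToPrefetch(
  predictions: RoutePrediction[],
  routeBundles: RouteBundleMap,
  minProbability = 0.2, // arbitrary cutoff chosen for this sketch
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  const ranked = [...predictions].sort((a, b) => b.probability - a.probability);
  for (const { route, probability } of ranked) {
    if (probability < minProbability) continue;
    for (const bundle of routeBundles.get(route) ?? []) {
      if (!seen.has(bundle)) {
        seen.add(bundle);
        result.push(bundle);
      }
    }
  }
  return result;
}
```

Keeping the route-to-bundle map as a separate input means the same prediction step could later feed CSS or image prefetching by swapping in a different map.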