The idea is brilliant, but is your predictive model based only on the URLs of the sites? Some papers have pointed out that the performance bottleneck is actually sub-resource loading within a single request, such as images, JS files, etc.
Some works that might be relevant:
- "Why are web browsers slow on smartphones?", 2011
- "How far can client-only solutions go for mobile browser speed?", 2012
- "Speeding up Web Page Loads with Shandian", 2016
- "Polaris: Faster Page Loads Using Fine-grained Dependency Tracking", 2016
- "Crom: Faster Web Browsing Using Speculative Execution", 2010
Thanks for sharing all these resources!
Based on the report from Google Analytics, which mostly provides visits & transitions per URL, we create a fine-grained mapping to individual resources by performing static analysis. Our first target is JavaScript, because it's expensive. In later stages we'll expand this to CSS, images, and other assets.
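To make the idea concrete, here is a minimal sketch of how page-level predictions could be mapped onto the JavaScript chunks a route needs. The `routeToChunks` map, the property names, and the probabilities are purely illustrative, not the actual output of Guess.js's static analysis.

```ts
// Hypothetical output of static analysis: which JS chunks each route needs.
// The mapping and the names are illustrative only.
const routeToChunks: Record<string, string[]> = {
  '/about': ['about.chunk.js'],
  '/checkout': ['checkout.chunk.js', 'payments.chunk.js'],
};

interface Prediction {
  route: string;       // candidate next page
  probability: number; // predicted probability of navigating to it
}

// Translate page-level predictions into per-chunk prefetch priorities.
function chunkPriorities(predictions: Prediction[]): Map<string, number> {
  const priorities = new Map<string, number>();
  for (const { route, probability } of predictions) {
    for (const chunk of routeToChunks[route] ?? []) {
      // A chunk shared by several likely routes accumulates probability.
      priorities.set(chunk, (priorities.get(chunk) ?? 0) + probability);
    }
  }
  return priorities;
}

// Example: decide what to prefetch for the most likely next navigations.
const toPrefetch = chunkPriorities([
  { route: '/checkout', probability: 0.6 },
  { route: '/about', probability: 0.1 },
]);
```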
Current approach
At the moment, Guess.js uses a Markov chain to predict the next route the user will navigate to. We build the Markov chain from a report fetched from Google Analytics (GA), where for each page path we get the previous page path. The model has several advantages.
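A minimal sketch of that chain, assuming the GA report has been flattened into rows of `{ prevPagePath, pagePath, visits }` (the property names are illustrative, not the actual GA API shape):

```ts
interface Transition {
  prevPagePath: string; // the page the user came from
  pagePath: string;     // the page the user navigated to
  visits: number;       // how many times this transition was observed
}

// Build a first-order Markov chain: P(next page | current page).
function buildChain(rows: Transition[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (const { prevPagePath, pagePath, visits } of rows) {
    const next = counts.get(prevPagePath) ?? new Map<string, number>();
    next.set(pagePath, (next.get(pagePath) ?? 0) + visits);
    counts.set(prevPagePath, next);
  }
  // Normalize the counts into transition probabilities per source page.
  for (const next of counts.values()) {
    let total = 0;
    for (const count of next.values()) total += count;
    for (const [page, count] of next) next.set(page, count / total);
  }
  return counts;
}
```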
This approach also has its downsides. We ignore a lot of potentially useful features, such as:
- navigator.locale
- navigator.platform
Improving accuracy
We're thinking of exploring a more advanced model using neural networks. We've been looking at an LSTM using tensorflow.js; a rough sketch of what such a model could look like is below. Currently, there are a few unknowns we still need to research.
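For reference, a minimal sketch of such a model with @tensorflow/tfjs, assuming each page path has already been mapped to an integer index and sessions have been cut into fixed-length windows (the sizes below are placeholders):

```ts
import * as tf from '@tensorflow/tfjs';

const vocabSize = 100; // number of distinct page paths (placeholder)
const seqLen = 5;      // how many previous pages the model sees (placeholder)

// Embed page indices, feed them through an LSTM, and predict the next page.
const model = tf.sequential();
model.add(tf.layers.embedding({ inputDim: vocabSize, outputDim: 16, inputLength: seqLen }));
model.add(tf.layers.lstm({ units: 32 }));
model.add(tf.layers.dense({ units: vocabSize, activation: 'softmax' }));
model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy'] });

// sequences: [numSamples, seqLen] page-index windows; nextPages: the page that followed each window.
async function train(sequences: number[][], nextPages: number[]) {
  const xs = tf.tensor2d(sequences, [sequences.length, seqLen]);
  const ys = tf.oneHot(tf.tensor1d(nextPages, 'int32'), vocabSize);
  await model.fit(xs, ys, { epochs: 20, batchSize: 32 });
  xs.dispose();
  ys.dispose();
}
```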
Additional questions
The problem we're solving looks quite similar to a recommender system, and the path we've taken is collaborative filtering. Is it worth exploring content-based filtering or a mixture of the two?
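One possible reading of the content-based direction, assuming each page is described by a numeric feature vector (the features themselves are hypothetical, e.g. derived from page content or metadata):

```ts
type FeatureVector = number[];

// Cosine similarity between two page feature vectors.
function cosineSimilarity(a: FeatureVector, b: FeatureVector): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Rank candidate next pages by similarity to the page the user is currently on.
function rankBySimilarity(
  current: FeatureVector,
  candidates: Map<string, FeatureVector>
): [string, number][] {
  return [...candidates.entries()]
    .map(([page, features]): [string, number] => [page, cosineSimilarity(current, features)])
    .sort((a, b) => b[1] - a[1]);
}
```

A hybrid could then blend these similarity scores with the probabilities coming from the collaborative model.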