The statistics features in the application may slow down as the number of articles $n$ grows large, which would make the article statistics views sluggish. Here are some possible ways to speed this up:
The domain and time series statistics are currently served from a single API endpoint. Refactor this into 3 separate endpoints in server/src/routes.py, with the corresponding DB functions in server/src/views/data_analysis/stats_analyzer.py.
Implement some level of caching so that statistics are not re-fetched when the articles table has not been updated.
The mapping from all news articles to individual word counts is currently done entirely in the front end. It is $\mathcal O(n)$, with a similar number of accesses to the frequency data structure. It might be faster to use, e.g., Python's pandas.Series.value_counts in the back end and then just send the JSON over.
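As a rough illustration of the back-end word counting idea, here is a minimal sketch. The function name `word_frequencies`, the `top_k` parameter, and the simple `\w+` tokenisation are assumptions made for this example; the tokenisation currently used in the front end may differ.

```python
import json
import re

import pandas as pd

# Assumption: a word is any run of word characters, compared case-insensitively.
TOKEN = re.compile(r"\w+")


def word_frequencies(texts, top_k=None):
    """Count word occurrences across all article texts (hypothetical helper)."""
    # Flatten every article into one list of lowercase tokens.
    tokens = [word for text in texts for word in TOKEN.findall(text.lower())]
    # value_counts returns the counts sorted in descending order.
    counts = pd.Series(tokens, dtype="object").value_counts()
    if top_k is not None:
        counts = counts.head(top_k)
    # Convert NumPy integers to plain ints so the result is JSON-serialisable.
    return {word: int(count) for word, count in counts.items()}


if __name__ == "__main__":
    articles = ["The cat sat", "the dog and the cat"]
    print(json.dumps(word_frequencies(articles, top_k=2)))
```

The front end would then only need to render the returned mapping. Passing something like `top_k` also keeps the payload small when only the most frequent words are displayed.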
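The caching suggestion above could be sketched roughly as follows. `StatsCache`, `get_version`, and `compute` are hypothetical names, not part of the existing code base. The sketch assumes a cheap "version" of the articles table is available (for example a row count combined with the latest update timestamp) and recomputes the statistics only when that version changes.

```python
class StatsCache:
    """Cache a computed statistic until the underlying table changes.

    Hypothetical helper: `get_version` should be a cheap callable that returns
    a value which changes whenever the articles table is updated, and
    `compute` is the expensive statistics query.
    """

    def __init__(self, get_version, compute):
        self._get_version = get_version
        self._compute = compute
        self._version = None
        self._value = None
        self._filled = False

    def get(self):
        version = self._get_version()
        # Recompute on first use or when the table version has changed.
        if not self._filled or version != self._version:
            self._value = self._compute()
            self._version = version
            self._filled = True
        return self._value


if __name__ == "__main__":
    table = {"version": 1}
    cache = StatsCache(lambda: table["version"], lambda: "expensive result")
    cache.get()           # computes the statistic
    cache.get()           # served from the cache
    table["version"] = 2  # simulate an update to the articles table
    cache.get()           # recomputed
```

One instance per statistic would fit the split into separate endpoints, since each endpoint could then be invalidated and recomputed independently.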