Long comment, important things are bold.
As I worked on this, some changes to the plan popped up.
Instead of updating the old stack and making it work again, I decided to pretty much start from scratch and re-build the entire stack. The reasons:
The new solution is based on a collection of relatively independent components that handle the entire process in multiple steps. The new stack has a couple of nice properties. It allows us to:
The code for this is done and available to the public, the stack is set up and running, and all issue data has been imported. This is currently pending a team-internal announcement, which I'll probably pre-record so that Guillaume and Kate have access to it as well.
A possible improvement here is to extend the indexer task to extract the URL from the report data, so that both the full URL and the base domain are available as their own fields in the JSON. This would allow some more advanced statistics that are hard to do based on full-text searches alone.
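In case it helps, here is a minimal sketch of what that indexer extension could look like, in Python. The `**URL**:` marker is an assumption about how web-bugs reports are formatted, and the base-domain logic is deliberately naive:

```python
import re
from urllib.parse import urlsplit

# Assumed report format: the issue body contains a line like "**URL**: https://...".
URL_RE = re.compile(r"\*\*URL\*\*:\s*(\S+)", re.IGNORECASE)

def extract_url_fields(issue_body):
    """Pull the reported URL out of an issue body and derive the base domain."""
    match = URL_RE.search(issue_body or "")
    if not match:
        return {}
    url = match.group(1)
    host = urlsplit(url).hostname or ""
    # Naive base-domain guess: the last two labels of the hostname. A real
    # implementation should use the Public Suffix List to handle e.g. .co.uk.
    base_domain = ".".join(host.split(".")[-2:]) if host else ""
    return {"url": url, "base_domain": base_domain}
```

The indexer could then merge these two fields into the JSON document before handing it to ElasticSearch.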
All the other fields, like the Operating System or the Browser Version, are also available as labels, but since the indexing step is separate, we can make adjustments as we deem necessary.
The internal presentation is done. I also ported over the significant portions of Adam's dashboard. Because the data is there anyway, we can adjust and expand this stuff at any time without having to "start from scratch".
I will not have time to work on more dashboards and more data-based answers this year. I partly blame the fact that this project started a bit later than I would have hoped - which was not in my control. But I'm still very happy with the state we're in now.
The initial plan had "Build read-only JSON endpoints" as an optional feature, but since I've re-built the whole stack from the ground up, this turned out to be a central feature of everything. The new stack basically downloads the JSON responses from the GitHub API, stores them on disk, and then indexes those JSON files into ES. This means we have a full dump of all .json files for all web-bugs available on disk in real-time. I can also just serve that over HTTP (which I'm doing) and keep daily snapshots (which I'm also doing). This is good insurance for business continuity if GitHub decides to disable the repo again. :)
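To make the shape of that pipeline a bit more concrete, here is a minimal sketch of the download-store-index step in Python. The dump directory, the ES host, and the index name are placeholders, not the actual configuration:

```python
import json
from pathlib import Path

import requests
from elasticsearch import Elasticsearch

DUMP_DIR = Path("/srv/web-bugs-dump")        # placeholder path
ES = Elasticsearch("http://localhost:9200")  # placeholder ES host

def fetch_and_index(issue_number):
    """Download one issue's JSON from the GitHub API, keep the raw file on
    disk (the part that is served over HTTP and snapshotted), then index it."""
    resp = requests.get(
        f"https://api.github.com/repos/webcompat/web-bugs/issues/{issue_number}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    issue = resp.json()

    # Keep the raw JSON on disk; this is the real-time dump.
    DUMP_DIR.mkdir(parents=True, exist_ok=True)
    (DUMP_DIR / f"{issue_number}.json").write_text(json.dumps(issue, indent=2))

    # Index the same document into ElasticSearch (`document=` is the
    # elasticsearch-py 8.x spelling; older clients use `body=`).
    ES.index(index="web-bugs", id=issue_number, document=issue)
```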
I also already have the first project based on the exposed ElasticSearch API: the Softvision team currently does by hand something that we can completely automate using the data we have now. Unfortunately, this idea only came up this week, so I have not had a chance to work on it yet, but it's exciting nonetheless.
We currently have some blind spots in terms of data and analysis that prevent us from effectively answering a couple of questions we have. Luckily, the data is not too hard to gather, and pretty much all of the web-bug data has already been imported into ElasticSearch; we just have to make that more usable.
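As one illustration of what "more usable" could mean in practice, here is a minimal query sketch against the indexed data. The host, index name, and `base_domain` field are assumptions (the field would come from the indexer extension described above, and would need a keyword mapping):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder ES host

# Top ten most-reported base domains, assuming a keyword-mapped
# `base_domain` field in a "web-bugs" index.
response = es.search(
    index="web-bugs",
    size=0,
    aggs={"top_domains": {"terms": {"field": "base_domain", "size": 10}}},
)
for bucket in response["aggregations"]["top_domains"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```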
To end up in a better state, we should
- Update the software stack in use to recent versions.
- Investigate and fix the current authentication issue, or replace it with something simpler.
- Re-format the already existing data from all web-bugs into a format that allows us to query individual issue events, not just the issues themselves (see the sketch at the end of this comment).

When that's done, we can
Optionally, if there is time left
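For the event re-formatting item above, here is a minimal sketch of how the already-stored issue JSON could be split into per-event documents. The `events` field name and the dump layout are assumptions about the data, not the actual format:

```python
import json
from pathlib import Path

def issue_to_event_docs(issue_path):
    """Split one stored web-bugs issue JSON into per-event documents,
    so that individual issue events become directly queryable."""
    issue = json.loads(Path(issue_path).read_text())
    for event in issue.get("events", []):      # assumed field name
        yield {
            "issue_number": issue["number"],
            "event": event.get("event"),       # e.g. "labeled", "closed"
            "actor": (event.get("actor") or {}).get("login"),
            "created_at": event.get("created_at"),
            "label": (event.get("label") or {}).get("name"),
        }
```

Each yielded document would then go into a separate events index, which makes questions like "how many issues were closed per week" a simple aggregation instead of a full-text search.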