wholesomedev closed this issue 7 years ago
Notes for later:
It seems to me that we'll have to use two separate logging services: Google Analytics (for long term logging), ELK logging (for monitoring and alerts).
I would have preferred not to use two separate services, but logging service providers charge much more for longer data retention, and the longest retention they offer anyway is 365 days.
I think for monitoring and alerts, we shouldn't need more than 30 days of data retention. For long-term logging of things such as page views, we can use Google Analytics.
I've looked at several providers. The ELK providers are logz.io, logit.io, and Logsene (Sematext.com). There are also Loggly and Sumo Logic, but those are proprietary logging solutions, so I don't think we should use them. logz.io, logit.io, and Logsene are the most managed, followed by Elastic.io and AWS. We need the most managed solution so that we don't tie up our own time with infrastructure work.
logz.io, logit.io, and Logsene are comparable in features and cost, though Logsene is a bit more expensive. However, Logsene seems to be the most established and mature solution, and their sales page was the most convincing to me.
I think we should go with Logsene for monitoring and alerts, and Google Analytics for long term logging.
I spent some time this week learning about the ELK stack. It's still unclear to me where it fits in our stack. As far as I understand, we'll need server-side logic to send analytics data to Logsene. What types of data should we track in Logsene? Should we use something like Sentry or LogRocket instead of, or in conjunction with, the ELK stack to track client-side events?
@forabi The idea is that we want to collect meaningful telemetry so that we get insight into how our users are experiencing our application. We need to see what actions our users are taking, how they are using the app, how well the app is performing in terms of speed and quality, etc. One way to do this is to log actions users take, log performance related data, and log errors.
Basically here's what we want: from both the server and the client, we will be dispatching events that will go to a data-store on which we will be able to run queries and view reports on a dashboard.
Consider the following use cases:
Say when a user with IP 1.160.10.240 hits the proxy server, we send this payload to our log server:
{
  "ip": "1.160.10.240",
  "event": "PAGE_REQUESTED",
  "data": {
    "timestamp": 12283237,
    "url": "/Tom_Hanks"
  }
}
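As a rough sketch (in Node.js, since that's the server stack discussed later in this thread), the proxy could build that payload with a small helper. The function name and the { ip, url } request shape are hypothetical, just for illustration, not an agreed-on API:

```javascript
// Hypothetical helper: build the PAGE_REQUESTED payload from an incoming
// request. The { ip, url } request shape is an assumption for illustration.
function pageRequestedEvent(req) {
  return {
    ip: req.ip,
    event: 'PAGE_REQUESTED',
    data: {
      timestamp: Date.now(), // epoch milliseconds
      url: req.url,
    },
  };
}
```

The resulting object would then be shipped to the log server, e.g. as the body of an HTTP POST to whichever logging provider we settle on.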
Then from the client once the page has rendered, we can log the event:
{
  "ip": "1.160.10.240",
  "event": "PAGE_RENDERED",
  "data": {
    "timestamp": 12983237,
    "url": "/Tom_Hanks"
  }
}
With this information we can run a report that takes the timestamp difference between the two events, telling us how long it really takes our users from the moment they hit the proxy to the moment the page has fully rendered.
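In practice this pairing would be a query in the log service's dashboard, but the logic can be sketched as a simple pass over the event stream (the function name is hypothetical; events are assumed to have the shape shown above):

```javascript
// Sketch: pair PAGE_REQUESTED / PAGE_RENDERED events by (ip, url) and
// compute the end-to-end render latency for each pair.
function renderLatencies(events) {
  const requests = new Map(); // key "ip|url" -> request timestamp
  const latencies = [];
  for (const e of events) {
    const key = `${e.ip}|${e.data.url}`;
    if (e.event === 'PAGE_REQUESTED') {
      requests.set(key, e.data.timestamp);
    } else if (e.event === 'PAGE_RENDERED' && requests.has(key)) {
      latencies.push({ url: e.data.url, ms: e.data.timestamp - requests.get(key) });
      requests.delete(key);
    }
  }
  return latencies;
}
```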
We can also use the logging service to log crashes.
We can use it to log events showing how often our users perform searches, view pages, contribute content, etc. We can then decide on our key metrics, and when we push changes to the website, we can monitor whether those key metrics are improving or worsening.
Say one of our key metrics is the "SUMMARY_ADDED" event, which we log when a user contributes a new summary. If after we push a certain code change to the website we see that the "SUMMARY_ADDED" event has dipped, we can look into what happened. Maybe the submit button handler broke, or the API broke, or something else went wrong.
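A hedged sketch of that kind of key-metric check: compare average daily counts of an event before and after a deploy, and flag a dip. The function name and the 0.5 threshold are arbitrary illustrations, not tuned values:

```javascript
// Hypothetical check: did a key metric (e.g. daily SUMMARY_ADDED counts)
// dip after a deploy? Flags when the post-deploy average falls below
// `threshold` times the pre-deploy average.
function metricDipped(countsBefore, countsAfter, threshold = 0.5) {
  const avg = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return avg(countsAfter) < avg(countsBefore) * threshold;
}
```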
I've used Sentry before and it's pretty useful. I haven't used LogRocket. There's also Bugsnag. Each one of these seems to be focused on one task, like reporting crashes, monitoring users, or something. But my guess is that none of them alone is going to give us the flexibility of plain ELK where we can log whatever we want and query our data in any way we want. I don't think we should use multiple logging solutions unless we have a really good reason for that. That's why I'm thinking maybe we should start with just an ELK solution and Google Analytics for now.
I tried Logsene with a demo account today. The web interface, and the service in general, does not seem "polished"; I encountered a few bugs when signing in. If we decide to move away from Logsene in the future, we need to know whether the logs can be exported and imported into another service.
It looks like the integration will be pretty simple though. We can use winston-logsene to log on the server. On the client side we might need to send our log events to our server and let the server forward to Logsene due to CORS policy.
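A minimal sketch of that client-to-server relay, with the actual transport injected so nothing here depends on Logsene's real API (in production, `send` would wrap winston-logsene or an HTTP POST to Logsene; all names here are illustrative assumptions):

```javascript
// Sketch: the browser POSTs its log events to our own server, and the
// server forwards them on, sidestepping the CORS issue. `send` is injected
// so this logic can be tested without any network access.
function makeClientLogRelay(send) {
  return function relay(clientIp, body) {
    // Stamp IP and timestamp server-side so clients cannot spoof them.
    const event = {
      ip: clientIp,
      event: body.event,
      data: Object.assign({}, body.data, { timestamp: Date.now() }),
    };
    send(event);
    return event;
  };
}
```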
Ah, thanks for checking them out for us. It's unfortunate that you think their service is not polished. Do you mind checking out the others, even Sumo Logic? After reading what you said (and because somebody from Coursera told me that they use Sumo Logic there), I'm beginning to lean more towards Sumo Logic.
Splunk is the gold standard when it comes to this type of thing, but Splunk is too expensive for us.
"Do you mind checking out the others, even Sumo Logic?"
I checked Sumo Logic. As already mentioned above, it is not based on ELK, but it looks like an excellent service and integration with Node.js and Docker seems possible. Are we okay with proprietary solutions?
I also checked the other ELK providers.
As for the other providers mentioned here, this table summarizes the pros and cons of each one:
Based on cost and extensibility, Logz.io seems like a good choice to me.
Whichever one we go with, switching to another provider shouldn't be hard since they all use ELK standard APIs (aside from Sumo Logic).
@msafih Let's go with Sumo Logic. We need to create an account with a company email (@hollowverse.com). My personal @gmail account was not accepted for a trial account.
@forabi I just created an account and added you as an admin. You'll probably receive an email from them.
ELK is a tech stack for sending and reading events and for building dashboards and reports. We need that on Hollowverse.