raft-tech / TANF-app

Repo for development of a new TANF Data Reporting System

[Spike] As a Tech Lead, I want to get alerts when there is a backend or frontend error that affects an STT user #831

Closed alexsoble closed 5 months ago

alexsoble commented 3 years ago

Notes

Research:

alexsoble commented 3 years ago

Unlikely to be needed for ATO, possibly an issue for Tribal MVP

amilash commented 3 years ago

@abottoms-coder What kind of errors would you need for user submissions in V1, V2, and/or V3? We're not sure where to put this. It needs scoping.

andrew-jameson commented 3 years ago

@amilash Indeed, this needs scoping. At the present moment, I don't think we can accurately capture which errors/alerts are relevant given that we probably haven't even written the code that would error out.

Beyond the specifics, I don't believe we have an alerting system in place. We'd need to implement an e-mail relay within our buildpack setups to send these alerts from something like system@tdp.cloud.gov. Ideally, I think the e-mail system should e-mail anyone with role sysadmin. Beyond these minor thoughts, I think we'd need a brainstorming session or two to settle on the overall paradigm, let alone go error by error.
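A minimal sketch of the "e-mail anyone with role sysadmin" idea, using only the Python standard library. The `User` class, the `build_alert` helper, and the `[TDP alert]` subject prefix are hypothetical illustrations and not part of the TDP codebase; the sender address is the one floated above and is not a confirmed mailbox.

```python
from dataclasses import dataclass
from email.message import EmailMessage

# Address suggested in the comment above; an assumption, not a real mailbox.
ALERT_SENDER = "system@tdp.cloud.gov"


@dataclass
class User:
    """Hypothetical stand-in for the application's user model."""
    email: str
    role: str


def build_alert(users, subject, body, role="sysadmin"):
    """Return an EmailMessage addressed to every user holding `role`.

    Returns None when nobody holds the role, so the caller can decide
    how to handle an alert that has no recipient.
    """
    # De-duplicate and sort so the recipient list is stable.
    recipients = sorted({u.email for u in users if u.role == role})
    if not recipients:
        return None
    msg = EmailMessage()
    msg["From"] = ALERT_SENDER
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"[TDP alert] {subject}"
    msg.set_content(body)
    return msg
```

This only builds the message. Delivering it would be a separate step once a relay exists, for example by passing the result to `smtplib.SMTP.send_message`.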

For v1, I jotted down some thoughts as follows:

amilash commented 3 years ago

I'm going to slate this for V3 since that's when we will have users.

ADPennington commented 3 years ago

re: users --

amilash commented 3 years ago

I think this can be considered part of the larger epic we are planning, "TDP Automated Communications/Notifications". I'll link it to our board.

andrew-jameson commented 2 years ago

Solutions:

jtimpe commented 8 months ago

https://cloud.gov/docs/ops/repos/#repositories

In this document, Cloud.gov lists 'New Relic' as one of the supported 'BOSH releases'

'Monitoring' is also listed under 'Deployment pipelines', which links to this Prometheus deployment

andrew-jameson commented 8 months ago

Per @stevenino, we should just home in on the PLG stack and pivot if issues arise.

jtimpe commented 8 months ago

from cloud.gov re: monitoring service instances

Currently cloud.gov customers do not have direct access to the logs for their service instances (RDS, ElasticSearch, etc) however we understand this is a requested customer feature that is on our roadmap. The current route for customers to obtain access to their service instance logs, if logs are enabled for that specific service instance (in this case your elasticsearch instance), is to send a request to support@cloud.gov for the logs for your specific service instance.

I asked for clarification on this point

Also, to clarify, “currently, cloud.gov customers do not have direct access to the logs for their service instances” – would this include if we configured a monitoring service, like Prometheus, inside the deployment space?

Response:

Currently customer service instance logs (RDS, Elasticsearch, etc) are not exposed to customers or the customer deployment space, as such any monitoring service would not have access to your service instance logs.

It seems we would be limited to application logs only.

raftmsohani commented 6 months ago
robgendron commented 6 months ago

Nearing completion - will provide team with documentation and table top to showcase discovery (5/29).

robgendron commented 6 months ago

Waiting on DataDog for presentation.

raftmsohani commented 6 months ago

Sentry self-hosted requirements are listed here: https://develop.sentry.dev/self-hosted/

The page mentions:

2 CPU cores, 4 GB RAM

raftmsohani commented 6 months ago

For Prometheus, I used the django-prometheus installation manual (https://github.com/korfuri/django-prometheus?tab=readme-ov-file). We had to install it locally. One difference from Logstash and Sentry: Prometheus pulls the data from the server, instead of the server pushing it to Prometheus. This might need more attention on security, since we would have to open up a port on the monitored app for Prometheus to be able to reach the metrics endpoint.
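To make the pull model concrete, here is a minimal standard-library sketch. It is not how django-prometheus is implemented; the `Counter` class, the `app_errors_total` metric name, and the `scrape` and `demo` helpers are all hypothetical. The app exposes a plain-text endpoint and the monitoring side fetches it on its own schedule, which is why the port has to be reachable from the monitor.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Hypothetical in-process counter rendered in Prometheus text format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        with self._lock:
            self._value += amount

    def render(self):
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Illustrative metric; the name is an assumption.
ERRORS = Counter("app_errors_total", "Errors seen by the app (illustrative).")


class MetricsHandler(BaseHTTPRequestHandler):
    """App side: serves the current counter value when asked."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = ERRORS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def scrape(url):
    """Monitor side: in a pull model the monitor initiates the request."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Start the endpoint on a free local port, record an error, scrape once."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        ERRORS.inc()
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the text that a scraper would receive. In a push model such as Sentry's, the direction is reversed: the app opens an outbound connection, so no inbound port on the app is needed.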

andrew-jameson commented 6 months ago

Will wrap up next week w/ DataDog demo Tuesday. Mo has also done great work on multiple proofs of concept. With all these in, we will discuss the path forward as a team, with pros/cons, etc., during office hours or a one-off meeting.

robgendron commented 5 months ago

DataDog meeting is now Thursday.

raftmsohani commented 5 months ago

For comparison see this

robgendron commented 5 months ago

Work is complete, need to decide course of action for the future.

raftmsohani commented 5 months ago

A nice video explaining Sentry's capabilities: https://youtu.be/4djseRVSan8?si=KlElkQQN_7zwoaEj

robgendron commented 5 months ago

Deemed closed; spin-off tickets are being generated.