Closed alexsoble closed 5 months ago
Unlikely to be needed for ATO, possibly an issue for Tribal MVP
@abottoms-coder What kind of errors would you need for user submissions in V1, V2, and/or V3? We're not sure where to put this. It needs scoping.
@amilash Indeed, this needs scoping. At the present moment, I don't think we can accurately capture which errors/alerts are relevant given that we probably haven't even written the code that would error out.
Beyond the specifics, I don't believe we have an alerting system in place. We'd need to implement an e-mail relay within our buildpack setups to send these alerts from something like system@tdp.cloud.gov. Ideally, the e-mail system should notify anyone with the sysadmin role. Beyond these minor thoughts, I think we'd need a brainstorming session or two to settle on the overall paradigm, let alone the error-by-error handling.
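A rough sketch of the "notify everyone with the sysadmin role" idea, using only the Python standard library. The `User` class, the `build_alert` helper, and the example addresses are hypothetical stand-ins, not existing TDP code; the sender address is the one floated above.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Iterable, Optional

# Sender address suggested in the comment above; an assumption, not a live mailbox.
SENDER = "system@tdp.cloud.gov"


@dataclass
class User:
    """Hypothetical stand-in for the real user model."""
    email: str
    role: str


def build_alert(users: Iterable[User], subject: str, body: str) -> Optional[EmailMessage]:
    """Build an alert addressed to every sysadmin, or None if there are none."""
    recipients = [u.email for u in users if u.role == "sysadmin"]
    if not recipients:
        return None
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

Actually sending the message would go through whatever relay we end up with, e.g. `smtplib.SMTP(host, port).send_message(msg)`, where the host and port are still to be determined.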
For v1, I jotted down some thoughts as follows:
I'm going to slate this for V3 since that's when we will have users.
re: users --
I think this can be considered part of the larger epic we are planning, "TDP Automated Communications/Notifications". I'll link it to our board.
Solutions:
https://cloud.gov/docs/ops/repos/#repositories
In this document, Cloud.gov lists 'New Relic' as one of the supported 'BOSH releases'
'Monitoring' is also listed under 'Deployment pipelines', which links to this Prometheus deployment
Per @stevenino, we should just home in on the PLG stack and pivot if issues arise.
from cloud.gov re: monitoring service instances
Currently cloud.gov customers do not have direct access to the logs for their service instances (RDS, ElasticSearch, etc) however we understand this is a requested customer feature that is on our roadmap. The current route for customers to obtain access to their service instance logs, if logs are enabled for that specific service instance (in this case your elasticsearch instance), is to send a request to support@cloud.gov for the logs for your specific service instance.
I asked for clarification on this point
Also, to clarify, “currently, cloud.gov customers do not have direct access to the logs for their service instances” – would this include if we configured a monitoring service, like Prometheus, inside the deployment space?
Response:
Currently customer service instance logs (RDS, Elasticsearch, etc) are not exposed to customers or the customer deployment space, as such any monitoring service would not have access to your service instance logs.
Seems we would be limited to application logs only.
Nearing completion - will provide team with documentation and table top to showcase discovery (5/29).
Waiting on Datadog for presentation.
Sentry self-hosted requirements are listed here: https://develop.sentry.dev/self-hosted/
It is mentioned:
2 CPU cores, 4 GB RAM
For Prometheus, I used this installation manual: https://github.com/korfuri/django-prometheus?tab=readme-ov-file We had to install it locally. One difference from Logstash and Sentry: Prometheus pulls data from the server rather than the server pushing data to Prometheus. This might need more attention on the security side, since we will have to open up a port on the monitored app so that Prometheus can reach the metrics endpoint.
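To make the pull model concrete, here is a minimal standard-library sketch: the app exposes an HTTP endpoint and a scraper fetches it, which is why a port has to be reachable from the Prometheus side. This is not the django-prometheus setup itself; the metric name, the `scrape` and `demo` helpers, and the `/metrics` path (the conventional default) are assumptions for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter; a real app would use a Prometheus client library.
REQUEST_COUNT = {"value": 0}


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled (illustrative).\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {REQUEST_COUNT['value']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the metrics text on /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url: str) -> str:
    """Play the role of Prometheus: pull the metrics text over HTTP."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo() -> str:
    """Start the endpoint on a free local port, scrape it once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        REQUEST_COUNT["value"] = 3
        port = server.server_address[1]
        return scrape(f"http://127.0.0.1:{port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

The security question above comes down to who can reach that endpoint: the scraper initiates the connection, so the monitored app has to accept inbound requests from wherever Prometheus runs.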
Will wrap up next week w/ Datadog demo Tuesday. Mo has also done great work on multiple proofs of concept. With all these in, we will discuss the path forward as a team, with pros/cons, etc., during office hours or a one-off meeting.
Datadog meeting is now Thursday.
For comparison see this
Work is complete, need to decide course of action for the future.
A nice video explaining Sentry capabilities: https://youtu.be/4djseRVSan8?si=KlElkQQN_7zwoaEj
Deemed closed; spin-off tickets are being generated.
Notes
Research: