transitmatters / data-ingestion

Crontab for data ingestion/processing on AWS Lambda

Ingest LAMP Alerts #101

Open devinmatte opened 6 months ago

devinmatte commented 6 months ago

https://performancedata.mbta.com/lamp/tableau/alerts/LAMP_RT_ALERTS.parquet

This file should be processed in a daily job (it's ~84 MB) to split the alerts into daily files/DynamoDB tables.
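A minimal sketch of what that daily job could look like, assuming pandas with pyarrow is available in the Lambda; the output prefix and the choice of `created_datetime` as the bucketing column are assumptions, not decisions:

```python
import pandas as pd

ALERTS_URL = "https://performancedata.mbta.com/lamp/tableau/alerts/LAMP_RT_ALERTS.parquet"


def split_alerts_by_day(output_prefix: str = "data/lamp_alerts") -> None:
    # Read the full parquet file (~84 MB) into memory; pandas can read
    # directly from an https URL when pyarrow is installed.
    df = pd.read_parquet(ALERTS_URL)

    # Bucket each row by the date the alert was created. Using
    # created_datetime is an assumption -- active_period.start_datetime
    # may be the better service-date column.
    df["service_date"] = pd.to_datetime(df["created_datetime"]).dt.date

    # Write one parquet file per day (could just as well be an S3
    # prefix via s3fs).
    for service_date, day_df in df.groupby("service_date"):
        day_df.drop(columns=["service_date"]).to_parquet(
            f"{output_prefix}/{service_date}.parquet", index=False
        )
```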

Once processed, the dashboard can pull alerts from there, and fall back to the v3 alerts API for same-day data.

Until May 2024 we can still fall back on the v2 alerts we ingested, but ideally this removes the need for them.

maxtkc commented 1 month ago

Is the goal to have both daily files and Dynamo, or just Dynamo tables? For Dynamo, would it just be one table where each item is a row from the parquet file (see the sketch after the column list below)? Do we want to keep all of the columns?

For reference, these are the columns:

cause
cause_detail
effect
effect_detail
severity_level
severity
alert_lifecycle
duration_certainty
header_text.translation.text
description_text.translation.text
service_effect_text.translation.text
timeframe_text.translation.text
recurrence_text.translation.text
created_datetime
created_timestamp
last_modified_datetime
last_modified_timestamp
last_push_notification_datetime
last_push_notification_timestamp
closed_datetime
closed_timestamp
active_period.start_datetime
active_period.start_timestamp
active_period.end_datetime
active_period.end_timestamp
informed_entity.route_id
informed_entity.route_type
informed_entity.direction_id
informed_entity.stop_id
informed_entity.facility_id
informed_entity.activities
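
If we do go with one table where each item is a row, one purely illustrative key layout might partition on service date (so the dashboard can `Query` a single day) with `created_timestamp` as the sort key; note the file has no explicit alert id column, so a truly unique sort key is an open question:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table and key names -- just to make the question
# concrete, not a proposal for the final schema.
dynamodb.create_table(
    TableName="LampAlerts",
    AttributeDefinitions=[
        {"AttributeName": "date", "AttributeType": "S"},
        {"AttributeName": "created_timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "date", "KeyType": "HASH"},  # e.g. "2024-06-01"
        {"AttributeName": "created_timestamp", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
```

If the rows are exploded per informed entity/active period, `created_timestamp` alone wouldn't be unique, so a composite sort key may be needed.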

I think DynamoDB's batch write is capped at 16 MB per call, so we may need to batch-write only the past two days in the daily job and backfill the older alerts separately.
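
For what it's worth, the 16 MB cap (and the 25-item cap) is per `BatchWriteItem` call, and boto3's `batch_writer` splits and retries those calls for us, so the daily job may not need to manage it directly; a rough sketch, reusing the hypothetical `LampAlerts` table from above:

```python
import boto3

dynamodb = boto3.resource("dynamodb")


def write_alerts(rows: list[dict], table_name: str = "LampAlerts") -> None:
    table = dynamodb.Table(table_name)

    # batch_writer() buffers puts and flushes them in BatchWriteItem
    # calls (max 25 items / 16 MB each), automatically resending any
    # unprocessed items, so the request-size cap is handled for us.
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item=row)
```

One caveat: DynamoDB rejects Python floats, so any float columns would need converting to `Decimal` (or strings) before writing.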