Sticking with our data-heavy backend use cases, we should provide a convenient way to route data out into a scalable data lake backed by S3, a Glue Catalog, and maybe Redshift Serverless.
S3 + Glue is a cheap and scalable data lake, but it comes with the cost of maintaining partitions in S3, compaction, etc. A simple Firehose with Parquet transformation would be very useful and cost-effective. It's straightforward to use this data as a starting point/source of truth and then load it into any other data warehouse.
Redshift Serverless is especially compelling since the cost of compute is decoupled from storage and the database does a lot for us in terms of maintaining data. We can rely on it to manage vacuum/compaction, and it has managed storage backed by S3, so there don't seem to be many downsides. We should do a cost analysis to confirm.
What would the experience be? Could we use something like Drizzle ORM to model the relational schemas and then automatically create the tables in both Athena and Redshift? Or should we provide our own data model?
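To make the "own data model" option concrete, here is a rough sketch of one table definition producing DDL for both engines. Everything in it is an assumption for discussion: `TableModel`, `ColumnType`, `athenaDdl` and `redshiftDdl` are hypothetical names, not Drizzle's API, and the type mappings cover only a handful of primitive types.

```typescript
// Hypothetical column types for this sketch; this is not Drizzle's API.
type ColumnType = "string" | "bigint" | "double" | "boolean" | "timestamp";

// Hypothetical table model: a name plus a map of column name to type.
interface TableModel {
  name: string;
  columns: Record<string, ColumnType>;
}

// Assumed mapping from the sketch's types to Athena (Hive-style) column types.
const ATHENA_TYPES: Record<ColumnType, string> = {
  string: "string",
  bigint: "bigint",
  double: "double",
  boolean: "boolean",
  timestamp: "timestamp",
};

// Assumed mapping from the sketch's types to Redshift column types.
const REDSHIFT_TYPES: Record<ColumnType, string> = {
  string: "VARCHAR(65535)",
  bigint: "BIGINT",
  double: "DOUBLE PRECISION",
  boolean: "BOOLEAN",
  timestamp: "TIMESTAMP",
};

// Render "col type, col type, ..." using the given type mapping.
function columnList(model: TableModel, types: Record<ColumnType, string>): string {
  return Object.entries(model.columns)
    .map(([column, type]) => `${column} ${types[type]}`)
    .join(", ");
}

// External table over Parquet files in S3, registered in the Glue Catalog via Athena.
function athenaDdl(model: TableModel, s3Location: string): string {
  return (
    `CREATE EXTERNAL TABLE IF NOT EXISTS ${model.name} ` +
    `(${columnList(model, ATHENA_TYPES)}) ` +
    `STORED AS PARQUET LOCATION '${s3Location}'`
  );
}

// Plain managed table in Redshift.
function redshiftDdl(model: TableModel): string {
  return `CREATE TABLE IF NOT EXISTS ${model.name} (${columnList(model, REDSHIFT_TYPES)})`;
}
```

For example, a model named `events` with columns `id: "string"` and `created_at: "timestamp"` would yield one `CREATE EXTERNAL TABLE` statement for Athena and one `CREATE TABLE` statement for Redshift from the same definition. Partition columns, sort keys, and identifier quoting are left out of this sketch.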
This code doesn't insert directly into the table - it will write to Kinesis Firehose.
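A minimal sketch of what "insert means write to Firehose" could look like, under stated assumptions. `RecordSink`, `toBatches` and `insertRows` are hypothetical names; in a real implementation the sink would presumably wrap Firehose's `PutRecordBatch` operation, which accepts at most 500 records per call. Serializing each row as newline-terminated JSON is also an assumption, chosen so records stay separable downstream.

```typescript
// Hypothetical sink interface; in practice this would wrap Firehose's PutRecordBatch.
interface RecordSink {
  putBatch(deliveryStream: string, records: string[]): Promise<void>;
}

// PutRecordBatch accepts at most 500 records per call.
const MAX_BATCH_SIZE = 500;

// Serialize rows as newline-terminated JSON and split them into batches.
function toBatches(rows: object[], batchSize: number = MAX_BATCH_SIZE): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const chunk = rows.slice(i, i + batchSize);
    batches.push(chunk.map((row) => JSON.stringify(row) + "\n"));
  }
  return batches;
}

// "Insert" rows by sending them to the delivery stream instead of the table.
// Returns the number of records handed to the sink.
async function insertRows(
  sink: RecordSink,
  deliveryStream: string,
  rows: object[],
): Promise<number> {
  let sent = 0;
  for (const batch of toBatches(rows)) {
    await sink.putBatch(deliveryStream, batch);
    sent += batch.length;
  }
  return sent;
}
```

Keeping the sink behind an interface means the same insert path can be exercised with an in-memory fake in tests. Retries for partially failed batches and the per-call payload size limit are not handled here.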
Inspiration/integration opportunity: https://github.com/drizzle-team/drizzle-orm