publicprep / datamart

import scripts and SQL reporting views for amazon redshift at public prep. definitely getting a cuter name soon.
0 stars 0 forks source link

Redshift vs RDS? #1

Open hunterowens opened 8 years ago

hunterowens commented 8 years ago

Curious why you're using Redshift? Data size that large?

almartin82 commented 8 years ago

Good q - Shared our internal decision making doc with you - give that a read.

Basically I want to be out of the database scripting ETL game, if possible. Obviously easier said than done, but the middleware providers seem to like redshift as a db backend.

http://data.donorschoose.org/all-data-in-one-place/ inspired me to kick the tires, and I liked what I saw - especially around dba tasks like backup and disaster recovery.

If you have redshift horror stories or concerns, let me know because we are still very much in evaluation mode.

hunterowens commented 8 years ago

Ok, the tl:dr; version

Redshift is a column based datastore that allows you to write queries in PGSQL, with a subset of Postgres operations, on a hosted platform.

RDS is a managed database server, on AWS, that allows you to select a variety of row based database providers (Postgres, MySQL, Oracle, MS SQL Server).

You should probably be deciding between Redshift and Postgres on RDS, and I believe the latter is the better choice.

Reasons:

1) Data Size Redshift, as a columnar store, is designed for large, thin, long tables. Think event logs. The use case I used is CTA boarding count, by stop for each bus. This means that queries on standard SIS data can actually take longer on Redshift than on Postgres (a row based store)

2) Latest Postgres features JSON/JSONB support, Bloom Filters, etc - the latest versions of Postgres (the 9.x series) have a lot to offer than version 8 doesn't. RDS postgres stays up to date while Redshift remains in the 8 series.

almartin82 commented 1 year ago

In hindsight, the answer is - we were impatiently waiting for the modern cloud data stack