njfritter / poc-data-pipelines

Proof-of-Concept (POC) Data Pipelines for various use cases such as data streaming/ingestion, batch data processing, orchestration and storage. Includes technologies such as Apache Airflow, Apache Spark, Apache Kafka, AWS, Python and more

Create Cassandra Catalog for Speed Layer Table #22

Open njfritter opened 7 months ago

njfritter commented 7 months ago

According to the docs for the Spark Cassandra Connector (used here from PySpark), registering a Cassandra catalog in Spark means that any DDL or other modifications made through Spark are reflected in the underlying Cassandra keyspaces and tables.

This could come in handy later, since it would avoid having to recreate the table every time there is a schema change.
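
A rough sketch of what this might look like from PySpark, assuming the Spark Cassandra Connector 3.x package is already on the classpath (for example via `--packages`). The catalog name `cassandra`, the keyspace `poc`, the table `speed_layer`, the column `ingested_at` and the helper function names are all placeholders made up for illustration, not names from this repo.

```python
# Class names as given in the Spark Cassandra Connector docs.
CATALOG_CLASS = "com.datastax.spark.connector.datasource.CassandraCatalog"
EXTENSIONS_CLASS = "com.datastax.spark.connector.CassandraSparkExtensions"


def cassandra_catalog_conf(catalog_name="cassandra", host="localhost", port=9042):
    """Build the Spark settings that register a Cassandra catalog.

    catalog_name, host and port are assumed defaults; adjust to the cluster.
    """
    return {
        f"spark.sql.catalog.{catalog_name}": CATALOG_CLASS,
        "spark.sql.extensions": EXTENSIONS_CLASS,
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": str(port),
    }


def build_session(conf, app_name="speed-layer-catalog"):
    """Create a SparkSession with the given settings applied."""
    # Imported lazily so the config helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def add_column_example(spark, catalog_name="cassandra"):
    """Hypothetical schema change issued through the catalog instead of cqlsh."""
    # Keyspace, table and column names are placeholders.
    spark.sql(
        f"ALTER TABLE {catalog_name}.poc.speed_layer "
        "ADD COLUMNS (ingested_at TIMESTAMP)"
    )
```

Usage would be something like `spark = build_session(cassandra_catalog_conf())` followed by `add_column_example(spark)`, after which the new column should also be visible from `cqlsh`. This still needs to be verified against the connector version used in this repo.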

Edit: Additional info here.