Open sam-goodwin opened 10 months ago
from refinery import Stream, Table
class ClickEvent(BaseModel):
click_id: str
click_time: datetime
..
# create a data catalog for managing my company's databases, schemas and tables
company_catalog = DataCatalog(name="company_catalog")
# create a schema for one of the teams, the "retail website"
retail_website = my_catalog.add_schema(name="retail_website")
click_stream = company_catalog.add_stream[ClickEvent]("click_stream")
click_table = my_catalog.add_table[ClickEvent](
name="clicks",
partitioned_by="click_time",
)
click_stream.sink(click_table)
# sink the stream of clicks into the table
click_stream.sink(click_table)
# create a Redshift Database and include the click_table
click_redshift_db = RedshiftDB(
name="click_db",
tables=[
click_table
])
TODO: WIP
The first step in refining data is ingestion - we gotta get data from somewhere!
Data comes from many places:
Data needs to be updated:
Data needs to be organized:
Data will be consumed: