ploomber / sql

https://ploomber-sql.readthedocs.io
Apache License 2.0

Introduction to the ETL process with Python and SQL #72

Closed lfunderburk closed 11 months ago

lfunderburk commented 12 months ago

Starter doc: https://ploomber-sql.readthedocs.io/en/latest/packaging-your-sql-project/intro-to-etl-pipelines-with-python-and-sql.html

This blog comes after #70, so you can assume the user is already familiar with scripting.

The goal is then to talk about the ETL process:

  1. What are ETL pipelines
  2. Why are they useful
  3. How to implement an ETL pipeline with Python, pandas and SQLAlchemy
  4. Describe the functionality of the current process and how it exemplifies an ETL pipeline (use https://github.com/ploomber/sql/issues/66)
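To make the outline above concrete, here is a rough sketch of the three ETL stages. It is not the script from #66. To keep it runnable without extra installs, it uses the standard library `csv` and `sqlite3` modules as stand-ins for pandas and SQLAlchemy. The sample data, the `payments` table, and every function name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline might read a file or call an API instead.
RAW_CSV = """name,amount
alice,10
bob,
carol,25
"""


def extract(raw_text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    return [
        (row["name"].title(), int(row["amount"]))
        for row in rows
        if row["amount"].strip()
    ]


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract, transform, and load in order, then read the table back."""
    load(transform(extract(raw_text)), conn)
    return conn.execute(
        "SELECT name, amount FROM payments ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, assumed for illustration
result = run_pipeline(RAW_CSV, conn)
print(result)
```

In the actual blog, the extract and transform steps would presumably become pandas operations and the load step would write through a SQLAlchemy engine, but the three-stage shape should stay the same.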

jpjon commented 12 months ago

My perspective is that this section will be very high level and will not yet begin the ETL process for our data in #66.

Rather, this section is just to:

  1. Define ETL pipelines and describe their functionality
  2. Introduce packages such as requests, pandas, and SQLAlchemy. This intro will not have any explicit code and will just describe the workflow of each and how they relate to ETL
  3. Show how our process in #66 already exhibits the nature of an ETL pipeline

How does this sound to you? I just want to be clear on whether I am teaching users how to actually set up the ETL pipeline with code, or just providing a baseline understanding as described above.

lfunderburk commented 12 months ago

Thank you for sharing this perspective. I'd aim for somewhere in the middle.

Yes, we want to introduce the ETL process. Introducing the packages sounds like a good idea.

For the last point, a possible compromise is to identify the key steps taken in #66 that reflect the ETL process. That would help the reader connect ETL with what is going on within that script.