jupyter-naas / awesome-notebooks

A powerful data & AI notebook templates catalog: prompts, plugins, models, workflow automation, analytics, code snippets - following the IMO framework to be searchable and reusable in any context.
https://naas.ai/search
BSD 3-Clause "New" or "Revised" License

FastAPI-ETL Solution for dataframes #1535

Open prashantydav opened 1 year ago

prashantydav commented 1 year ago

This template is a collection of ETL pipelines, covering everything from extracting data from various sources to loading the results into a REST API for a specific application.

FlorentLvr commented 1 year ago

Hi @prashantydav! Thank you for creating this issue. I would like to know more about this template. Could you please give more context? What problem does this template solve? Are you using any references from other websites? 🙏

prashantydav commented 1 year ago

This Jupyter template is an ETL solution that involves extracting data from multiple sources, pre-processing and transforming the data, and then sending it to an endpoint using FastAPI.

To begin with, the template allows you to extract data from various sources, such as Google Drive, AWS S3, GCP buckets, and websites via web scrapers. This gives you the flexibility to choose the sources that are relevant to your project.
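For illustration only (this is not code from the template), an extract step for the S3 source might look like the following sketch, assuming boto3 and pandas are available; the bucket and key names are hypothetical:

```python
import io

import boto3
import pandas as pd

def extract_from_s3(bucket: str, key: str) -> pd.DataFrame:
    """Download an object from S3 and parse it as CSV."""
    s3 = boto3.client("s3")  # credentials resolved from the environment
    obj = s3.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(io.BytesIO(obj["Body"].read()))

# Hypothetical bucket and key, for illustration only
df_raw = extract_from_s3("my-data-bucket", "raw/sales.csv")
```

The other sources (Google Drive, GCP buckets, scraped websites) would follow the same pattern: each extractor returns a pandas DataFrame, so the downstream steps stay source-agnostic.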

Once the data is extracted, the template provides you with tools to pre-process and transform the data into a specific format. The pre-processing and transformation pipeline steps help ensure that the data is in a consistent format that is easy to work with.
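A minimal transform step in the same spirit might look like this sketch (the column names are made up for the example):

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize the raw DataFrame into a consistent format."""
    df = df.copy()
    # Standardize column names: "Order Date" -> "order_date"
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop exact duplicate rows
    df = df.drop_duplicates()
    # Coerce the (hypothetical) date column; invalid values become NaT
    if "order_date" in df.columns:
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df
```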

After the data has been pre-processed and transformed, it is sent to the end of the pipeline, where it is exposed as a REST API using FastAPI.
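A sketch of that load step could look like the following, using FastAPI to serve the transformed records as JSON; the endpoint path and the stubbed DataFrame are illustrative, not part of the template:

```python
import json

import pandas as pd
from fastapi import FastAPI

app = FastAPI(title="ETL output")

# In the real pipeline this DataFrame would come from the transform step;
# here it is stubbed so the example is self-contained.
df = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.0]})

@app.get("/data")
def get_data():
    """Return the transformed records as JSON."""
    # Round-trip through to_json so numpy scalars become plain Python types
    return json.loads(df.to_json(orient="records"))

# Run with: uvicorn main:app --reload
```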

This Jupyter template provides a comprehensive ETL solution that allows you to extract data from various sources, pre-process and transform it into a specific format, and send it to an endpoint using FastAPI. This template can be customized to suit your specific needs and requirements, making it a powerful tool for data integration and analysis.

References:

  1. Boilerplate: https://github.com/jupyter-naas/data-product-framework
  2. Pipeline docs: https://docs.naas.ai/features/pipeline-beta
FlorentLvr commented 1 year ago

Amazing! I can't wait to see the result :). Maybe we will be able to create more than one template with what you are doing. Are you already working on a branch? Don't hesitate to create a PR. I will be happy to review your code 🙏

prashantydav commented 1 year ago

Hey @FlorentLvr, I am very excited to work on this project too. I have created a branch named "FastAPI-ETL_endpoints" and will start working on the data ingestion pipeline today; I will update you on Slack.

jravenel commented 1 year ago

Hey @prashantydav, can you update us here on what you shared with me in private? Maybe a short Loom video would help.

FlorentLvr commented 1 year ago

@prashantydav, hope you are doing well! Just checking in! Did you make some improvements? Please don't hesitate to ask if you need any help 🙏

prashantydav commented 1 year ago

Hi @FlorentLvr, sorry for the late reply. I have created the data ingestion file and will work on some intermediate pipelines and the final pipeline. I was engaged in college projects, which is why I was not able to work on this.

FlorentLvr commented 1 year ago

@prashantydav, thank you for your feedback! I hope your college project went well. Sorry for the late reply as well; no worries about your contribution. Please let us know when you are going to work on it again. 🙏

jravenel commented 1 year ago

@prashantydav, how are you? We are putting this back into the roadmap. Do you think you can work on it before the end of this iteration, ending 31st of May?

jravenel commented 1 year ago

@prashantydav, kind reminder: let us know if you want someone else to take over.