instill-ai / instill-core

🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
https://www.instill.tech

[Feature] [VDP] [Pipeline] Data movement tools #1023

Open chuang8511 opened 1 week ago


Is There an Existing Issue for This?

Where do you intend to apply this feature?

Instill Core, Instill Cloud

Is your Proposal Related to a Problem?

Background

When a company has multiple data sources, its data engineers need to migrate data from one source to another.

Data is scattered across applications such as Gmail and Slack, and writing a separate tool to collect data from each application is time-consuming.

Describe Your Proposed Solution

User stories

Story 1

Possible pipelines: [image]

Concrete examples: [image]

For example, raw transaction data is not directly analysable, but the weekly transaction amount and transaction count are.
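The aggregation in this example can be sketched in plain Python. This is a minimal illustration, not an Instill component: it assumes each transaction arrives as a `(date, amount)` pair, and the function name `weekly_summary` and the sample data are hypothetical. Transactions are grouped by ISO week, and each week yields a total amount and a count.

```python
from collections import defaultdict
from datetime import date

def weekly_summary(transactions):
    """Group (date, amount) pairs by ISO (year, week); return {(year, week): (total_amount, count)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, amount in transactions:
        year, week, _ = day.isocalendar()  # ISO year, ISO week number, weekday
        totals[(year, week)][0] += amount
        totals[(year, week)][1] += 1
    # Sort by (year, week) so the output reads chronologically.
    return {key: (total, count) for key, (total, count) in sorted(totals.items())}

# Hypothetical sample data.
transactions = [
    (date(2024, 7, 1), 120.0),
    (date(2024, 7, 3), 80.0),
    (date(2024, 7, 9), 50.0),
]
print(weekly_summary(transactions))
```

Keying on the ISO year together with the week number keeps weeks that straddle a year boundary from being merged with the wrong year.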

Story 2

As a data engineer, I want to transform unstructured data into analysable data and load it into another data source.

Possible pipelines: [image]

Concrete example: [image]
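A rough sketch of the transform-and-load flow in Story 2, assuming the unstructured input is a list of chat messages and the destination is a relational table. The message format, the regular expression, the `orders` table, and the `extract`/`load` helpers are all hypothetical; the regex only stands in for whatever extraction step (for example, an AI model) a real pipeline would use, and an in-memory SQLite database stands in for the destination data source.

```python
import re
import sqlite3

# Hypothetical pattern for messages such as "Order 1042 paid 35.50 USD".
ORDER_PATTERN = re.compile(
    r"Order (?P<order_id>\d+) paid (?P<amount>\d+(?:\.\d+)?) (?P<currency>[A-Z]{3})"
)

def extract(messages):
    """Transform: turn free-text messages into structured rows, skipping messages that don't match."""
    rows = []
    for text in messages:
        match = ORDER_PATTERN.search(text)
        if match:
            rows.append((int(match["order_id"]), float(match["amount"]), match["currency"]))
    return rows

def load(conn, rows):
    """Load: write structured rows into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

messages = [
    "Hi team, Order 1042 paid 35.50 USD this morning.",
    "Lunch at noon?",
    "Order 1043 paid 12 EUR",
]
conn = sqlite3.connect(":memory:")
load(conn, extract(messages))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Messages that carry no structured information (such as "Lunch at noon?") are dropped during the transform step rather than loaded as empty rows.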

Highlight the Benefits

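It addresses a real-world problem: instead of writing a separate tool to collect data from each application, data engineers could build pipelines that extract, transform, and load the data they need.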

Anything Else?

Possible components

  • Data components
    • RDBMS
    • NoSQL
    • Vector DB
    • Others
  • Application components
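One way to picture how data components plug together is a source/destination pair joined by a single "move" step. The sketch below is hypothetical and is not Instill's component API: `Source`, `Destination`, `SQLiteSource`, `InMemoryDocumentStore`, and `move` are illustrative names, SQLite stands in for an RDBMS component, and a list of dicts stands in for a NoSQL collection.

```python
import sqlite3
from typing import Dict, Iterable, List, Protocol

Record = Dict[str, object]

class Source(Protocol):
    def read(self) -> Iterable[Record]: ...

class Destination(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...

class SQLiteSource:
    """Stand-in for an RDBMS component: yields each row of a query as a dict."""
    def __init__(self, conn: sqlite3.Connection, query: str):
        self.conn = conn
        self.query = query

    def read(self) -> Iterable[Record]:
        cursor = self.conn.execute(self.query)
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))

class InMemoryDocumentStore:
    """Stand-in for a NoSQL component: stores records as documents in one collection."""
    def __init__(self):
        self.documents: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.documents.append(dict(record))
            count += 1
        return count

def move(source: Source, destination: Destination) -> int:
    """Data movement step: stream records from a source into a destination; return the count moved."""
    return destination.write(source.read())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a@example.com"), (2, "b@example.com")])
store = InMemoryDocumentStore()
moved = move(SQLiteSource(conn, "SELECT id, email FROM users ORDER BY id"), store)
print(moved, store.documents)
```

Because `move` depends only on the two small interfaces, a vector DB or application component could be swapped in on either side without changing the movement step itself.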

Reference tools

Milestones

  1. Read through the existing pipelines.
  2. Design the pipelines according to the user stories.
    • Please draw the concrete pipelines first and ask us to review them before starting development.
    • Timeline: 5 working days
  3. Identify which components the designed pipelines need that we are missing.
    • Please create a skeleton PR for the upcoming components first.
    • Timeline: 2 to 3 working days
  4. Connect those components.
    • Timeline: 10 working days
  5. Build the designed pipelines once those components are connected.
    • Timeline: 1 working day
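The component check in milestone 3 amounts to a set difference between what the designed pipelines use and what already exists. In this sketch the component names and the `available_components` set are hypothetical examples and do not reflect Instill's actual component catalog; the helper `missing_components` is likewise illustrative.

```python
# Hypothetical component names; the real list comes from the pipelines designed in milestone 2.
designed_pipeline = ["gmail", "slack", "instill-model", "postgresql", "mongodb"]
available_components = {"slack", "instill-model", "postgresql"}

def missing_components(pipeline, available):
    """Return components the pipeline uses that don't exist yet, in pipeline order, without duplicates."""
    missing = []
    for name in pipeline:
        if name not in available and name not in missing:
            missing.append(name)
    return missing

print(missing_components(designed_pipeline, available_components))
```

Adding up the listed timelines for milestones 2 to 5 gives 5 + (2 to 3) + 10 + 1 = 18 to 19 working days; milestone 1 has no stated timeline.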

Note

linear[bot] commented 1 week ago

INS-5024 [Feature] [VDP] [Pipeline] Data movement tools