mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License

Tables do not exist: data_integration_node_run #62

Open kadnan opened 3 years ago

kadnan commented 3 years ago

I am getting the error:

packages/mara_pipelines/logging/node_cost.py", line 41, in node_durations_and_run_times
    GROUP BY node_path;""", {'path': node.path(), 'level': len(node.path())})
psycopg2.errors.UndefinedTable: relation "data_integration_node_run" does not exist
LINE 10:         FROM data_integration_node_run node
                      ^

When I run the config code, I only see the output below. No tables were created:

Created database "postgresql+psycopg2://root@localhost/example_etl_mara

ghost commented 3 years ago

Did you run flask mara_db.migrate to make sure that the required tables are created?

If you are new to mara, I suggest trying out the project https://github.com/mara/mara-example-project-1, where these steps are done automatically by the makefile.
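To confirm whether the migration actually created the table named in the error, a quick check along these lines may help. This is a minimal sketch, not part of mara: `table_exists` is a hypothetical helper, it assumes a PostgreSQL database and takes any DB-API cursor (for example one from psycopg2), and it relies on PostgreSQL's `to_regclass` function, which returns NULL for an unknown relation instead of raising an error.

```python
def table_exists(cursor, table_name):
    """Return True if PostgreSQL can resolve table_name to a relation.

    Hypothetical helper (not part of mara). `cursor` is any DB-API cursor
    connected to PostgreSQL, e.g. one obtained from psycopg2.
    """
    # to_regclass() yields NULL rather than an error when the relation is missing
    cursor.execute("SELECT to_regclass(%s) IS NOT NULL", (table_name,))
    return bool(cursor.fetchone()[0])


# Usage sketch (assumes psycopg2 is installed and the database from the
# report above is reachable; the connection string is an assumption):
#
#   import psycopg2
#   with psycopg2.connect("postgresql://root@localhost/example_etl_mara") as conn:
#       with conn.cursor() as cur:
#           print(table_exists(cur, "data_integration_node_run"))
```

If the check reports that the table is missing, running flask mara_db.migrate as suggested above is the next thing to try.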

kadnan commented 3 years ago

@hz-lschick Where is that mentioned in the README? Does that mean the setup information in your README is misleading? I am just following what is given there.

ghost commented 3 years ago

@kadnan I am not sure what you are referring to.

The install section of the https://github.com/mara/mara-example-project-1 project says that you should "hit make", which executes the makefile. The makefile in turn runs flask mara_db.migrate.

I wish there were better documentation I could refer you to, but as of today, there isn't.

kadnan commented 3 years ago

@hz-lschick If you follow this README, you cannot run the hello world script at all. How can you make this script run without referring to the example project? I was misled by this README.

ghost commented 3 years ago

Never tried that. Maybe @martin-loetzsch can help here

leo-schick commented 1 year ago

Hi @kadnan,

Since version 3.3.0, it is possible to run a pipeline without a database. You will get a warning that the mara database is missing, but the pipeline can still be executed.

Here is an example of how to run a simple pipeline:

Run a simple pipeline

Set up a Python virtual environment and activate it:

python3 -m venv .venv
source .venv/bin/activate

Install the package:

pip install "mara-pipelines>=3.3.0"

Create a Python file data_pipeline.py with the following content:

from mara_pipelines.commands.bash import RunBash
from mara_pipelines.pipelines import Pipeline, Task
from mara_pipelines.ui.cli import run_pipeline

pipeline = Pipeline(
    id='demo_pipeline',
    description="My demo pipeline")

pipeline.add(
    Task(id='ping_google',
         description="Checks if google is available. Requires `ping` to be installed.",
         commands=[
            RunBash("ping google.com -c 4")
         ]))

run_pipeline(pipeline)

Run the pipeline:

python data_pipeline.py

This worked quite well for me.

I hope I was able to help you get started with mara 😃