mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License

fix ReadFile with BigQuery when using gcs bucket #50

Closed: ghost closed this 3 years ago

ghost commented 4 years ago

The current ReadFile implementation does not work when the gcloud_gcs_bucket_name parameter is set in the BigQueryDB config:

When the gcloud_gcs_bucket_name parameter is set, a different command is returned (see here).

ghost commented 3 years ago

@gathineou Could you check whether the custom BigQuery code in ReadFile can be removed and moved to mara_db.shell.copy_from_stdin_command, with /dev/stdin as the file, as is done in https://github.com/mara/mara-db/pull/41? If yes, this PR can be closed.

Here is an idea for the solution:

@copy_from_stdin_command.register(dbs.BigQueryDB)
def __(db: dbs.BigQueryDB, ...):
    if not db.gcloud_gcs_bucket_name:
        return 'bq load /dev/stdin ....'

    # current implementation
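
To make the idea above concrete, here is a minimal, self-contained sketch of the dispatch. It assumes that copy_from_stdin_command uses functools.singledispatch-style registration, as the @...register(...) syntax suggests. The BigQueryDB class below is a stand-in rather than the real dbs.BigQueryDB, the target_table parameter is hypothetical, and both returned command strings are placeholders, not the actual commands from mara-db.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Optional


@dataclass
class BigQueryDB:
    """Stand-in for dbs.BigQueryDB; only the field discussed here is modelled."""
    gcloud_gcs_bucket_name: Optional[str] = None


@singledispatch
def copy_from_stdin_command(db, target_table: str) -> str:
    """Fallback for database types without a registered implementation."""
    raise NotImplementedError(f'No copy command for {type(db).__name__}')


@copy_from_stdin_command.register(BigQueryDB)
def _(db: BigQueryDB, target_table: str) -> str:
    if not db.gcloud_gcs_bucket_name:
        # No bucket configured: load directly from stdin.
        # Placeholder command; the real flags are elided in the thread.
        return f'bq load {target_table} /dev/stdin'

    # Bucket configured: the existing GCS-based implementation would go here.
    # Placeholder string only, not a real shell command.
    return (f'<stage stdin in gs://{db.gcloud_gcs_bucket_name}, '
            f'then load into {target_table}>')
```

With this shape, ReadFile would only need to call copy_from_stdin_command(db, ...) and would not need any BigQuery-specific branching of its own.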

martin-loetzsch commented 3 years ago

@hz-lschick In https://github.com/mara/mara-db/pull/45 there is a copy_from_stdin_command for BigQuery. Would that solve your issue?

ghost commented 3 years ago

@martin-loetzsch Yes! I have had this running in production for several months now without any issues.