ReneTC closed this issue 9 months ago.
I also tried using meltano_run_op():
from dagster import repository, job
from dagster_meltano import meltano_resource, meltano_run_op

@job(resource_defs={"meltano": meltano_resource})
def meltano_run_job():
    tap_done = meltano_run_op("--environment=prod tap-1 target-1")()
    meltano_run_op("--environment=prod tap-2 target-2")(tap_done)

@repository()
def repository():
    return [meltano_run_job]
It gives the same error.
It seems to me this could be fixed by changing how the Dagster name is generated here:
strip every character that does not match the regex ^[A-Za-z0-9_]+$,
but keep the executed command separate from the Dagster name, so the command itself is not altered.
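A minimal Python sketch of that idea. It assumes the sanitizer replaces (rather than deletes) every character outside [A-Za-z0-9_] with an underscore; sanitize_dagster_name is a hypothetical helper, not the function in dagster-meltano's utils.py. Only the derived name is sanitized, and the command string is passed on unchanged.

```python
import re

# Dagster's naming rule, as quoted in the error message.
DAGSTER_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def sanitize_dagster_name(command: str) -> str:
    """Hypothetical helper: derive a Dagster-safe name from a Meltano command.

    Every character outside [A-Za-z0-9_] (dashes, spaces, '=', ':' ...)
    is replaced with '_' so the result matches DAGSTER_NAME_RE.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", command)


command = "--environment=prod tap-spreadsheets-anywhere target-duckdb"
name = sanitize_dagster_name(command)

# The command given to Meltano stays untouched; only the name changes.
print(command)  # --environment=prod tap-spreadsheets-anywhere target-duckdb
print(name)     # __environment_prod_tap_spreadsheets_anywhere_target_duckdb
assert DAGSTER_NAME_RE.match(name)
```

Replacing instead of removing keeps word boundaries readable, at the cost of possible collisions (for example, tap-1 and tap_1 map to the same name).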
This could be fixed here, by also replacing the = character: https://github.com/quantile-development/dagster-meltano/blob/1b3022cbd687c65ccd9288f767397efcd2e587ca/dagster_meltano/utils.py#L15-L19
But it might be easier to set MELTANO_ENVIRONMENT to prod.
Would you like me to fix it, test it, and send a MR? (It might not be done until tomorrow.)
For me, replacing the = works best, but I'm not sure which direction you want to take as the package owner.
That would be great! I'll watch for the PR.
Draft here: https://github.com/quantile-development/dagster-meltano/pull/45. I wasn't able to test it, because I was confused about how Meltano installs this package.
I know you can point a plugin at a custom GitHub URL (e.g. my fork, for testing) like so:
- name: dagster
  variant: quantile-development
  pip_url: dagster-ext git+https://github.com/my_fork.git
  config:
    repository_dir: ${MELTANO_PROJECT_ROOT}/orchestrate
But I'm not sure where to swap the main dagster-meltano package for a custom Git URL.
Okay, after https://github.com/quantile-development/dagster-meltano/pull/47 was merged, it sadly still does not work. Say I have a prod job in meltano.yml:
- name: task1
  tasks:
  - tap-spreadsheets-anywhere target-duckdb
- name: task1_prod
  tasks:
  - tap-spreadsheets-anywhere target-duckdb --environment=prod
When dagster-meltano runs, it will execute:
meltano run tap-spreadsheets-anywhere target-duckdb --environment=prod
but that is wrong; it returns the error:
Error: No such option: --environment
The correct syntax is meltano --environment=prod run tap-spreadsheets-anywhere target-duckdb
but I don't see how to do that with this package. I've asked in the Meltano Slack how to execute a Dagster run in another environment.
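The failing and working commands differ only in where --environment sits: it is an option of the top-level meltano command, so it has to come before the run subcommand. A small sketch of building the argument list in that order; build_meltano_argv is a hypothetical helper, not part of dagster-meltano.

```python
import shlex
from typing import List, Optional


def build_meltano_argv(run_args: str, environment: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build argv for `meltano [--environment=ENV] run ...`.

    Global options such as --environment belong to the top-level `meltano`
    command, so they are placed before the `run` subcommand.
    """
    argv = ["meltano"]
    if environment is not None:
        argv.append(f"--environment={environment}")
    argv.append("run")
    argv.extend(shlex.split(run_args))
    return argv


print(shlex.join(build_meltano_argv("tap-spreadsheets-anywhere target-duckdb", "prod")))
# meltano --environment=prod run tap-spreadsheets-anywhere target-duckdb
```

The resulting list could be handed to subprocess.run(argv, check=True); building it as a list avoids shell quoting issues.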
You should use the MELTANO_ENVIRONMENT variable to specify which environment to use.
Thanks Jules, but I don't see how to use MELTANO_ENVIRONMENT in this case. Would you mind providing an example?
For example, we deploy Meltano using a Docker container. In the Docker container we set:
ENV MELTANO_ENVIRONMENT=prod
That way, Meltano runs with the prod environment in our production deployment.
Thanks for your specific example @JulesHuisman I appreciate that. However, we are not using a docker container so that solution does not fix the issue.
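Without Docker, the same variable can be set for a single process instead of image-wide; in a POSIX shell that is simply MELTANO_ENVIRONMENT=prod meltano invoke dagster:start. A Python sketch of the same thing, assuming the meltano executable is on PATH; meltano_env and run_meltano are hypothetical helpers. Only the child process sees the variable, so the parent shell and other runs keep their own environment.

```python
import os
import subprocess
from typing import Dict, List, Optional


def meltano_env(environment: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the environment and set MELTANO_ENVIRONMENT."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the input
    env["MELTANO_ENVIRONMENT"] = environment
    return env


def run_meltano(args: List[str], environment: str) -> int:
    """Run `meltano <args>` with MELTANO_ENVIRONMENT set for that process only.

    Assumes the `meltano` executable is on PATH.
    """
    completed = subprocess.run(["meltano", *args], env=meltano_env(environment))
    return completed.returncode
```

For example, run_meltano(["invoke", "dagster:start"], "prod") starts Dagster with every Meltano call resolved against prod.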
I found a kind-of-working solution. If you run:
meltano --environment=prod invoke dagster:start
All of the jobs will be executed as prod. This is not ideal: if you run Dagster with --environment=dev next time, the Dagster logs don't distinguish between the two environments, so execution times, failure counts and so on become very confusing in the Dagster UI.
I wouldn't mark this as closed, at least not for my case. One possible solution would be a dagster-meltano op that also accepts the environment as input, e.g. something like:
return meltano_command_op_with_env(
    command=f"--environment={env} run {command} --force",
    dagster_name=dagster_name,
)
But I am not sure it is the direction to go.
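A sketch of what such an op factory might compute, assuming the op runs its command as meltano followed by the given arguments. meltano_run_op_spec and MeltanoOpSpec are hypothetical and stop short of creating a Dagster op. The environment goes into the command as a global flag and also into the Dagster name, which would keep dev and prod runs apart in the Dagster UI.

```python
import re
from typing import NamedTuple


class MeltanoOpSpec(NamedTuple):
    dagster_name: str  # must match ^[A-Za-z0-9_]+$
    command: str       # arguments passed to the `meltano` executable


def meltano_run_op_spec(command: str, env: str) -> MeltanoOpSpec:
    """Hypothetical: build the name and command for an env-aware run op.

    --environment is placed before `run`, and the environment is also part
    of the Dagster name, so each environment gets its own op.
    """
    full_command = f"--environment={env} run {command} --force"
    dagster_name = re.sub(r"[^A-Za-z0-9_]", "_", f"{env}__{command}")
    return MeltanoOpSpec(dagster_name=dagster_name, command=full_command)


spec = meltano_run_op_spec("tap-spreadsheets-anywhere target-duckdb", "prod")
print(spec.dagster_name)  # prod__tap_spreadsheets_anywhere_target_duckdb
print(spec.command)       # --environment=prod run tap-spreadsheets-anywhere target-duckdb --force
```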
Sorry for spamming this repo. I see huge potential in it and I am already quite invested.
I've been running into a problem and would like to know whether someone has already solved it.
You can easily make a job run in Dagster with this repo by putting this in meltano.yml:
However, if you need to run it in prod, you should add the flag --environment=prod, so:
But running
meltano invoke dagster:start
results in an error:
dagster._core.errors.DagsterInvalidDefinitionError: "__environment=prod__tap_spreadsheets_anywhere_target_duckdb" is not a valid name in Dagster. Names must be in regex ^[A-Za-z0-9_]+$
Any ideas?