meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
https://meltano.com/
MIT License

`meltano invoke` should not always run `--discover` prior to running the provided arguments #2807

Closed: MeltyBot closed this issue 2 years ago

MeltyBot commented 3 years ago

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2866

Originally created by @jlloyd3 on 2021-07-23 17:24:47


This is taken from the following Slack message: https://meltano.slack.com/archives/C01TCRBBJD7/p1627054578178200

Josh Lloyd: What is the expected behavior of `meltano invoke`? I'm trying to run a very specific command for a custom tap, but the command seems to be ignoring the arguments I'm trying to pass to the tap. Specifically, I'm trying to run `meltano invoke tap-rest-api --infer_schema`, but the first line in the error message is:

`Catalog discovery failed: command ['/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api', '--config', '/Users/.../meltano/.meltano/run/tap-rest-api/tap.config.json', '--discover'] returned 1: INFO Loading Schemas`

which makes me think that it's ignoring the argument `--infer_schema` entirely, because I don't see it in the command it's spitting back at me. It seems to just be running `--discover` instead.

Douwe Maan: @Josh Lloyd `meltano invoke` with a tap will always call the tap in discovery mode first if the tap has the discovery capability, except when `meltano invoke <tap> --discover` is run directly, because the catalog produced by discovery is needed for sync mode (i.e. when no `--` arguments are passed). I think the better behavior would be to run discovery mode ahead of the requested invocation only when it's actually running in sync mode, i.e. not receiving any user-provided arguments. That would require changing https://gitlab.com/meltano/meltano/-/blob/master/src/meltano/core/plugin/singer/tap.py#L213
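To make the difference between the current and the proposed rule concrete, here is a minimal sketch. The function names `runs_discovery_today` and `runs_discovery_proposed` are hypothetical and are not Meltano's API; they only restate, as described in the thread, when discovery runs before the user's command. The sketch assumes the capability is named `discover` and that "sync mode" means no extra arguments were passed.

```python
from typing import Sequence, Set

# Hypothetical helpers restating the rule from the Slack thread.
# They are NOT Meltano's actual code; the real check lives in tap.py.


def runs_discovery_today(capabilities: Set[str], args: Sequence[str]) -> bool:
    """Current behavior as described: discover first whenever the tap
    advertises the capability, unless the user is explicitly running discovery."""
    return "discover" in capabilities and "--discover" not in args


def runs_discovery_proposed(capabilities: Set[str], args: Sequence[str]) -> bool:
    """Proposed behavior: discover first only in sync mode,
    i.e. when the user passed no extra arguments."""
    return "discover" in capabilities and len(args) == 0


caps = {"discover", "catalog", "state"}  # illustrative capability set
for args in ([], ["--infer_schema"], ["--discover"]):
    # Prints: args, current rule, proposed rule
    print(args, runs_discovery_today(caps, args), runs_discovery_proposed(caps, args))
```

Running it prints `[] True True`, `['--infer_schema'] True False` and `['--discover'] False False`: under the proposed rule, `meltano invoke tap-rest-api --infer_schema` would no longer trigger a discovery run first.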

Douwe Maan: So to be clear, it's not running `--discover` instead of `--infer_schema`; it's running it before `--infer_schema`, and if `--discover` fails, it never gets to `--infer_schema`.
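The ordering described here (discovery first, then the user's command, stopping if discovery fails) can be sketched as follows. `invoke_tap`, the `run` callback and the values `tap-rest-api` / `tap.config.json` are hypothetical illustrations, not Meltano internals. The sketch assumes the tap advertises the discovery capability, and it deliberately leaves out how Meltano assembles the final argv for the user's command, since the thread doesn't show that.

```python
from typing import Callable, List, Sequence

# Hypothetical sketch of the ordering described in the thread, not Meltano's code.
Runner = Callable[[List[str]], int]  # takes an argv list, returns an exit code


def invoke_tap(executable: str, config_path: str, args: Sequence[str], run: Runner) -> List[List[str]]:
    """Run discovery, then the user's command; return the commands actually executed."""
    executed: List[List[str]] = []
    discover_cmd = [executable, "--config", config_path, "--discover"]
    executed.append(discover_cmd)
    if run(discover_cmd) != 0:
        # Discovery failed ("Catalog discovery failed"), so the user's
        # arguments (e.g. --infer_schema) are never reached.
        return executed
    user_cmd = [executable, *args]  # simplified: real argv construction not shown
    executed.append(user_cmd)
    run(user_cmd)
    return executed


def failing_discovery(argv: List[str]) -> int:
    """Fake runner: the discovery call exits with 1, anything else with 0."""
    return 1 if "--discover" in argv else 0


steps = invoke_tap("tap-rest-api", "tap.config.json", ["--infer_schema"], failing_discovery)
print(steps)  # only the --discover command was executed
```

With the failing fake runner, `steps` contains only the `--discover` command, which matches the error Josh Lloyd saw: `--infer_schema` was never ignored, it was simply never reached.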

Douwe Maan: The easiest ways to work around this would be to make sure `--discover` doesn't fail (which may be impossible if that depends on having inferred the schema first?), remove the discovery capability (temporarily), or fix that check in `tap.py`. I agree this is undesirable behavior ...

MeltyBot commented 2 years ago

7 earlier comments from the original issue are available on GitLab.