MeltyBot closed this issue 1 month ago.
Picking up this conversation after @edgarrmondragon pointed me to #2856 in a Slack conversation.
Whilst providing an explicit catalog cache refresh via the CLI (#2856) and via a configuration setting (this issue) is a good medium-term option, the important short-term enhancement is actually to address the point @tayloramurphy raised: the existence of a cached catalog file is currently not mentioned in the documentation.
I've had a quick look through the documentation, and the most sensible place to add a note about the cached catalog (and potentially a hint about using a tap reinstall as a workaround for the missing cache-clearing feature) seems to be the CLI reference documentation, e.g. the `select` section.
Running `meltano select` is how I was first introduced to the catalog cache (using the transferwise `tap-postgres`). If I had started reading the documentation to find out why my `select` output did not match my list of source entities, I would probably have ended up reading the `select` CLI reference documentation.
Here are some other ideas:
@cwegener I made https://github.com/meltano/meltano/issues/6292 to track updating the documentation!
This could be interesting in the context of:
cc @aaronsteers
This would be useful to my team during development, as we're changing the imported schema a lot.
I finally figured out why this use case is so frustrating for me, which I hadn't managed to articulate before.
When you set something like:

```yaml
- name: tap-name
  inherit_from: tap-postgres
  select:
  - thissupertable.*
```
When new columns are added to the `thissupertable` table, the catalog never gets updated, because Meltano sees that "the select statement hasn't changed" and therefore considers everything good to go.
Whenever I use `*` in `select`, I implicitly expect Meltano to check and update my catalog on every run. When I don't use `*`, I don't expect it.
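To make the failure mode above concrete, here is a minimal Python sketch of two hypothetical cache keys. `select_only_cache_key` and `schema_aware_cache_key` are illustrative names, not Meltano's actual implementation; the sketch only assumes, as described above, that the cached catalog is reused whenever the select rules are unchanged.

```python
import hashlib
import json


def select_only_cache_key(select_rules: list[str], source_columns: dict[str, list[str]]) -> str:
    """Hypothetical key built from the select rules alone; source_columns is ignored."""
    payload = json.dumps(sorted(select_rules))
    return hashlib.sha256(payload.encode()).hexdigest()


def schema_aware_cache_key(select_rules: list[str], source_columns: dict[str, list[str]]) -> str:
    """Hypothetical key that also fingerprints the columns currently in the source."""
    payload = json.dumps(
        {
            "select": sorted(select_rules),
            "columns": {table: sorted(cols) for table, cols in source_columns.items()},
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


select = ["thissupertable.*"]
before = {"thissupertable": ["id", "name"]}
after = {"thissupertable": ["id", "name", "created_at"]}  # a column is added upstream

# Same select rules -> same key -> the stale cached catalog is reused.
print(select_only_cache_key(select, before) == select_only_cache_key(select, after))  # True
# The schema-aware key changes, which would force rediscovery.
print(schema_aware_cache_key(select, before) == schema_aware_cache_key(select, after))  # False
```

The catch is that fingerprinting the source columns requires running discovery, which is exactly the step a cached catalog exists to skip. That trade-off makes an explicit opt-out the simpler fix.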
Good discussion on this in https://meltano.slack.com/archives/CKHP6G5V4/p1663205560565299
The more I think about this, the more I think some sort of mechanism to alert on catalog changes would be very beneficial. I'm in favor of a near-term fix that lets users specify something like `refresh_catalog: true`, but longer term I want us to be more thoughtful about the workflows and what we do with the catalog. There's huge value in this metadata for teams, and a lot we can do with it (I'm also thinking about managed).
@tayloramurphy and @visch - from the discussion...
There seems to be a path forward with `catalog_caching` being declared as an extra that can be disabled with `meltano config tap-something set _catalog_caching disabled`. One nice thing about not starting with a simple true/false is that we could later expand this to `meltano config tap-something set _catalog_caching '60 min'` to allow short-lived catalogs.
As an initial boolean toggle though, I think we probably would want to use the true/false or enabled/disabled value to drive the following behaviors:
What do you think?
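The `_catalog_caching` proposal above can be sketched as a small parser that maps `enabled`/`disabled` or a duration such as `'60 min'` onto a cache lifetime. This is a hypothetical Python illustration of the proposal, not shipped Meltano behavior: the function names, the accepted spellings, and the restriction to `min`/`h` units are all assumptions.

```python
import re
from typing import Optional

# Hypothetical accepted units for the proposed duration form, in seconds.
_UNIT_SECONDS = {"min": 60, "h": 3600}


def parse_catalog_caching(value: str) -> Optional[int]:
    """Return None (reuse indefinitely), 0 (never reuse) or a TTL in seconds."""
    normalized = value.strip().lower()
    if normalized in {"enabled", "true"}:
        return None
    if normalized in {"disabled", "false"}:
        return 0
    match = re.fullmatch(r"(\d+)\s*(min|h)", normalized)
    if match is None:
        raise ValueError(f"unrecognized _catalog_caching value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def should_use_cached_catalog(value: str, cache_age_seconds: float) -> bool:
    """Decide whether a cached catalog of the given age may be reused."""
    ttl = parse_catalog_caching(value)
    if ttl is None:
        return True
    return cache_age_seconds < ttl  # ttl == 0 never passes, forcing discovery


print(parse_catalog_caching("60 min"))  # 3600
print(should_use_cached_catalog("60 min", cache_age_seconds=7200))  # False
```

Starting from a string-valued setting rather than a boolean is what lets the `'60 min'` form slot in later without changing the setting's type.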
@aaronsteers, that seems reasonable as a short-term fix. Long term, I like the idea of `meltano catalog`: I think we can drive huge value around helping with the catalog and alerting on diffs prior to run execution. This is basically the "data contract" of the Singer world...
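Alerting on catalog diffs before a run could start from a comparison like the one below. It is a minimal Python sketch that assumes catalogs in the Singer catalog format (a top-level `streams` list whose entries carry a `tap_stream_id` and a JSON Schema under `schema.properties`). `diff_catalogs` and `make_stream` are hypothetical helpers, not part of Meltano or of any `meltano catalog` command, and only top-level column names are compared, not types.

```python
from typing import Any


def catalog_columns(catalog: dict[str, Any]) -> dict[str, set[str]]:
    """Map each stream's tap_stream_id to the set of its top-level property names."""
    return {
        entry["tap_stream_id"]: set(entry.get("schema", {}).get("properties", {}))
        for entry in catalog.get("streams", [])
    }


def diff_catalogs(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Report streams and columns that were added or removed between two catalogs."""
    old_cols, new_cols = catalog_columns(old), catalog_columns(new)
    column_changes = {}
    for stream_id in sorted(old_cols.keys() & new_cols.keys()):
        added = sorted(new_cols[stream_id] - old_cols[stream_id])
        removed = sorted(old_cols[stream_id] - new_cols[stream_id])
        if added or removed:
            column_changes[stream_id] = {"added": added, "removed": removed}
    return {
        "added_streams": sorted(new_cols.keys() - old_cols.keys()),
        "removed_streams": sorted(old_cols.keys() - new_cols.keys()),
        "column_changes": column_changes,
    }


def make_stream(stream_id: str, columns: list[str]) -> dict[str, Any]:
    """Build a minimal Singer catalog entry for the demo."""
    properties = {c: {"type": ["null", "string"]} for c in columns}
    return {"tap_stream_id": stream_id, "schema": {"type": "object", "properties": properties}}


cached = {"streams": [make_stream("thissupertable", ["id", "name"])]}
fresh = {"streams": [make_stream("thissupertable", ["id", "name", "created_at"])]}
report = diff_catalogs(cached, fresh)
print(report["column_changes"])  # {'thissupertable': {'added': ['created_at'], 'removed': []}}
```

A non-empty report could be surfaced as a warning, or treated as a failed check, before `meltano run` executes.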
Arguably done by #8580. Feel free to comment if something's missing.
The PR looks like it solves the problem to me. We'll implement this and come back if it doesn't work! Thank you, this cleans up a number of things for us.
Yeah, just so folks don't have to dig through the PR/docs, the options in Meltano 3.5.0a1+ are:
- the `use_cached_catalog: false` extra setting
- the `--refresh-catalog` option of `meltano [run|el|elt|select]`
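The two options above can be summarized with a small decision model. This Python sketch is hypothetical (`should_rediscover` is not a Meltano function) and only encodes the behavior described in this thread: a persistent `use_cached_catalog: false` extra or a one-off `--refresh-catalog` flag forces fresh discovery, and with neither set a cached catalog is reused if one exists.

```python
def should_rediscover(
    cache_exists: bool,
    use_cached_catalog: bool = True,
    refresh_catalog_flag: bool = False,
) -> bool:
    """Hypothetical model of when a tap's catalog is rebuilt via discovery."""
    if refresh_catalog_flag:  # `--refresh-catalog`: applies to this invocation only
        return True
    if not use_cached_catalog:  # `use_cached_catalog: false` extra: applies to every invocation
        return True
    return not cache_exists  # default: reuse the cached catalog when present


# Default behavior with a warm cache: no rediscovery.
print(should_rediscover(cache_exists=True))  # False
# Either opt-out forces a fresh catalog.
print(should_rediscover(cache_exists=True, use_cached_catalog=False))  # True
print(should_rediscover(cache_exists=True, refresh_catalog_flag=True))  # True
```

The extra suits a plugin whose source schema changes often during development, while the flag covers an occasional refresh without touching `meltano.yml`.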
Broad discussion here:
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2907
Originally created by @vischous on 2021-08-27 12:26:32
I want Meltano to build a new catalog on every run when running `meltano invoke tap-oracle`.
A key at the `SingerPlugin` level probably makes sense; maybe call it `fresh_catalog`, defaulting to `false`?

This is more specific than #2850, as #2627 didn't solve what I'm after. What I really want is a way to manage and watch my catalog change over time (#2677 / #2805), but this issue will be an incremental improvement over where I'm at today.
Today I delete the catalog and cache key from `.meltano/run/tap-name/*`.
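The manual workaround above can be scripted. This is a minimal Python sketch under the assumption, taken from the comment, that everything directly under `.meltano/run/<plugin name>/` is disposable cache. `clear_cached_catalog` is a hypothetical helper, and the file names in the demo are placeholders rather than the names Meltano actually writes. With Meltano 3.5.0a1+, the `--refresh-catalog` option makes this unnecessary.

```python
import tempfile
from pathlib import Path


def clear_cached_catalog(project_root: Path, plugin_name: str) -> list[str]:
    """Delete regular files directly under .meltano/run/<plugin_name>/ and return their names."""
    run_dir = project_root / ".meltano" / "run" / plugin_name
    removed: list[str] = []
    if not run_dir.is_dir():
        return removed  # nothing cached yet
    for path in sorted(run_dir.iterdir()):
        # Like a non-recursive `rm .meltano/run/tap-name/*`, subdirectories are left alone.
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Demo in a throwaway directory; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / ".meltano" / "run" / "tap-name"
    demo_dir.mkdir(parents=True)
    for name in ("catalog.json", "cache_key"):
        (demo_dir / name).write_text("{}")
    removed = clear_cached_catalog(Path(tmp), "tap-name")
    remaining = list(demo_dir.iterdir())

print(removed)  # ['cache_key', 'catalog.json']
```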