Closed by paulkaefer 10 months ago
Thanks for reporting this @paulkaefer, though I'm not sure how you got to it. 😅 I've got a couple changes to bundle up together for 1.4.11, but this should come out before the end of the week!
@paulkaefer I just pushed v1.4.11, you should be able to take it with `cs_tools self upgrade`. Let me know if that fixes your error!
❗ It's possible that you might run into an issue with `cs_tools self upgrade`, as I had a bug in the installer itself. If your upgrade command fails, please go to `/Users/paulkaefer/Library/Application Support/cs_tools/.cs_tools` (the directory with the dot in front of it) and remove it all (`cd /Users/paulkaefer/Library/Application\ Support/cs_tools/.cs_tools && rm -r`), then try installing again.
@boonhapus apologies for the delay; I was traveling.
The upgrade worked & deploy seems to work, but in the output I see these errors about tables not existing. Am I misinterpreting, or shouldn't the SpotApp be creating these tables?
@paulkaefer have you run the Searchable gather command to populate your database yet? The deploy step only submits TML to ThoughtSpot, but you'll get failures if those tables don't exist in Snowflake, Redshift, etc.
@boonhapus ah, I see that now. Okay, are there any examples of a syncer? The documentation lists `protocol://DEFINITION.toml`, but I don't see any examples/guidance.
Also, just to clarify: once deployed, will the Searchable SpotApp then automatically export metadata? Or will I have to schedule a `bi-server` and/or `gather` command to run daily/weekly?
Good questions @paulkaefer -- I don't have the process well documented here. Essentially..
`searchable gather` will grab a snapshot of all the metadata that exists in your system -- objects like tables, worksheets, answers, as well as users, groups, tags, and then all the intersections between those entities (sharing, group memberships, tagging). These are point-in-time snapshots, taken whenever you run the actual command.
`searchable bi-server` will use the SearchData API to fetch rows from the TS: BI Server worksheet. This is an activity record of all the API calls a User can make in the platform -- things like searching on Worksheets, visiting visualizations, opening their profile, etc. It's effectively a history table.
Both of these commands will generate datasets, which you can export using the `--syncer` option. Since many CS Tools commands will generate datasets, I wanted to write some way of swapping connectors to popular data formats (notably our main supported Connections, and CSV / SQLite). The interface here will get some rework during the holidays to be cleaner, but I haven't gotten around to it yet! 😄
Syncers are documented here and can be used in 2 main formats. One is a config-file based format where you put your whole configuration in a `.toml` file that might look like..

`csv-export.toml` (this name can be whatever you want, as long as it's a toml)

```toml
[configuration]
directory = '/Users/paulkaefer/Downloads/'
delimiter = '|'
```
The second format can be provided directly on the CLI, in the syncer option itself.

```shell
cs_tools tools searchable gather --syncer csv://directory=/Users/paulkaefer/Downloads/&delimiter=|
```

you might need to wrap the argument in double quotes so that the terminal plays nicely..

```shell
--syncer "csv://directory=/Users/paulkaefer/Downloads/&delimiter=|"
```
We've got customers taking different strategies here, but most will run `searchable gather` daily with the `truncate_on_load = true` syncer database parameter, and then `searchable bi-server` on a daily/bi-weekly/weekly basis with `truncate_on_load = false` so they are simply appending to their data store.
I'll probably be adding support for Parquet, Databricks, Redshift, and Postgres during the holiday time frame.
Finally....
`searchable deploy` will simply push this TML up to your ThoughtSpot cluster, doing some basic replacements on the Connection name and external database information (db, schema).
Closing this one out -- if it's still not working, feel free to open it back up!
@boonhapus finally picking this back up!
The good news is this exports CSV files:
```shell
λ cs_tools tools searchable gather --syncer "csv://directory=/Users/paulkaefer/Downloads/TS_metadata/&delimiter=|"
```
Unfortunately, I'm getting an error with the Snowflake syncer I set up:

```shell
λ cs_tools tools searchable gather snowflake:///Users/paulkaefer/syncers/snowflake.toml
[14:00:38] ERROR    Missing parameter: syncer    main.py:154
```
I'm also wondering if I can provide a `schema`? The documentation you pointed to doesn't mention this parameter specifically.
This should be pretty easy to fix @paulkaefer, it looks like you forgot `--syncer`.

```shell
cs_tools tools searchable gather --syncer snowflake:///Users/paulkaefer/syncers/snowflake.toml
```
Schema should be possible as well, see the Snowflake Syncer docs specifically.
@boonhapus thanks for your help! I am able to run `gather` and `bi-server` commands, and am now working on scheduling them to run daily.
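If the scheduler ends up being cron, the entries might look something like the sketch below -- the run times, paths, and syncer file names are illustrative assumptions on my part, not anything CS Tools prescribes:

```shell
# Illustrative crontab entries (install via `crontab -e`);
# gather runs at 06:00, bi-server at 07:00, both daily:
#
#   0 6 * * *  cs_tools tools searchable gather --syncer snowflake:///path/to/gather.toml
#   0 7 * * *  cs_tools tools searchable bi-server --syncer snowflake:///path/to/bi-server.toml
```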
That's excellent news @paulkaefer! Do take note that the `gather` command effectively grabs the metadata in your system as-is, like a snapshot, whereas the `bi-server` command is fetching data from the TS: BI Server worksheet, which is effectively an activity log of the API calls that users make in your cluster.
I typically recommend having 2 syncer files, one where you set `truncate_on_load = true` for the `gather` metadata command, and `truncate_on_load = false` on the `bi-server` one so you're not dropping history all the time.
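As a minimal sketch of that two-file setup, assuming `truncate_on_load` lives in the syncer's `[configuration]` table (the connection parameters themselves are omitted here):

```toml
# gather.toml -- snapshot tables, replaced on every run
[configuration]
truncate_on_load = true
```

The `bi-server` file would be identical apart from `truncate_on_load = false`, so each run appends to the history instead of replacing it.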
Additionally, it's a popular pattern to fetch a lot of history on BI Server once (this can be very slow for highly active clusters) and then do incremental data loads with the `--from-date` and `--to-date` parameters.
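That incremental pattern can be sketched in shell; the syncer file path here is hypothetical, and GNU `date` syntax is assumed (BSD/macOS `date` uses `-v-1d` instead):

```shell
# Compute a one-day window ending today, in YYYY-MM-DD format.
FROM_DATE=$(date -d "1 day ago" +%Y-%m-%d)
TO_DATE=$(date +%Y-%m-%d)

# Print the command so the window can be inspected before wiring it
# into a real schedule; drop the echo to actually run it.
echo "cs_tools tools searchable bi-server --syncer snowflake:///path/to/bi-server.toml --from-date $FROM_DATE --to-date $TO_DATE"
```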
In a future release, I plan to add an `UPSERT` capability as well so users don't have to deal with this complexity themselves.
Feel free to use Discussions or open another Issue if you have more questions.
First Stop
Platform Configuration
Description
When attempting to run `cs_tools tools searchable deploy --connection-guid <GUID redacted> --database GOVERNANCE --schema THOUGHTSPOT_METADATA`, I got the following error. I tried twice, per the end of the output, and also ran `cs_tools logs report`.