hoverinc / tableau-utilities

A module and CLI Utility for managing Tableau objects, locally, and in Tableau Online.
MIT License
4 stars 1 forks source link

Refactor CLI - Add Generate Config and Merge Config Functionality #17

Closed jaybythebay closed 1 year ago

jaybythebay commented 1 year ago

Generate an Airflow config from a database

jaybythebay commented 1 year ago

Thanks @JustinGrilli Those are all good suggestions and I'll work them in,. I'm running into an issue in case you have a suggestion for how to approach:

  1. This code only seems to get columns that have a cleaned name:
    columns = [c.dict() for c in Datasource(datasource_path).columns]
    rows.setdefault(datasource.name, [])
    rows[datasource.name].extend(columns)
  2. In our datasources we have LOTS of names that aren't nice and cleaned yet. Those only have ALL_CAPS_NAMES and appear in these metadata_Record objects like this:
    </metadata-record>
          <metadata-record class='column'>
            <remote-name>SALESFORCE_OPPORTUNITY_STAGE</remote-name>
            <remote-type>129</remote-type>
            <local-name>[SALESFORCE_OPPORTUNITY_STAGE]</local-name>
            <parent-name>[DIM_OPPORTUNITY]</parent-name>
            <remote-alias>SALESFORCE_OPPORTUNITY_STAGE</remote-alias>
            <ordinal>17</ordinal>
            <local-type>string</local-type>
            <aggregation>Count</aggregation>
            <width>765</width>
            <contains-null>true</contains-null>
            <collation flag='0' name='binary' />
            <attributes>
              <attribute datatype='string' name='DebugRemoteType'>&quot;SQL_VARCHAR&quot;</attribute>
              <attribute datatype='string' name='DebugWireType'>&quot;SQL_C_CHAR&quot;</attribute>
              <attribute datatype='string' name='TypeIsVarchar'>&quot;true&quot;</attribute>
            </attributes>
            <_.fcp.ObjectModelEncapsulateLegacy.true...object-id>[DIM_OPPORTUNITY (SALESFORCE.DIM_OPPORTUNITY)_77429AF611914BD18F4E214B8BDEE9A6]</_.fcp.ObjectModelEncapsulateLegacy.true...object-id>
          </metadata-record>

It seems that I have a few options for dealing with this:

  1. rewrite the columns code to get columns from metadata records. This seems like a pain.
  2. Run a function to convert all the columns to title case first. I think I could use scripts/init_cap_fields.py to do this
  3. Do some other suggestions that you have?

I'll go ahead with #2 in the morning unless you have suggestions on this

JustinGrilli commented 1 year ago

@jaybythebay Yeah if no columns have been added to the column section of the XML, they wont appear in datasource.columns. This is by design - the Datasource class should match the XML.

It is up to the end user to add/update columns to the datasource.columns section; hence the point of this class and the functionality here 😄.

So yes my suggestion would be this (Basiclly #2 you have listed there):

  1. Get the list of all columns from the the datasource.connection.metadata_records
  2. Get the list of columns from datasource.columns
  3. If the metadata column exists in datasource.columns use that info for the column_config
  4. If the metadata column does not exist in datasource.columns, use the metadata info for the column_config, and do some cleanup on the remote_name for the column's caption (alias / top level key in the config)
jaybythebay commented 1 year ago

@JustinGrilli This is ready for review. I'd love to merge this in to your branch and then 1 of us can finish the code to migrate the rest of your functions from the main script to the cli.

jaybythebay commented 1 year ago

I couldn't get these to squash since I had a merge in the middle I've created a new branch with this work.