looker-open-source / looker-explore-assistant

A React Application for interacting with Looker data through natural language.
MIT License

Further Issues when running the setup instructions, for the "BigQuery Deployment" option #9

Open markrittman opened 6 months ago

markrittman commented 6 months ago

I followed the BigQuery Deployment instructions and managed to get through the LLM deployment steps, working around the following issues:

1) The instruction to use the notebook to create the stringified examples doesn't say how to run the notebook. How do you run it?

2) The step that tells you to:

INSERT INTO explore_assistant_demo_logs.explore_assistant_examples (explore_id,examples)
    VALUES ('model:explore',examples);

references a dataset (explore_assistant_demo_logs) that the instructions have not had us create at that point. I created that dataset and inserted the following values into its explore_assistant_examples table, then repeated the insert for a table of the same name in the explore_assistant dataset that we had been working with up to this point.

insert into explore_assistant_demo_logs.explore_assistant_examples
  values('model:explore',"""
    input: companies with revenue > 100
    output :fields=companies_dim.company_name,projects_invoiced.total_invoiced_net_amount_gbp&f[projects_invoiced.total_invoiced_net_amount_gbp]>=100000&sorts=projects_invoiced.total_invoiced_net_amount_gbp desc &limit=500
 """)

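For reference, the stringified examples value used in the insert above could be assembled along these lines. This is a minimal sketch, not the notebook's code: the `Example` shape and the `stringifyExamples` helper are hypothetical, and the "input: ... output: ..." layout is only inferred from the INSERT statement in this comment.

```typescript
// Hypothetical shape for one example; the field names are assumptions.
interface Example {
  input: string // natural-language question
  output: string // Explore URL query string expected for that question
}

// Join the examples into one string, with an "input:" line and an "output:" line per example.
function stringifyExamples(examples: Example[]): string {
  return examples
    .map((e) => `input: ${e.input.trim()}\noutput: ${e.output.trim()}`)
    .join('\n')
}

// The single example from the INSERT statement above.
const examples: Example[] = [
  {
    input: 'companies with revenue > 100',
    output:
      'fields=companies_dim.company_name,projects_invoiced.total_invoiced_net_amount_gbp' +
      '&f[projects_invoiced.total_invoiced_net_amount_gbp]>=100000' +
      '&sorts=projects_invoiced.total_invoiced_net_amount_gbp desc&limit=500',
  },
]

const stringified = stringifyExamples(examples)
console.log(stringified)
```

The resulting string is what would go into the examples column, next to the explore_id for the explore it applies to.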
Then I followed the instructions under "2. Looker Extension Framework Setup".

1) For step 2, you suggest running these steps from Cloud Shell, but every time I try this I run out of space: the files that npm install downloads and unpacks eventually exceed the 5GB limit we have when using Cloud Shell.

2) Running this from my Mac seems to work. But where you say "You may need to update your Node version or use a Node version manager to change your Node version." it's not clear which version of Node we should install. I went with the default on the Mac, but I can't tell whether the rest of the steps worked or not. This was the output from my attempt to run the install:

markrittman@Marks-iMac extension-bigquery-deployment % npm install
npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE   package: 'explore-assistant@0.1.0',
npm WARN EBADENGINE   required: { node: '>=14 <17' },
npm WARN EBADENGINE   current: { node: 'v20.11.1', npm: '10.2.4' }
npm WARN EBADENGINE }
npm WARN deprecated @babel/plugin-proposal-class-properties@7.18.6: This proposal has been merged to the ECMAScript standard and thus this plugin is no longer maintained. Please use @babel/plugin-transform-class-properties instead.
npm WARN deprecated @babel/plugin-proposal-object-rest-spread@7.20.7: This proposal has been merged to the ECMAScript standard and thus this plugin is no longer maintained. Please use @babel/plugin-transform-object-rest-spread instead.
npm WARN deprecated trim@0.0.1: Use String.prototype.trim() instead
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated @babel/plugin-proposal-object-rest-spread@7.12.1: This proposal has been merged to the ECMAScript standard and thus this plugin is no longer maintained. Please use @babel/plugin-transform-object-rest-spread instead.
npm WARN deprecated request-promise-native@1.0.9: request-promise-native has been deprecated because it extends the now deprecated request package, see https://github.com/request/request/issues/3142
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated eslint-plugin-standard@5.0.0: standard 16.0.0 and eslint-config-standard 16.0.0 no longer require the eslint-plugin-standard package. You can remove it from your dependencies with 'npm rm eslint-plugin-standard'. More info here: https://github.com/standard/standard/issues/1316
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142

added 1183 packages, and audited 1184 packages in 1m

3) For Step 4 where you say "Start the development server IMPORTANT If you are running the extension from a VM or another remote machine, you will need to Port Forward to the machine where you are accessing the Looker Instance from. Here's a boilerplate example for port forwarding the remote port 8080 to the local port 8080: ssh username@host -L 8080:localhost:8080." - I understand the concept of port forwarding, but where exactly do you run this command? On my Mac? On a VM, if I had run these steps there? Somehow on the Looker server? I just ignored this step in the end.

4) Then for Step 5 you say "IMPORTANT please paste in the deployed Cloud Function URL into the external_api_urls list. This will allowlist it in Looker for fetch requests." - what cloud function? I see you mention creating one in another set of steps that are an alternative deployment approach to the BigQuery deployment, but are you now saying we have to deploy that cloud function anyway?

5) I then went to the Deployment section and managed to put together the manifest file, I think correctly but I'm guessing at this stage:

project_name: "analytics"
application: explore-assistant {
  label: "Explore Assistant"
  # file: "explore-assistant.js"
  file: "bundle.js"
  entitlements: {
    external_api_urls: ["https://localhost:8080","http://localhost:8080"]
    core_api_methods: ["lookml_model_explore","run_inline_query","create_sql_query","run_sql_query"]
    navigation: yes
    use_embeds: yes
    use_iframes: yes
    new_window: yes
    new_window_external_urls: ["https://developers.generativeai.google/*"]
    local_storage: yes
  }
}
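On the Node version question in item 2: the EBADENGINE warning in the npm output reports that explore-assistant@0.1.0 declares node '>=14 <17' while the running Node was v20.11.1, and the install still completed (1183 packages added). A rough sketch of that range check follows. The `satisfiesMajorRange` helper is hypothetical and compares only the major version, which is enough for a range of this form but is not a full semver implementation. The v16.20.2 string is just an illustrative in-range version.

```typescript
// Simplified check for a range of the form ">=MIN <MAX", using the major version only.
function satisfiesMajorRange(
  version: string,
  minMajor: number,
  maxMajorExclusive: number,
): boolean {
  const match = /^v?(\d+)\./.exec(version)
  if (match === null) {
    throw new Error(`Unrecognised version string: ${version}`)
  }
  const major = Number(match[1])
  return major >= minMajor && major < maxMajorExclusive
}

// Range and current version taken from the EBADENGINE warning in the npm output.
const currentOk = satisfiesMajorRange('v20.11.1', 14, 17) // false: 20 is outside >=14 <17
const node16Ok = satisfiesMajorRange('v16.20.2', 14, 17) // true: 16 is inside >=14 <17
console.log({ currentOk, node16Ok })
```

Read this way, the declared range points at a Node 14, 15 or 16 release rather than 17, although the thread does not confirm which version the maintainers test against.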

Running the Explore Assistant extension brings up this screen, which looked promising, but I noted that it didn't reference any of the example queries I inserted into the explore_assistant_examples table earlier:

[image: screenshot of the Explore Assistant extension screen]

I then tried entering a natural language query, e.g. "total invoiced revenue by company". The Explore Assistant extension app just hangs, but if I check the Job History tab in the BigQuery Studio web app I can see that the app has sent the query to BigQuery ML, as shown in the truncated SQL text below:


          DECLARE context STRING;
          SET context = """Youre a developer who would transalate questions to a structured URL query based on the following dictionary - choose only the fileds in the below description
          user_order_facts is an extension of user and should be used when referring to users or customers.Generate only one answer, no more.""";

          SELECT ml_generate_text_llm_result AS generated_content
          FROM ML.GENERATE_TEXT(
              MODEL analytics_ai.explore_assistant_llm,
              (
                  SELECT FORMAT('Context: %s; LookML Metadata: %s; Examples: %s; input: %s, output: ',context,"Dimensions Used to group by information (follow the instructions in tags when using a specific field; if map used include a location or lat long dimension;): name: companies_dim.company_name, type: string, description: , tags: ;name: rfm_model.company_pk, type: string, description: , tags: ;name: companies_dim.company_description, type: string, description: Company Bio, sourced from LinkedIn via Hubspot, tags: ;name: companies_dim.company_industry, type: string, description: , tags: ;name: 
...
contracts_fact.avg_pct_signatures_remaining, type: average, description: , tags: ;name: client_concentration.count, type: count, description: , tags: ",examples.examples, "total invoiced revenue by company") as prompt
                  FROM explore_assistant.explore_assistant_examples as examples
                  WHERE examples.explore_id = "analytics:companies_dim"
              ),
                  STRUCT(
                      0.1 AS temperature,
                      1024 AS max_output_tokens,
                      0.95 AS top_p,
                      40 AS top_k,
                      TRUE AS flatten_json_output
              )
          )

However the Explore Assistant app never returns any results.
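One thing the SQL above makes visible is that the prompt is built once per row of explore_assistant_examples whose explore_id equals "analytics:companies_dim". The sketch below mirrors that FORMAT call and WHERE clause in TypeScript. It is a reading of the query text only, not a confirmed diagnosis of the hang. The `ExampleRow` shape, the `buildPrompts` helper and the placeholder strings are hypothetical.

```typescript
// Hypothetical row shape mirroring the explore_assistant_examples table.
interface ExampleRow {
  explore_id: string
  examples: string
}

// One prompt per row whose explore_id matches; an empty array when nothing matches.
function buildPrompts(
  rows: ExampleRow[],
  exploreId: string,
  context: string,
  metadata: string,
  input: string,
): string[] {
  return rows
    .filter((row) => row.explore_id === exploreId)
    .map(
      (row) =>
        `Context: ${context}; LookML Metadata: ${metadata}; Examples: ${row.examples}; input: ${input}, output: `,
    )
}

const question = 'total invoiced revenue by company'

// A row stored under the literal placeholder id from the INSERT earlier in this comment.
const placeholderRows: ExampleRow[] = [
  { explore_id: 'model:explore', examples: 'placeholder examples' },
]
const unmatched = buildPrompts(placeholderRows, 'analytics:companies_dim', 'ctx', 'meta', question)

// The same row stored under the id that the query filters on.
const matchingRows: ExampleRow[] = [
  { explore_id: 'analytics:companies_dim', examples: 'placeholder examples' },
]
const matched = buildPrompts(matchingRows, 'analytics:companies_dim', 'ctx', 'meta', question)

console.log(unmatched.length, matched.length)
```

If the stored explore_id does not match the value in the WHERE clause, the inner SELECT yields no rows and there is no prompt for the model to answer, so it may be worth checking that the table holds a row for "analytics:companies_dim".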

In summary:

1) It's never really clear to me whether the npm install part worked, and whether I still need to switch Node version (to what version? 17?)

2) Do we need to deploy the cloud function or not? If so, the steps to do so need to be added to the BigQuery Deployment set of steps.

3) The part about port forwarding and running this on your local machine (or a VM?) really isn't clear. Do we need it at all if the goal is to get this running independently on your Looker instance?

4) Despite all of the above, the extension app seems to be working at the end and sending requests to my BigQuery instance, but it never renders any results and doesn't list my example queries for selection and running (and it has a load of boilerplate text that wouldn't apply to a customer deployment, i.e. the ecommerce data).

Hope this helps, and looking forward to getting a version running at some point! Cheers

LukaFontanilla commented 6 months ago

Mark,

A new version of the codebase was deployed to address a few doc notes from this FR, namely:

Outstanding and what we are looking into:

A note on the last bullet point above: the examples in the current documentation are used for training the LLM. They are separate from the suggested prompts that show up in the UI of the Explore Assistant. For a while those have been hardcoded in the Frontend UI to serve as an example based on the ecommerce use case. To change them today, you would need to update the Frontend code and rebuild the frontend JavaScript (see the code sample below). For the BQ deployment, however, we are working on an automated approach for that, which is represented in the last bullet point.

Here is the line number

const categorizedPrompts = [
    {
      category: 'Cohorting',
      prompt: 'Count of Users by first purchase date',
      color: 'blue',
    },
    {
      category: 'Audience Building',
      prompt:
        'Users who have purchased more than 100 dollars worth of Calvin Klein products and have purchased in the last 30 days',
      color: 'green',
    },
    {
      category: 'Period Comparison',
      prompt:
        'Total revenue by category this year compared to last year in a line chart with year pivoted',
      color: 'red',
    },
  ]

markrittman commented 6 months ago

Hi Luka,

Step 5 in the BigQuery Deployment instructions still references the cloud function: "IMPORTANT please paste in the deployed Cloud Function URL into the external_api_urls". Is this step still needed, and if so, how do you set up the cloud function?

Also, is it mandatory to complete step 4 in the "Getting Started for Development" section and run the node app from your local development machine? Or can this step (and the setting up of port forwarding) be skipped if you just want to follow the steps under "Deployment", where we run npm run build and then upload the files to the Looker instance to run there?

LukaFontanilla commented 6 months ago

It is not needed; that was a miss on my end for that section in the docs. I'll clear that up and push a new update. Step 4 is only mandatory if you want to make Frontend UI changes and test them before building the source code with npm run build. It can be skipped otherwise.