adampingel commented 4 months ago

Objectives

As a potential user of Granite Code, I will be able to experiment hands-on with Granite Code's text-to-SQL capability and get some initial, concrete details on how a production deployment that leverages this capability could be built.

This was touted in a June 1 blog entry: https://research.ibm.com/blog/granite-LLM-text-to-SQL, which cites the BIRD leaderboard https://bird-bench.github.io/ ("BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs, 95 big databases with a total size of 33.4 GB")

The workflow should

Ask for Schema
Ask for Example Data
Ask for the natural-language query to be translated into SQL
Be made available to the public as a Python notebook in https://github.com/granite-cookbooks/granite-code-cookbook
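
The three inputs above could be gathered into a single prompt along these lines. This is a minimal sketch only: `TextToSqlRequest`, `build_prompt`, and the section headings in the prompt are hypothetical names chosen for illustration, not the template used by Granite Code or by any IBM library.

```python
from dataclasses import dataclass


@dataclass
class TextToSqlRequest:
    """Hypothetical container for the three inputs the workflow asks for."""
    schema: str        # CREATE TABLE statements
    example_rows: str  # a few sample rows, as plain text (may be empty)
    question: str      # the natural-language query


def build_prompt(req: TextToSqlRequest) -> str:
    """Assemble one prompt string. The layout is an assumption, not a known template."""
    # Schema and question are required; example data is optional.
    for field_name in ("schema", "question"):
        if not getattr(req, field_name).strip():
            raise ValueError(f"{field_name} must not be empty")
    parts = [
        "### Database schema",
        req.schema.strip(),
        "### Example rows",
        req.example_rows.strip() or "(none provided)",
        "### Question",
        req.question.strip(),
        "### SQL",
    ]
    return "\n".join(parts)
```

The resulting string would then be sent to whichever hosted Granite Code endpoint the notebook ends up using; that call is deliberately left out here because the hosting choice is not settled in this issue.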

The textual context should

Show how to use a hosted Granite Code model
Walk the (notebook) user through each step of the workflow, explaining the inputs, outputs, and error conditions to check for
Cite relevant benchmarks
Explain how a "real world" application of this workflow could be built
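
One error condition worth checking in the notebook is whether the generated SQL is even valid against the supplied schema. A rough sketch, assuming the schema is SQLite-compatible DDL (an assumption; the target dialect is not decided here) and using a hypothetical helper named `check_sql`:

```python
import sqlite3
from typing import Optional


def check_sql(schema_ddl: str, generated_sql: str) -> Optional[str]:
    """Return None if SQLite can plan the statement against the schema,
    otherwise return the error message. The statement is never executed."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty copy of the schema in memory.
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN compiles the statement without running it,
        # so unknown tables or columns surface as errors.
        conn.execute("EXPLAIN QUERY PLAN " + generated_sql)
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    finally:
        conn.close()
```

This only catches syntax errors and references to missing tables or columns. It says nothing about whether the query answers the user's question.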

Existing Prompting Service

From a slide on IBM Flowpilot SQL Prompting Service for Granite:

The SQL prompting library makes it fast and intuitive to create custom prompts, so it is easy to experiment with new ones.

It is consumed in 3 ways:

Through an OpenAPI-compatible REST API (being integrated in Watson Orchestrate)
Through a Python notebook that allows experimenting with prompts
Directly, as a Python library

The library can make use of various Granite-based models for schema linking, content linking, and SQL generation.

The library can connect to databases to obtain schemas and sample values to be included in the prompt.
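
The slide does not show how the library obtains schemas and sample values, so the following is only an illustration of the general idea using Python's standard `sqlite3` module. The function name `describe_database` and the output format are assumptions and are not part of the Flowpilot library.

```python
import sqlite3


def describe_database(conn: sqlite3.Connection, sample_rows: int = 3) -> str:
    """Return each table's CREATE statement followed by a few sample rows,
    formatted as text that could be pasted into a prompt."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    lines = []
    for name, ddl in tables:
        if name.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        lines.append(ddl)
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (sample_rows,))
        columns = [col[0] for col in cursor.description]
        lines.append("-- columns: " + ", ".join(columns))
        for row in cursor:
            lines.append("-- sample: " + repr(row))
    return "\n".join(lines)
```

Other database engines expose their catalogs differently (for example through `information_schema`), so a real implementation would need a per-engine adapter.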

UI Considerations (Not in Scope)

If we build a LangChain.js + LangServe app to demonstrate this, we should consider:

After providing the translated query in SQL, the user can either translate another query for the same schema/data, or start at the beginning
Save and display the application state, which in this case is 1) schema, 2) data, and 3) natural language query
Provide "canned" schemas to save the user time digging one up (or supplying their own)
Provide "canned" queries

Test Cases

In addition to the BIRD benchmark cited above, other schemas for testing might include:

Airlines

https://www.kaggle.com/datasets/open-flights/airline-database. A nice feature here is the heterogeneity of the columns and the fact there are several tables that will be useful for testing joins.
Taken from this more expansive list of datasets: https://openflights.org/data.php
Other, similar sources of airline data, including at data.gov.

Stocks

Org

https://github.com/ronaldbradford/schema/blob/master/employees.sql (also see peer schema)

Acceptance Criteria

Friction logs from at least two testers (who are not the author)
Automated testing should verify that the steps in the recipe (notebook cells) work

Assumptions, Open Questions, and Potential Complications

The templates in the referenced Python library are assumed to be transferable to the above-described workflow
They are also assumed to be free of IP restrictions, as the understanding is that this is published work

Other References

QueryCraft: an IBM Client Engineering repo with a sophisticated example oriented towards watsonx.ai.
A Survey on Employing Large Language Models for Text-to-SQL Tasks (arXiv paper)

fayvor commented 3 months ago

Digging into this.