irthomasthomas / undecidability

6 stars 2 forks source link

The Querynomicon: An Introduction to SQL for Weary Data Scientists #639

Open irthomasthomas opened 6 months ago

irthomasthomas commented 6 months ago

The Querynomicon

DESCRIPTION: The Querynomicon

An Introduction to SQL for Weary Data Scientists

Upon first encountering SQL after two decades of Fortran, C, Java, and Python, I thought I had stumbled into hell. I quickly realized that was optimistic: after all, hell has rules.

I have since realized that SQL does too, and that its rules are no more confusing or contradictory than those of most other programming languages. They only appear so because SQL draws on a tradition unfamiliar to those of us raised with derivatives of C. To quote Terry Pratchett, it is not mad, just differently sane.

Welcome, then, to a world in which the strange will become familiar, and the familiar, strange. Welcome, thrice welcome, to SQL.

"The Querynomicon" is a Third Bit production.

URL: The Querynomicon

Suggested labels

{'label-name': 'SQL-introduction', 'label-description': 'Introduction to SQL for beginners and data scientists', 'gh-repo': 'https://github.com/gvwilson/sql-tutorial', 'confidence': 78.55}

irthomasthomas commented 6 months ago

Related issues

545: Pragma statements supported by SQLite

### Details

Similarity score: 0.84

- [ ] [Pragma statements supported by SQLite](https://sqlite.org/pragma.html#toc)

# Pragma statements supported by SQLite

## Description

The PRAGMA statement is an SQL extension specific to SQLite, used to modify the operation of the SQLite library or to query the library for internal (non-table) data. The PRAGMA statement is issued using the same interface as other SQLite commands (e.g. SELECT, INSERT) but differs in the following important respects:

- The pragma command is specific to SQLite and is not compatible with any other SQL database engine.
- Specific pragma statements may be removed and others added in future releases of SQLite. There is no guarantee of backwards compatibility.
- No error messages are generated if an unknown pragma is issued. Unknown pragmas are simply ignored, so if there is a typo in a pragma statement the library does not inform the user of the fact.
- Some pragmas take effect during the SQL compilation stage, not the execution stage. This means that when using the C-language sqlite3_prepare(), sqlite3_step(), sqlite3_finalize() API (or a similar wrapper interface), the pragma may run during the sqlite3_prepare() call rather than during the sqlite3_step() call as normal SQL statements do. Whether the pragma runs during sqlite3_prepare() or sqlite3_step() depends on the pragma and on the specific release of SQLite.
- The EXPLAIN and EXPLAIN QUERY PLAN prefixes to SQL statements only affect the behavior of the statement during sqlite3_step(). That means PRAGMA statements that take effect during sqlite3_prepare() behave the same way regardless of whether or not they are prefaced by "EXPLAIN".
- The C-language API for SQLite provides the SQLITE_FCNTL_PRAGMA file control, which gives VFS implementations the opportunity to add new PRAGMA statements or to override the meaning of built-in PRAGMA statements.
## PRAGMA Command Syntax

```sql
PRAGMA schema-name.pragma-name;
PRAGMA schema-name.pragma-name = pragma-value;
PRAGMA schema-name.pragma-name(pragma-value);
```

- A pragma can take either zero or one argument. The argument may be placed either in parentheses or separated from the pragma name by an equal sign; the two syntaxes yield identical results. In many pragmas, the argument is a boolean, which can be one of:
  - 1, yes, true, on
  - 0, no, false, off
- Keyword arguments can optionally appear in quotes. (Example: 'yes' [FALSE].) Some pragmas take a string literal as their argument. When a pragma takes a keyword argument, it will usually also accept a numeric equivalent: "0" and "no" mean the same thing, as do "1" and "yes". When querying the value of a setting, many pragmas return the number rather than the keyword.
- A pragma may have an optional schema-name before the pragma name. The schema-name is the name of an ATTACH-ed database, or "main" or "temp" for the main and TEMP databases. If the optional schema name is omitted, "main" is assumed. In some pragmas the schema name is meaningless and is simply ignored; in the documentation below, pragmas for which the schema name is meaningful are shown with a "schema." prefix.

## PRAGMA Functions

PRAGMAs that return results and that have no side-effects can be accessed from ordinary SELECT statements as table-valued functions. For each participating PRAGMA, the corresponding table-valued function has the same name as the PRAGMA with a 7-character "pragma_" prefix. The PRAGMA argument and schema, if any, are passed as arguments to the table-valued function, with the schema as an optional, last argument.
For example, information about the columns in an index can be read using the index_info pragma as follows:

```sql
PRAGMA index_info('idx52');
```

Or, the same content can be read using:

```sql
SELECT * FROM pragma_index_info('idx52');
```

The advantage of the table-valued function format is that the query can return just a subset of the PRAGMA columns, can include a WHERE clause, can use aggregate functions, and the table-valued function can be just one of several data sources in a join. For example, to get a list of all indexed columns in a schema, one could query:

```sql
SELECT DISTINCT m.name || '.' || ii.name AS 'indexed-columns'
FROM sqlite_schema AS m,
     pragma_index_list(m.name) AS il,
     pragma_index_info(il.name) AS ii
WHERE m.type = 'table'
ORDER BY 1;
```

Additional notes:

- Table-valued functions exist only for built-in PRAGMAs, not for PRAGMAs defined using the SQLITE_FCNTL_PRAGMA file control.
- Table-valued functions exist only for PRAGMAs that return results and that have no side-effects.
- This feature could be used to implement an information schema by first creating a separate schema using

  ```sql
  ATTACH ':memory:' AS 'information_schema';
  ```

  then creating VIEWs in that schema that implement the official information-schema tables using table-valued PRAGMA functions.

The table-valued functions for PRAGMA feature was added in SQLite version 3.16.0 (2017-01-02). Prior versions of SQLite cannot use this feature.

[More information](https://sqlite.org/pragma.html#toc)

#### Suggested labels

{'label-name': 'SQL commands', 'label-description': 'Descriptions and syntax of SQL commands specific to SQLite.', 'confidence': 66.84}
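Both behaviors described above (the interchangeable argument syntaxes and the table-valued pragma_ functions) can be checked quickly from Python's built-in sqlite3 module; the pragma values, table, and index names below are arbitrary examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Equals-sign and parenthesized forms of a pragma argument are equivalent.
conn.execute("PRAGMA main.cache_size = 5000")
print(conn.execute("PRAGMA main.cache_size").fetchone()[0])  # 5000
conn.execute("PRAGMA main.cache_size(4000)")
print(conn.execute("PRAGMA main.cache_size").fetchone()[0])  # 4000

# Boolean pragmas accept keywords but report numbers when queried.
conn.execute("PRAGMA foreign_keys = on")
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])  # 1

# A side-effect-free pragma is also exposed as a table-valued function,
# so it can be used inside an ordinary SELECT with projection and filtering.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON books(title)")
print(conn.execute("PRAGMA index_info('idx_title')").fetchall())
print(conn.execute(
    "SELECT name FROM pragma_index_info('idx_title')"
).fetchall())  # [('title',)]
```

Note that the table-valued form requires SQLite 3.16.0 or later, per the version note above.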

546: [2304.11015] DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

### Details

Similarity score: 0.82

- [ ] [[2304.11015] DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction](https://arxiv.org/abs/2304.11015)

**DESCRIPTION:** DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

Mohammadreza Pourreza, Davood Rafiei

There is currently a significant gap between the performance of fine-tuned models and prompting approaches using Large Language Models (LLMs) on the challenging task of text-to-SQL, as evaluated on datasets such as Spider. To improve the performance of LLMs in the reasoning process, we study how decomposing the task into smaller sub-tasks can be effective. In particular, we show that breaking down the generation problem into sub-problems and feeding the solutions of those sub-problems into LLMs can be an effective approach for significantly improving their performance. Our experiments with three LLMs show that this approach consistently improves their simple few-shot performance by roughly 10%, pushing the accuracy of LLMs towards SOTA or surpassing it. On the holdout test set of Spider, the SOTA in terms of execution accuracy was 79.9, and the new SOTA at the time of this writing using our approach is 85.3. Our approach with in-context learning beats many heavily fine-tuned models by at least 5%. Additionally, when evaluated on the BIRD benchmark, our approach achieved an execution accuracy of 55.9%, setting a new SOTA on its holdout test set.

URL: [https://arxiv.org/abs/2304.11015](https://arxiv.org/abs/2304.11015)

#### Suggested labels

{'label-name': 'Text-to-SQL', 'label-description': 'Focuses on generating SQL queries from natural language text.', 'confidence': 76.74}
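The decomposition-with-self-correction idea in the abstract can be sketched as a small pipeline; the stage names and the `call_llm` stub below are hypothetical illustrations of the approach, not the paper's actual prompts or code:

```python
# Hypothetical sketch of decomposed text-to-SQL: each sub-task gets its own
# prompt, and the output of earlier stages is fed into later prompts.

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call.
    return f"<answer to: {prompt[:30]}...>"

def text_to_sql(question: str, schema: str) -> str:
    # 1. Schema linking: identify tables/columns relevant to the question.
    links = call_llm(f"Schema: {schema}\nQuestion: {question}\nList relevant columns.")
    # 2. Classification/decomposition: decide if joins or nesting are needed.
    plan = call_llm(f"Question: {question}\nLinks: {links}\nClassify query complexity.")
    # 3. SQL generation conditioned on the sub-task outputs.
    sql = call_llm(f"Question: {question}\nPlan: {plan}\nWrite the SQL query.")
    # 4. Self-correction: ask the model to review and fix its own output.
    return call_llm(f"Fix any mistakes in this SQL query: {sql}")

print(text_to_sql("How many singers are there?", "singer(id, name, country)"))
```

The point of the structure is that each prompt is small and focused, and later stages see the earlier stages' answers in context rather than solving the whole problem in one shot.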

544: JSON Schema

### Details

Similarity score: 0.82

- [ ] [JSON Schema](https://json-schema.org/understanding-json-schema#where-to-begin)

# Understanding JSON Schema

JSON Schema is a powerful tool for validating the structure of JSON data. However, learning to use it by reading its specification is like learning to drive a car by looking at its blueprints. You don't need to know how an electric motor fits together if all you want to do is pick up the groceries. This book, therefore, aims to be the friendly driving instructor for JSON Schema. It's for those that want to write it and understand it, but maybe aren't interested in building their own car (er, writing their own JSON Schema validator) just yet.

This tutorial describes JSON Schema draft 2020-12. Earlier versions of JSON Schema are not completely compatible with the format described here, but for the most part, those differences are noted in the text.

## Where to begin?

This book uses some novel conventions for showing schema examples and relating JSON Schema to your programming language of choice. If you're not sure what a schema is, check out [What is a schema?](https://json-schema.org/understanding-json-schema#what-is-a-schema)

The basics chapter should be enough to get you started with understanding the core JSON Schema Reference. When you start developing large schemas with many nested and repeated sections, check out [Structuring a complex schema](https://json-schema.org/understanding-json-schema/structuring.html).

[json-schema.org](https://json-schema.org/) has a number of resources, including the official specification and tools for working with JSON Schema from various programming languages. There are a number of online JSON Schema tools that allow you to run your own JSON schemas against example documents. These can be very handy if you want to try things out without installing any software.
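As a concrete illustration of what a schema expresses, the sketch below hand-checks a document against a tiny subset of JSON Schema keywords (`type`, `required`, `properties`) in plain Python. It is not a conforming validator; real projects should use a full implementation such as those listed on json-schema.org:

```python
# Minimal, illustrative check covering only the "type", "required", and
# "properties" keywords; not a conforming JSON Schema validator.
TYPES = {"object": dict, "string": str, "number": (int, float)}

def check(instance, schema):
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        return False
    for key in schema.get("required", []):
        if key not in instance:
            return False
    for key, subschema in schema.get("properties", {}).items():
        if key in instance and not check(instance[key], subschema):
            return False
    return True

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
}
print(check({"name": "Ada", "age": 36}, schema))  # True
print(check({"age": "thirty-six"}, schema))       # False (missing "name")
```

Even this toy version shows the core idea: a schema is data describing the shape of other data, and validation is a recursive walk comparing the two.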
URL: [https://json-schema.org/understanding-json-schema#where-to-begin](https://json-schema.org/understanding-json-schema#where-to-begin)

#### Suggested labels

{'label-name': 'Data-Validation', 'label-description': 'Validating JSON data structure and format', 'gh-repo': 'https://json-schema.org/understanding-json-schema#where-to-begin', 'confidence': 57.69}

50: "Yes, that exact niche thing you want does exist!" And other hallucinations

### Details

Similarity score: 0.81

```bash
llm "Is there an equivalent tool to fzf for sql? Sqlite3 in particular? Sqlite-extensions? fuzzy search and instant feedback"
```

Yes, there is an equivalent tool to fzf for SQL queries called "fzf+sqlite-extensions". It combines fuzzy search functionality provided by the fzf tool with SQLite's full-text search extensions. This allows you to perform fuzzy searches on SQL queries and get instant feedback. Here is an example of how to use it:

1. Install SQLite extensions: First, you need to install SQLite's full-text search extensions. You can find the installation instructions in the SQLite documentation or use a tool like `sqlite-utils` to install them.
2. Install fzf+sqlite-extensions: Next, you can install the `fzf+sqlite-extensions` tool. It is available as a plugin for popular text editors like Vim and Emacs. You can also find standalone versions of the tool on GitHub.
3. Run fuzzy searches: Once installed, you can start using fuzzy searches with SQLite. For example, if you have a SQLite database file called "data.db" and want to search for a specific value in a table, you can use the following command:

   ```bash
   cat data.db | fzf+sqlite-extensions -d
   ```

   This command will open the fzf interface where you can start typing your fuzzy search query. As you type, the tool will provide instant feedback and show the matching rows from the SQLite database.

Note: Make sure to adjust the command based on your specific setup and requirements. The above example assumes that you have a SQLite database file named "data.db" and want to search the entire database.

461: Running Steampipe extensions in sqlite-utils and Datasette | Simon Willison’s TILs

### Details

Similarity score: 0.81

- [ ] [Running Steampipe extensions in sqlite-utils and Datasette | Simon Willison's TILs](https://til.simonwillison.net/sqlite/steampipe#user-content-running-extensions-in-datasette)

# Running Steampipe extensions in `sqlite-utils` and Datasette

Steampipe is software that enables you to query different APIs directly from SQL databases. The original product was designed for PostgreSQL, but last week they announced support for SQLite, along with ports of 100 of their existing extensions. Here's the full list of SQLite extensions they released.

To install and use these extensions, you can either use their CLI tool or access the .so files directly via GitHub releases. This guide will demonstrate how to run these extensions on a Mac using both `sqlite-utils` and Datasette.

# Downloading an extension

Let's start with the Hacker News API plugin, `turbot/steampipe-plugin-hackernews`. This plugin doesn't require an API key, making it a great choice for this example.

First, grab the latest release of the extension. For an M2 MacBook, download the `steampipe_sqlite_hackernews.darwin_arm64.tar.gz` file:

```bash
curl -OL https://github.com/turbot/steampipe-plugin-hackernews/releases/download/v0.8.1/steampipe_sqlite_hackernews.darwin_arm64.tar.gz
tar -xzvf steampipe_sqlite_hackernews.darwin_arm64.tar.gz
```

Now you have a `steampipe_sqlite_hackernews.so` file.

# Loading extensions with `sqlite-utils`

With `sqlite-utils` installed, you can run the following command:

```bash
sqlite-utils memory --load-extension steampipe_sqlite_hackernews.so \
  'select id, title, time from hackernews_top limit 3'
```

This will produce a warning box due to the unsigned binary. To bypass this error, open the system Security preferences pane, find the option to allow unsigned binaries, and click "Allow Anyway". Then try running the command again. After allowing the extension, you'll see another dialog; click "Open" to run the script correctly.
Now you can run the command again:

```bash
sqlite-utils memory --load-extension steampipe_sqlite_hackernews.so \
  'select id, title, time from hackernews_top limit 3'
```

You should receive the response:

```json
[
  {
    "id": 38706914,
    "title": "Gameboy Music and Sound Archive for MIDI",
    "time": "2023-12-20 09:45:05"
  },
  {
    "id": 38717114,
    "title": "Show HN: Talk to any ArXiv paper just by changing the URL",
    "time": "2023-12-21 04:48:20"
  },
  {
    "id": 38716075,
    "title": "OpenAI Begins Tackling ChatGPT Data Leak Vulnerability",
    "time": "2023-12-21 01:38:10"
  }
]
```

You've successfully queried the Hacker News API using SQL!

# Running extensions in Datasette

After enabling an extension, it can be used directly with Datasette as well. Let's try using the `turbot/steampipe-plugin-crtsh` plugin for querying certificate transparency logs.

Download the latest `steampipe_sqlite_crtsh.darwin_arm64.tar.gz` file for macOS:

```bash
curl -OL https://github.com/turbot/steampipe-plugin-crtsh/releases/download/v0.4.0/steampipe_sqlite_crtsh.darwin_arm64.tar.gz
tar -xzvf steampipe_sqlite_crtsh.darwin_arm64.tar.gz
```

Now load both extensions using Datasette:

```bash
datasette \
  --load-extension steampipe_sqlite_crtsh.so \
  --load-extension steampipe_sqlite_hackernews.so \
  --setting sql_time_limit_ms 20000
```

The `--setting sql_time_limit_ms 20000` option increases the default time limit on SQL queries from 1s to 20s, which is useful here since some of these API calls can be a little slow.
You can now query the certificate transparency log with SQL:

```sql
select dns_names, not_after
from crtsh_certificate
where query = 'datasette.io'
order by not_after desc;
```

Here's a more complex query demonstrating a CTE, JSON processing, and a join across two virtual tables:

```sql
with post_ids as (
  select value
  from json_each(submitted), hackernews_user
  where hackernews_user.id = 'simonw'
  limit 20
)
select *
from hackernews_item
where hackernews_item.id in (select value from post_ids)
order by time desc
```

For more information on the kind of queries you can run, refer to the following documentation:

- Tables in crt.sh
- Tables in Hacker News

#### Suggested labels

{ "label-name": "Steampipe", "description": "Querying APIs from SQL databases with Steampipe extensions", "confidence": 97.71 }