This pull request addresses an issue in the extract_sql function where the regex used to extract SQL queries from LLM responses incorrectly matches the keyword "WITHIN". This results in incomplete and invalid SQL queries being extracted. The regex has been updated to use word boundaries to ensure accurate matching of the "WITH" keyword at the start of CTEs. This fix resolves the syntax errors encountered when running the extracted SQL queries.
Issue:
When a response contains a SQL query with the keyword "WITHIN", the regex mistakenly captures partial SQL statements, resulting in invalid SQL. An example of such an SQL query is provided below:
Example SQL:
SELECT
CASE
WHEN EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob)) < 20 THEN '< 20'
WHEN EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob)) BETWEEN 20 AND 29 THEN '20-29'
WHEN EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob)) BETWEEN 30 AND 39 THEN '30-39'
WHEN EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob)) BETWEEN 40 AND 49 THEN '40-49'
ELSE '50+'
END AS age_group,
COUNT(DISTINCT customer.customer_id) AS customer_count,
SUM(invoice.invoice_amount) AS total_spent,
COUNT(invoice.invoice_id) AS total transactions,
AVG(invoice.invoice_amount) AS average spent per transaction,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY invoice.invoice_amount) AS median_spent
FROM
customer
LEFT JOIN invoice ON customer.customer_id = invoice.customer_id
GROUP BY
age_group;
Extracted SQL before fix:
WITHIN GROUP (ORDER BY invoice.invoice_amount) AS median_spent
FROM
customer
LEFT JOIN invoice ON customer.customer_id = invoice.customer_id
GROUP BY
age_group;
Solution:
The regex was updated to use word boundaries to correctly match the keyword "WITH" at the start of a CTE, ensuring accurate extraction of SQL statements.
Pull Request Description:
This pull request addresses an issue in the
extract_sql
function where the regex used to extract SQL queries from LLM responses incorrectly matches the keyword "WITHIN". This results in incomplete and invalid SQL queries being extracted. The regex has been updated to use word boundaries to ensure accurate matching of the "WITH" keyword at the start of CTEs. This fix resolves the syntax errors encountered when running the extracted SQL queries.Related Previous Pull Requests:
Issue: When a response contains a SQL query with the keyword "WITHIN", the regex mistakenly captures partial SQL statements, resulting in invalid SQL. An example of such an SQL query is provided below:
Example SQL:
Extracted SQL before fix:
Solution: The regex was updated to use word boundaries to correctly match the keyword "WITH" at the start of a CTE, ensuring accurate extraction of SQL statements.
Updated regex:
The current fix ensures that SQL extraction handles CTEs properly without misinterpreting "WITHIN" as a CTE starter.