Closed by datapythonista 2 years ago
I tested whether the error also happens in PostgreSQL, and it raises an exception:
ProgrammingError: (psycopg2.errors.SyntaxError) subquery in FROM must have an alias
LINE 2: FROM countries AS t0, (SELECT t0.iso_alpha2 AS iso_alpha2, t...
^
HINT: For example, FROM (SELECT ...) [AS] foo.
[SQL: SELECT t0.iso_alpha2, t0.iso_alpha3, t0.iso_numeric, t0.fips, t0.name, t0.capital, t0.area_km2, t0.population, t0.continent
FROM countries AS t0, (SELECT t0.iso_alpha2 AS iso_alpha2, t0.iso_alpha3 AS iso_alpha3, t0.iso_numeric AS iso_numeric, t0.fips AS fips, t0.name AS name, t0.capital AS capital, t0.area_km2 AS area_km2, t0.population AS population, t0.continent AS continent
FROM countries AS t0
WHERE EXISTS (SELECT 1
FROM (SELECT t2.continent AS continent, t2.count AS count
FROM (SELECT t0.continent AS continent, count(t0.continent) AS count
FROM countries AS t0 GROUP BY t0.continent) AS t2 ORDER BY t2.count DESC
LIMIT %(param_1)s) AS t1
WHERE t0.continent = t1.continent))
LIMIT %(param_2)s]
[parameters: {'param_1': 3, 'param_2': 10000}]
(Background on this error at: http://sqlalche.me/e/13/f405)
This is the generated SQL, which is invalid in PostgreSQL because the subquery in FROM has no alias:
SELECT t0.iso_alpha2, t0.iso_alpha3, t0.iso_numeric, t0.fips, t0.name, t0.capital, t0.area_km2, t0.population, t0.continent
FROM countries AS t0, (SELECT t0.iso_alpha2 AS iso_alpha2, t0.iso_alpha3 AS iso_alpha3, t0.iso_numeric AS iso_numeric, t0.fips AS fips, t0.name AS name, t0.capital AS capital, t0.area_km2 AS area_km2, t0.population AS population, t0.continent AS continent
FROM countries AS t0
WHERE EXISTS (SELECT 1
FROM (SELECT t2.continent AS continent, t2.count AS count
FROM (SELECT t0.continent AS continent, count(t0.continent) AS count
FROM countries AS t0 GROUP BY t0.continent) AS t2 ORDER BY t2.count DESC
LIMIT %(param_1)s) AS t1
WHERE t0.continent = t1.continent))
LIMIT %(param_2)s
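To illustrate why this query shape is a problem even on engines that accept it, here is a minimal sketch using Python's built-in `sqlite3` module. The table contents, the `n` and `filtered` aliases, and the top-2 cutoff are made up for the example; they are not the tutorial's data. It compares the inner semi-join on its own (which appears to be what the filter is meant to produce) with the comma join of the table against that same subquery, which is the shape of the generated SQL above.

```python
import sqlite3

# Hypothetical miniature of the countries table (not the tutorial data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [
        ("Kenya", "AF"), ("Ghana", "AF"), ("Egypt", "AF"),
        ("Spain", "EU"), ("Italy", "EU"),
        ("Peru", "SA"),
    ],
)

# Semi-join: keep only countries whose continent is among the top 2
# continents by number of countries.
semi_join = """
SELECT t0.name, t0.continent
FROM countries AS t0
WHERE EXISTS (
    SELECT 1
    FROM (SELECT continent, count(continent) AS n
          FROM countries
          GROUP BY continent
          ORDER BY n DESC
          LIMIT 2) AS t1
    WHERE t0.continent = t1.continent)
"""

# Same shape as the generated SQL: the table comma-joined with the filtered
# subquery. The alias is added here so the statement is also valid where
# aliases are mandatory; the join is still a cross join.
cross_join = f"""
SELECT t0.name, t0.continent
FROM countries AS t0, ({semi_join}) AS filtered
"""

filtered = conn.execute(semi_join).fetchall()
multiplied = conn.execute(cross_join).fetchall()

print(len(filtered))    # 5 rows: the AF and EU countries only
print(len(multiplied))  # 30 rows: 6 table rows x 5 filtered rows
```

In this sketch the cross join returns every row of the table once per filtered row, so nothing is filtered out and the result is multiplied. Adding the missing alias would therefore only silence the PostgreSQL syntax error, not fix the result.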
@jreback I'm thinking of moving the topk tutorial to the Impala backend too, since it looks like topk is not well supported in at least SQLite and PostgreSQL. Does that sound good?
Based on the examples in our tutorial:
Note the 252 rows.
Then, filtering by the 3 continents with the most countries (Africa, Europe and Asia), we should get around half the rows:
It looks like it's not only failing to filter, but also duplicating rows massively.
This is the query being generated: