scolby33 / OCSPdash

A dashboard for the status of the top certificate authorities' OCSP responders.
MIT License
1 stars 0 forks source link

SQL Query Fails on Postgres #19

Closed scolby33 closed 6 years ago

scolby33 commented 6 years ago

The query from here works on SQLite but not Postgres.

https://github.com/scolby33/OCSPdash/blob/be3416456ecaf725f8326f6592fc5e046172f373/src/ocspdash/manager.py#L367-L386

Here's the query:

SELECT
  result_1.id AS result_1_id,
  result_1.chain_id AS result_1_chain_id,
  result_1.location_id AS result_1_location_id,
  result_1.retrieved AS result_1_retrieved,
  result_1.ping AS result_1_ping,
  result_1.ocsp AS result_1_ocsp


FROM result AS result_1

JOIN chain ON chain.id = result_1.chain_id
JOIN location ON location.id = result_1.location_id
JOIN responder ON responder.id = chain.responder_id
JOIN authority ON authority.id = responder.authority_id


WHERE result_1.retrieved = (SELECT max(result_2.retrieved) AS max_1 FROM result AS result_2 
WHERE result_1.id = result_2.id)

GROUP BY
  location.id,
  responder.id

ORDER BY
  authority.cardinality DESC,
  authority.name,
  responder.cardinality DESC,
  responder.url,
  location.name

And here's the error SQLalchemy reports from the RDBMS:

E       psycopg2.ProgrammingError: column "result_1.id" must appear in the GROUP BY clause or be used in an aggregate function
E       LINE 1: SELECT result_1.id AS result_1_id, result_1.chain_id AS resu...
E                      ^
scolby33 commented 6 years ago

Btw, the expected result of the query is the newest Result for each Responder and Location.

For instance

Result Id Location Responder
1 NY 1
2 NY 1
3 NY 2
4 NY 2
5 CA 1
6 CA 1
7 CA 2
8 CA 2

Assuming the result ids increase with time (result 8 is newer than result 7, etc.), then the query should return results {2, 4, 6, 8}--the newest for each Location-Responder pair.

scolby33 commented 6 years ago

Okay, yeah, this is broken and it shouldn't work.

result_1_id result_1_chain_id result_1_location_id result_1_retrieved result_1_ping result_1_ocsp
1 1 1 2018-07-01 09:49:20.000000 1 1
2 2 1 2018-07-01 09:49:20.000000 1 1
6 3 1 2018-07-01 09:49:20.000000 1 1
3 4 1 2018-07-01 09:49:20.000000 1 1
4 5 1 2018-07-01 09:49:20.000000 1 0
5 6 1 2018-07-01 09:49:20.000000 1 0

Notice how all the times are the same--that shouldn't be the case.

scolby33 commented 6 years ago

Here's my next shot

SELECT r.id, r.chain_id, r.location_id, r.retrieved, r.ping, r.ocsp
FROM (
  SELECT responder.id AS resp_id, location.id AS loc_id, MAX(result.retrieved) as most_recent
  FROM result
  JOIN chain ON chain.id = result.chain_id
  JOIN responder ON responder.id = chain.responder_id
  JOIN location ON location.id = result.location_id
  GROUP BY resp_id, loc_id
) AS x
JOIN result AS r ON r.retrieved = x.most_recent
JOIN chain ON chain.id = r.chain_id
JOIN responder ON responder.id = chain.responder_id AND responder.id = x.resp_id
JOIN location ON location.id = r.location_id AND location.id = x.loc_id

What do you think @cthoyt? Feel free to translate to SQLalchemy if you want, too.

scolby33 commented 6 years ago

Fixed with d8e463c and subsequent commits.