thoughtworks / recce

Server-based database reconciliation tool for developers
Apache License 2.0
23 stars 3 forks source link

Fail fast or warn user if queries have mismatched column numbers #56

Open chadlwilson opened 2 years ago

chadlwilson commented 2 years ago

Context / Goal

If the two queries expressed in a dataset have different numbers of columns, they cannot possibly produce matches, when doing a hash-based comparison.

We should ideally fail fast, or at the very least warn the user clearly in the results somehow.

Currently we do not do any query parsing at startup, as this is entirely delegated to the runtime drivers for the relevant databases. We also do not want to introduce a startup connectivity dependency on an given datasource.

Expected Outcome

Evaluate and implement some approach to addressing this

Out of Scope

Additional context / implementation notes

jiawen-tw commented 2 years ago

To decide on which option would we like to go with:

Option 1: Basic parsing of SQL query

Option 2: Run whole reconciliation run

Option 3: After completing the run on source database, do the comparison with first row of target database

Option 4: Get the first element from both sources for validation, then all elements for reconciliation

Option 5: Build a separate api for validation of the query

tommi-lew commented 2 years ago

Sharing my thoughts here:

tommi-lew commented 2 years ago

Suggestion

tommi-lew commented 2 years ago

How databases supported by r2dbc specify number of records to return:

  1. Google Cloud Spanner - LIMIT. Refer to Google Standard SQL.
  2. Oracle - No.
    1. Pre-version 12c: ROWNUM together with WHERE
    2. From version 12c onwards: Row Limiting Clause
  3. H2 - LIMIT
  4. Maria DB - LIMIT
  5. MS SQL - No. Uses SELECT TOP
  6. PostgreSQL - LIMIT

References