meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
https://sdk.meltano.com
Apache License 2.0
99 stars 70 forks source link

bug: `start_date` from meltano.yml used even if `replication-key` column is not date-time #1677

Open BuzzCutNorman opened 1 year ago

BuzzCutNorman commented 1 year ago

Singer SDK Version

0.24.0

Is this a regression?

Python Version

3.9

Bug scope

Taps (catalog, state, etc.)

Operating System

Windows

Description

If you setup Incremental replication to use a non datetime column as a replcation-key and start_date is set in the meltano.yml then during the initial incremental run it will fail with a pyodbc conversion error.

      start_date: '2022-01-03'
      raw-tags:
        replication-method: INCREMENTAL
        replication-key: Id
{
    "type": "SCHEMA"
    , "stream": "raw-tags"
    , "schema": 
    {
        "properties": 
        {
            "Id": {"type": ["integer"]}
            , "TagName": {"type": ["string", "null"]}
            , "Count": {"type": ["integer", "null"]}
            , "ExcerptPostId": {"type": ["integer", "null"]}
            , "WikiPostId": {"type": ["integer", "null"]}
        }
        , "type": "object"
        , "required": ["Id"]
    }
    , "key_properties": ["Id"]
    , "bookmark_properties": ["Id"]
}

The issue seems to be in _write_starting_replication_value. The method doesn't currently check if the replication key coloumn is of type timestamp before assigning the start_date. I guess it could also check that the replication key is not of type numeric or integer.

Possible fix:

# Use start_date if it is more recent than the replication_key state
start_date_value: str | None = self.config.get("start_date")
if start_date_value and self.is_timestamp_replication_key:
    if not value:
        value = start_date_value
    else:
        value = self.compare_start_date(value, start_date_value)

Code

# Use start_date if it is more recent than the replication_key state
start_date_value: str | None = self.config.get("start_date")
if start_date_value:
    if not value:
        value = start_date_value
    else:
        value = self.compare_start_date(value, start_date_value)

Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\Scripts\tap-mssql.exe\__main__.py", line 7, in <module>
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\singer_sdk\tap_base.py", line 542, in cli
    tap.sync_all()
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\singer_sdk\tap_base.py", line 413, in sync_all
    stream.sync()
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\singer_sdk\streams\core.py", line 1194, in sync
    for _ in self._sync_records(context=context):
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\singer_sdk\streams\core.py", line 1093, in _sync_records
    for record_result in self.get_records(current_context):
  File "C:\development\buzzcutnorman\tap-mssql\tap_mssql\client.py", line 542, in get_records
    for record in conn.execute(query):
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\future\engine.py", line 280, in execute
    return self._execute_20(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\sql\elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1948, in _execute_context
    self._handle_dbapi_exception(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\base.py", line 2129, in _handle_dbapi_exception
    util.raise_(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
    raise exception
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1905, in _execute_context
    self.dialect.do_execute(
  File "C:\development\projects\my-test-stuff\.meltano\extractors\tap-mssql\venv\lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', "[22018] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Conversion failed when converting the nvarchar value '2022-01-03' to data type int. (245) (SQLExecDirectW)")
[SQL: SELECT raw.tags.[Id], raw.tags.[TagName], raw.tags.[Count], raw.tags.[ExcerptPostId], raw.tags.[WikiPostId]
FROM raw.tags
WHERE raw.tags.[Id] >= ? ORDER BY raw.tags.[Id]]
[parameters: ('2022-01-03',)]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
stale[bot] commented 6 months ago

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

edgarrmondragon commented 6 months ago

Still relevant

Iamcerba commented 2 months ago

Still relevant.