wintersrd / pipelinewise-tap-mssql

Pipelinewise tap for Microsoft SQL Server
GNU Affero General Public License v3.0
14 stars 47 forks source link

Table Select is not being taken correctly. #74

Closed SenneVanstraelen closed 1 week ago

SenneVanstraelen commented 1 month ago

Hello

We are connecting with a SQL Server 2005 (TDS = 7.0) system with this tap. We have a select which follows the following pattern:

companyName countryName$Store.*

When running the tap, we get the message that the stream: companyName countryName tore does not exist.

It seems that the $ sign is replacing values with nothing where it shouldn't. Is there a way to fix this in the yml file so it doesn't have this behaviour anymore?

Thanks

s7clarke10 commented 1 month ago

Have you tried escaping the special $ character e.g. companyName countryName\$Store.*

https://www.geeksforgeeks.org/how-to-escape-the-special-character-in-yaml-with-examples/

SenneVanstraelen commented 1 month ago

Thanks for your reply.

This is one of the first things that we tried. It doesn't work. Then we get a search for Stream: companyName countryName \tore. Which off course does not exist.

I believe the issue does not lie in the .yml parsing of the file. We are connecting with an azure sql database as well and this works fine with the $ in the names of the tables. The fault lies somewhere in the fact that we are connecting to SQL Server 2005 and using TDS "7.0".

We have to reconcile both systems however, and an upgrade of the TDS version is not even able to connect to the database. I have noticed that the meltano selection wildcards in the name also don't seem to work, only the .* for all columns, but not for selecting names (I tried with a '?' instead of a '$'). It seems to take it as a literal string.

s7clarke10 commented 1 month ago

Hmm, okay if it works for an azure database but not for SQL Server 2005 it makes it quite hard.

The driver library is pymssql which in turn utilitise freetds. The issue maybe in either of those packages and is likely not a direct issue with tap-mssql. I suspect freetds 7.0 does not support this.

SenneVanstraelen commented 1 month ago

Yes, I thought the same thing. But then I tested with a clean python script where I use pymssql with tds 7.0, and selecting the table and retrieving the results using a query works like a charm.

Any other suggestions I might try? Other option is to use an ODBC driver with a different tap, but then I need to find one that manages to connect to SQL Server 2005 successfully.

HaydenNess commented 2 weeks ago

I am experiencing a similar issue when running Meltano.

The following seems to work okay in the case of FULL_TABLE replication, but not in the case of LOG_BASED replication. We are using SQL Server 2014 and 2016 at the moment, but expect that range to increase.

My table name is "MyTableName+", for which I receive the following error.

Traceback (most recent call last): 
  File "src/pymssql/_pymssql.pyx", line 447, in pymssql._pymssql.Cursor.execute 
  File "src/pymssql/_mssql.pyx", line 1125, in pymssql._mssql.MSSQLConnection.execute_query 
  File "src/pymssql/_mssql.pyx", line 1156, in pymssql._mssql.MSSQLConnection.execute_query 
  File "src/pymssql/_mssql.pyx", line 1289, in pymssql._mssql.MSSQLConnection.format_and_run_query 
  File "src/pymssql/_mssql.pyx", line 1855, in pymssql._mssql.check_cancel_and_raise 
  File "src/pymssql/_mssql.pyx", line 1901, in pymssql._mssql.raise_MSSQLDatabaseException 
pymssql._mssql.MSSQLDatabaseException: (102, b"Incorrect syntax near '+'.DB-Lib error message 20018, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n") 

During handling of the above exception, another exception occurred:

Traceback (most recent call last): 
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/bin/tap-mssql", line 8, in <module> 
    sys.exit(main())           
             ^^^^^^            
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 795, in main 
    raise exc                  
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 792, in main 
    main_impl()                
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 781, in main_impl 
    do_sync(mssql_conn, args.config, args.catalog, state) 
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 749, in do_sync 
    sync_non_cdc_streams(mssql_conn, non_cdc_catalog, config, state) 
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 693, in sync_non_cdc_streams 
    do_sync_historical_log(mssql_conn, config, catalog_entry, state, columns) 
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/__init__.py", line 591, in do_sync_historical_log 
    log_based.sync_historic_table(mssql_conn, config, catalog_entry, state, columns, stream_version) 
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/sync_strategies/log_based.py", line 230, in sync_historic_table 
    common.sync_query(         
  File "/Users/hayden.ness/groke/.meltano/extractors/source-test_testcompanysb2/venv/lib/python3.11/site-packages/tap_mssql/sync_strategies/common.py", line 229, in sync_query 
    cursor.execute(select_sql, params) 
  File "src/pymssql/_pymssql.pyx", line 462, in pymssql._pymssql.Cursor.execute 
pymssql.exceptions.ProgrammingError: (102, b"Incorrect syntax near '+'.DB-Lib error message 20018, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n") `

I've tried both TDS 7.0 and 7.3, as well as various combinations of escaped ", and []'s in the catalog.
s7clarke10 commented 2 weeks ago

Hello, thankyou for you feedback on this. This log was helpful for me to understand the cause of the issue.

I have made the following changes but cannot verify them as my Docker Image of MS SQL Server does not support CDC.

Could you @HaydenNess and @SenneVanstraelen verify by executing a running using this PR to verify that it resolves your issue please.

e.g. If you use Meltano you can adjust your meltano.yml file to look like this to pull from the PR.

  - name: tap-mssql
    namespace: tap_mssql
    pip_url: git+https://github.com/wintersrd/pipelinewise-tap-mssql@feature/escape_log_based_tables
    executable: tap-mssql
    capabilities:
    - catalog
    - discover
    - properties
    - state
    settings:
    - name: host
      kind: string
    - name: port
      kind: string
    - name: user
      kind: string
    - name: password
      kind: password
    - name: database
      kind: string
    - name: filter_dbs
      kind: string
    - name: use_date_datatype
      kind: boolean
    - name: characterset
      kind: string
    - name: use_singer_decimal
      kind: boolean
    - name: cursor_array_size
      kind: integer
s7clarke10 commented 1 week ago

Hi @SenneVanstraelen,

As we have discussed we haven't been able to replicate this issue on MSSQL Server.

I believe the underlying issue sits in FreeTDS. I have added a new setting which you may wish to utilised called "enable_tds_logging": true . This enable_tds_logging will throw out a lot of information about why setting the value to TDS 7.3 is failing. Having done some reading on this today, it maybe related to encryption. Feel free to look in the FreeTDS github repo. Apparently TDS 7.0 allows a non encrypted connection. All other versions higher that 7.0 are encrypted. There was suggestions about changing the encryption setting in TDS config file to go to a higher version of TDS.

Secondly I wonder if there is a strange characterset issue causing this issue. Have you looked at the Characterset for your SQL Server 2005 server? I believe the default is UTF-8. Just guessing, wondering if the disappearing character in the table name is related to the Characterset - it is a bit of a guess so I'm not holding my breath.

SenneVanstraelen commented 1 week ago

Hello Steve

Thanks for the update.

After some more testing, I have found the problem. It is related to Meltano replacing $S (only if it is a capital) with an environment variable that doesn't exist. We only have the issue with the connection to SQL Server 2005, because the other db's we connect to don't have capitals right next to the $ sign.

The point you mention about encryption might be correct. I doubt that encryption is setup on this SQL Server 2005, as it is an old system.

Thanks again for all the hard work and testing.