zillow / ctds

Python DB-API 2.0 library for MS SQL Server
MIT License

Bulk_Insert with Python list iterator leaves me with "String or binary data would be truncated." #48

Closed hasuino closed 5 years ago

hasuino commented 5 years ago

I'm trying to bulk insert data into a SQL Server (2016) database. The source data is a text file with fixed-length records, so I split each line and keep all fields of one row in a list like this:

['20181201', 'I', ' 14502', '2 ', '19.12.2018', '1 ', '0 ', 'ALR ', ' 16329', ' 80', '1261', ' 80', '370 ', '80 ', 'P', '60 ', '1200 ', 'N', ' ', ' ', ' ', ' 1', ' ', '2018-12-01-04.19.03.000000']

Then I put each of these line lists into another list so I can traverse the data with a for ... in loop. This worked quite well with pypyodbc, but that is also pretty slow when inserting one million rows of data. To use ctds' bulk_insert I figured all I'd need is an iterator, so I did the following:

    alldataiterator = iter(alldatainfile)

and later on:

    datacnxn.bulk_insert (schema+'.'+table, alldataiterator)
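(For context, a minimal sketch of what this setup might look like end to end; the file name, column widths, and connection parameters below are illustrative placeholders rather than the original script, and ctds' bulk_insert accepts any iterable of row sequences, so the explicit iter() call is optional.)

    import ctds

    # Illustrative fixed column widths; the real file has 24 columns
    # matching the FV.T_AZGBSP table shown later in this thread.
    FIELD_WIDTHS = [8, 1, 6, 2, 10, 6, 6]

    def split_fixed_width(line, widths):
        """Split one fixed-length record into a list of column strings."""
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width])
            pos += width
        return fields

    # Build the outer list of row lists from the fixed-length text file.
    with open('azgbsp_export.txt', encoding='utf-8') as infile:
        alldatainfile = [split_fixed_width(line.rstrip('\n'), FIELD_WIDTHS)
                         for line in infile]

    schema, table = 'FV', 'T_AZGBSP'
    datacnxn = ctds.connect('myserver', user='me', password='secret',
                            database='mydb')
    datacnxn.bulk_insert(schema + '.' + table, iter(alldatainfile))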

To summarize: I build a list containing lists (of line/row data), create an iterator over that outer list, and try the bulk insert, but then I immediately get this error message from the database:

Traceback (most recent call last):
  File "data_pump.py", line 189, in
    datacnxn.bulk_insert (schema+'.'+table, alldataiterator)
_tds.DataError: String or binary data would be truncated.

Unfortunately I couldn't find enough information in the docs to trace this error. Maybe simply putting an iterator on the (outer) list is not the right approach, or I have to take some more steps to get the list data into a proper format for this kind of bulk_insert.

What do you think, is there a solution for this problem? Thanks in advance, Martin

joshuahlang commented 5 years ago

I suspect this may be related to FreeTDS' internal handling of strings in the bulk insert code. Have you tried the recommendation here?

Can you post the table schema definition and data that will reproduce the error so I can try to reproduce it?

hasuino commented 5 years ago

Hi Joshuah,

sorry for the delay, I've been very busy these days. Sure, this is the table definition:

CREATE TABLE [FV].[T_AZGBSP](
    [IMPFILEDATE] [int] NULL,
    [AA-PROCESS-CD] [varchar](1) NULL,
    [AZU_ZUG_NR] [varchar](6) NULL,
    [AZU_TG_BEREICH] [varchar](2) NULL,
    [AZU_PROD_DATUM] [varchar](10) NULL,
    [AZH_HALT_SEQ] [varchar](6) NULL,
    [AZB_ZUS_LFD_NR] [varchar](6) NULL,
    [AZH_BST_ABK] [varchar](5) NULL,
    [BST_NR] [varchar](6) NULL,
    [AZH_BST_BVW_NR] [varchar](6) NULL,
    [TFB_BAUREIHE] [varchar](4) NULL,
    [BVW_NR] [varchar](6) NULL,
    [AZB_MAX_LAENGE] [varchar](6) NULL,
    [AZB_MAX_GESCHW] [varchar](6) NULL,
    [AZB_BREMS_ART] [varchar](1) NULL,
    [AZB_BREMS_MBRH] [varchar](6) NULL,
    [AZB_MAX_LAST] [varchar](6) NULL,
    [AZB_GRENZ_LAST_FLG] [varchar](1) NULL,
    [AZB_ZBESP_ART] [varchar](1) NULL,
    [AZB_POS] [varchar](1) NULL,
    [AZB_TFZ_EVU] [varchar](4) NULL,
    [AZB_TFZ_BR_UNR] [varchar](2) NULL,
    [AZB_EVU_TFZ_STELL] [varchar](4) NULL,
    [AZB_TIMESTAMP] [varchar](26) NULL
) ON [PRIMARY]

In the end your advice did the trick. I had to convert the strings in the following manner:

            try:
                # ctds.SqlVarChar expects bytes; re-encode the UTF-8 input
                # to the table's single-byte codepage, dropping characters
                # that cannot be represented.
                ctdsdata = ctds.SqlVarChar(mydata.decode('utf-8').encode('latin-1', 'ignore'))
            except Exception:
                print("### ERROR (unknown) during conversion of string: " + repr(mydata))

There were some characters that could not be converted, so I used the 'ignore' flag so that no data would fail to be imported. Now my script works and the bulk insert takes only seconds to import thousands of rows of data 👍 Thanks for your help, best regards, Martin
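(To show how that conversion might plug into the bulk insert itself, here is a sketch under the same assumptions as the earlier example; the helper names are hypothetical and the fields are assumed to already be Python str values.)

    def to_sqlvarchar(value):
        # Re-encode to the table's single-byte codepage; characters that
        # cannot be represented are dropped ('ignore') instead of
        # aborting the import.
        return ctds.SqlVarChar(value.encode('latin-1', 'ignore'))

    def wrapped_rows(rows):
        # Lazily wrap every column of every row so FreeTDS sizes the
        # character data correctly during the bulk copy.
        for row in rows:
            yield [to_sqlvarchar(field) for field in row]

    datacnxn.bulk_insert(schema + '.' + table, wrapped_rows(alldatainfile))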

hasuino commented 5 years ago

Hi all,

the only remaining thing that does not work: after the bulk_insert has been running for about 150 seconds, it stops and throws the following error message:

    datacnxn.bulk_insert (schema+'.'+table, alldataiterator)
_tds.DatabaseError: Adaptive Server connection timed out

This occurs for one big table only; all the other tables work fine (and very quickly) with this bulk_insert.

Is there a default timeout of 150 seconds somewhere on the way between my script and the database connection (perhaps in some library I simply don't see that is implicitly pulled in when using ctds)?

joshuahlang commented 5 years ago

If you're trying to insert large amounts of data you may need to increase the connection's timeout property.
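(For illustration, a sketch of raising that timeout; the server name, credentials, and the 600-second value are placeholders.)

    import ctds

    # Pass a larger query timeout (in seconds) when connecting, well above
    # the roughly 150-second cutoff observed for the large table.
    datacnxn = ctds.connect('myserver', user='me', password='secret',
                            database='mydb', timeout=600)

    # With FreeTDS 1.00 or later, the timeout can also be adjusted on an
    # existing connection via the Connection.timeout property:
    # datacnxn.timeout = 600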