microsoft / homebrew-mssql-release

Other
92 stars 31 forks source link

BCP strange behavior on mac 10.15.7 #56

Closed a087861 closed 3 years ago

a087861 commented 3 years ago

Hello, I am working on a mac (non m1) and testing the bcp functionality and noticing some odd behavior. I have used this utility almost daily on my windows machine and have not observed these issues.

In windows, everything needs to be perfect for the command to work, otherwise the error file is filled with information. I'm noticing that the mac version of BCP seems to take whatever I have and force it into the table and almost never puts info into the error file. I suspect user error but am not sure.

The file i am working with is a comma separated .txt file and the command I am running is

bcp sqlTest.dbo.z_testing IN "tests/data_for_bcp.txt" -S sqlinstance24 -T -c -b 50000 -t "," -e "tests/bcp_test_error_file.txt" -F2

The text file has headers on the first row, hence the "F2" option. The table I am directing the data into only has one column whereas the data_for_bcp.txt file has three columns. When using this command in windows, I would expect this to fail given that the text file and destination table don't have compatible columns, however, on mac, this command runs without any errors. Upon inspecting the output table, I see that an entire row from the text file is inserted into the 1 column in the destination table. This behavior makes me think that the delimiter flag isn't working as intended.

I have tried changing the delimiter in both the text file as well as in the bcp command and have continued to notice the same behavior.

The raw data looks like this: image

The data inserted into the sql table looks like this: image

Any info would be greatly appreciated!

v-chojas commented 3 years ago

What versions of bcp are you using? You can check with the -v option.

a087861 commented 3 years ago

Thanks for getting back with me. Just checked and using version: 17.7.001.1

v-chojas commented 3 years ago

Are you using the same version of BCP with Windows where you noticed this difference in behaviour?

If so, what version of the ODBC driver are you using (both for Windows and Mac)?

Note that when using BCP in character mode, the newlines in the input file need to be that of the platform - \r\n on Windows, and \n on Mac.

a087861 commented 3 years ago

The version I am using on windows is quite a bit older, unfortunately: 14.0.1000.169.

Using ODBC Driver 17 for SQL Server for mac and windows although not sure if the versions are synced up (though I suspect not and can gather that info if needed).

I used VS Code to check that I did my testing using CRLF in windows and the text file used on mac was configured as LF. Happy to test over methods as my current configurations were partially based on "I know what I'm doing" and partially based on "tinker around and see what makes it work"

v-chojas commented 3 years ago

BCP 14.x on Windows should be using ODBC Driver 13 for SQL Server. You can determine which exact version you are using by inspecting the file properties of Windows\system32\msodbcsql13.dll

Minor versions of the ODBC Driver 17 for SQL Server are almost always released simultaneously on all platforms. The version history can be found here:

https://docs.microsoft.com/en-us/sql/connect/odbc/windows/release-notes-odbc-sql-server-windows https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/release-notes-odbc-sql-server-linux-m

I will investigate this a little more and get back to you.

a087861 commented 3 years ago

Understood - a huge thanks for looking into this!

v-chojas commented 3 years ago

I tested bcp 14.0.1000.169 on Windows, and 17.7 Mac, and (excepting the newline format difference) did not see any difference in behaviour between the two --- they both insert the entire row into the single column. Are you seeing a different behaviour?

a087861 commented 3 years ago

Wow, you're right - not sure why but I was almost entirely convinced that this previously caused an issue when i tried this on windows but you're correct - the behavior is similar between the two systems. User error appears to be the cause :)

Previously I said that there were some settings that I used to get things to work so I suspect there is a better way of going about this. Ideally if I have a comma delimited file and I specify that a field terminator (in other words delimiter) is a comma then I would like for bcp to throw an error that says "hey your data is to wide for the table you're inserting into" or something to that effect.

Again, thanks for all of the work you've done on this!

v-chojas commented 3 years ago

Good to see you figured it out.

In the default character mode, bcp uses the destination table schema to determine what delimiters to look for. If the current column is not the last, it will continue reading and streaming data to the server until it finds the field delimiter; otherwise, it looks for the row delimiter. Thus if your destination table has only one column, it will only look for the row delimiter. Likewise, if the destination has 2 columns, it will put everything up to the first field delimiter into the first column, and then everything up to the row delimiter into the second. While this algorithm won't detect the case you are referring to, it is simple and allows for good performance when bulk inserting.