rdagumampan / yuniql

Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!
https://yuniql.io
Apache License 2.0
417 stars 63 forks

treatment of non-ascii characters #248

Open avengerovsky opened 2 years ago

avengerovsky commented 2 years ago

It looks like yuniql replaces non-ASCII characters in my script with '�'. Example:

select regexp_replace(asset_name, '[–—]', '-', 'g') as name from xxx

becomes

select regexp_replace(asset_name, '[��]', '-', 'g') as name from xxx

The same script executed with psql or dbeaver produces correct representation of the characters.
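For readers trying to reproduce the symptom: one way to get exactly one '�' per dash is for a file saved in a single-byte encoding to be decoded as UTF-8. The sketch below assumes Windows-1252 as the file encoding. That is a guess for illustration only; nothing in this thread confirms how the script was saved or how yuniql read it.

```python
# Hypothetical reproduction, not a confirmed diagnosis.
# Assumption: the script was saved as Windows-1252 and then decoded as UTF-8.
original = "select regexp_replace(asset_name, '[\u2013\u2014]', '-', 'g') as name from xxx"

# In Windows-1252 the en dash is byte 0x96 and the em dash is byte 0x97.
raw_bytes = original.encode("cp1252")

# Neither byte is valid on its own in UTF-8, so each becomes U+FFFD.
misread = raw_bytes.decode("utf-8", errors="replace")

replacement_count = misread.count("\ufffd")
print(ascii(misread))       # ascii() keeps the output printable on any console
print(replacement_count)
```

If the file were UTF-8 and decoded as a single-byte encoding instead, each dash would expand to three wrong characters rather than one, so the count of '�' is a useful hint when narrowing down the cause.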

Thoughts?

rdagumampan commented 2 years ago

Hi @avengerovsky, thanks for reaching out. You're probably right about this. I will add some tests to cover it, and a fix will possibly be in the next release. I'm really hoping to release this week if time permits.

P.S. ICYMI, please Star our repo. Thanks!

rdagumampan commented 2 years ago

@avengerovsky, I tried to reproduce this. I think that while the text prints incorrectly on the console, yuniql reads the content of the script files correctly. Here I have a script file with the script below, and I can see the Chinese characters are well preserved when they are inserted into the database.

There is a problem in the console. I tried looking into it, and it seems to have something to do with the console settings rather than the code itself.

SELECT '苹果 (Píngguǒ)' AS Apple UNION ALL
SELECT '微软 (Wēiruǎn)' AS Microsoft UNION ALL
SELECT '三星 (Sānxīng)' AS Samsung
GO

CREATE TABLE [dbo].[test_utf16_table](
[textdata] [nvarchar](MAX) NOT NULL
);
GO

INSERT INTO [dbo].[test_utf16_table] VALUES (N'苹果 (Píngguǒ)')
INSERT INTO [dbo].[test_utf16_table] VALUES (N'微软 (Wēiruǎn)')
INSERT INTO [dbo].[test_utf16_table] VALUES (N'三星 (Sānxīng)')
GO

image
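The distinction between "displayed wrongly" and "stored wrongly" can be sketched in a few lines. This is an illustration only and does not use yuniql's code; the ASCII-only console is an assumption chosen to make the loss visible.

```python
import sys

text = "苹果 (Píngguǒ)"

# What a console limited to ASCII would show (assumed encoding, for illustration).
shown = text.encode("ascii", errors="replace").decode("ascii")

# What a UTF-8 file or an NVARCHAR column would hold: the text round-trips intact.
stored = text.encode("utf-8").decode("utf-8")

print(shown)                 # ?? (P?nggu?)
print(stored == text)        # True
print(sys.stdout.encoding)   # the encoding this console actually uses
```

Checking the stored value (or the log file) rather than the console output is what separates a display problem from a real data problem.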

rdagumampan commented 2 years ago

The log files also capture the text correctly. Can you send me a test script file I can use to reproduce the issue? Thanks. @avengerovsky

image

avengerovsky commented 2 years ago

Thank you for looking into this issue. Here is what my log looks like:

image

the original sql is:

image

I have found a workaround: using the double-byte numeric representation of these characters (en dash and em dash) works fine.

image

Still I think the problem is there...
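The screenshot of the workaround is not reproduced here, so the exact syntax used is unknown. One ASCII-only form PostgreSQL accepts is a Unicode escape string, `U&'\2013'`. The helper below, `to_pg_unicode_literal`, is a hypothetical name introduced for this sketch; it converts a string into such a literal so the script file contains no non-ASCII bytes at all.

```python
def to_pg_unicode_literal(value: str) -> str:
    """Return an ASCII-only PostgreSQL U&'...' literal equal to `value`."""
    parts = []
    for ch in value:
        code = ord(ch)
        if ch == "'":
            parts.append("''")               # a single quote is doubled
        elif ch == "\\":
            parts.append("\\\\")             # backslash is the escape character
        elif code < 128:
            parts.append(ch)                 # plain ASCII passes through
        elif code <= 0xFFFF:
            parts.append(f"\\{code:04X}")    # 4-hex-digit escape form
        else:
            parts.append(f"\\+{code:06X}")   # 6-hex-digit escape form
    return "U&'" + "".join(parts) + "'"


literal = to_pg_unicode_literal("[\u2013\u2014]")
print(literal)  # U&'[\2013\2014]'
```

With this, the pattern from the original report could be written as `regexp_replace(asset_name, U&'[\2013\2014]', '-', 'g')`, assuming `standard_conforming_strings` is on, which `U&` strings require.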

rdagumampan commented 2 years ago

Thanks, I will try to reproduce this on pgsql. So far I have only tested on sqlserver. Good to hear you found a workaround and are still convinced to use yuniql :)

Ideally it should work with pgdump and without a customization like the one you made. We'll investigate further. Thanks again.