zendframework / zend-db

Db component from Zend Framework
BSD 3-Clause "New" or "Revised" License
101 stars 122 forks source link

SQL Server NVARCHAR support #213

Closed alextech closed 7 years ago

alextech commented 7 years ago

To handle UTF columns (eg, store Russian characters in column) SQL server needs data type NVARCHAR.

Either need to add NVarchar class to Ddl\Column namespace, or add as an optional parameter column->setMultibyte(true) which appends N to type property.

froschdesign commented 7 years ago

NVARCHAR is a platform specific character type and not definied in the ANSI SQL-92 standard.

So the question is should we add this platform specific type?

Ocramius commented 7 years ago

This shouldn't land, IMO... Unless we want to always map char to NVARCHAR when using SQL Server (IIRC, Doctrine DBAL already does that)

alextech commented 7 years ago

platform specific character type Unless we want to always map char to NVARCHAR when using SQL Server`

Yes, it would go into appropriate decorators. Hence the second suggestion of making it into a property, so when platform specific decorator processes it, it has the ability to act on it, if needs be.

Still not appropriate? In that case I can add a warning to documentation to state if need to use multibyte, better write and execute SQL manually (or less work for me if nothing should be done!).

ezimuel commented 7 years ago

@alextech This is an interesting topic. IMO we should improve the DDL support with specific vendor types. Right now we are supporting SQL92 + some specific type like Column\Blob that is MySQL.

I don't see any reason to avoid vendor data types, like NVARCHAR of SQL Server. Moreover, it will be nice to have a way to limit the usage of a data type with the vendor (platform). For instance, if someone uses NVARCHAR with MySQL we should throw an Exception, saying that the type is specific for SQL Server.

@Ocramius, @froschdesign, @weierophinney what do you think?

alextech commented 7 years ago

I am with @Ocramius on this one. We should not have engine specific names in abstract API library. I think it confuses developers by allowing to write code that may or may not fit in deployment. Especially a big issue when writing cross platfrom apps. My previous project as all about writing installable distributable server who's codebase works the same way across any engine customer is willing to install it on. Part of my job was checking whether abstraction APIs would work uniformly across databases, and put in override where needed.

For instance, if someone uses NVARCHAR with MySQL we should throw an Exception, saying that the type is specific for SQL Server.

This would have made that work impossible. It is unrealistic to have

if (sqlserver) {
 $column = new NVARCHAR();
} else if(mysql) {
 $column = new VARCHAR();
}

For another example, this would prevent any attempt at creating migrations utility. For basic example of struggles in the wild look at https://github.com/vgarvardt/ZfSimpleMigrations/issues/3 where migrations were expected to run because developers assumed the code will be intelligent enough to do the said conversion itself.

HOWEVER, There are often database specific features that developers want to take advantage of. Right now they go into Features namespace, but only for TableGateway. So I propose for DDL we mostly stick to generic API that will be expected to be the same across all engines, especially column types. For other things they would go into features.

alextech commented 7 years ago

P.S. options handling in decorators already do somewhat OK job of overriding limitations of generic API. So if I want to write migrations utility for cross platform app, but still take advantage of native features where can, using some inheritance override to modify options where needed should work.

It is why I proposed having a multibyte flag in the column so the abstraction layer can modify that option as needed in a way most appropriate to the engine. That is, make the API more from semantic point of view, to say "i want internationalization of the column", instead of low level "nvchar". My reasoning for this, is because I did not know what nvchar was, until I ran into issue having Russian characters corrupted. I looked into how to make column support internationalization and came across term 'nvchar'.

ezimuel commented 7 years ago

@alextech I don't see any problem of abstraction to add a Ddl\Column\Nvarchar, as we did with Ddl\Column\Blob. If you want to develop an application that is SQL92 standard you should not use NVARCHAR. Of course, I don't want to write code like the one that you proposed. I was thinking to have some sort of internal check to be sure that Ddl\Column\Nvarchar should be used only with SQL Server platform. This is only an idea to prevent potential issues.

alextech commented 7 years ago

If you want to develop an application that is SQL92 standard you should not use NVARCHAR Not for SQL92 itself but best optimized for whatever platform is being installed.

My thinking could be a bit extreme, but IMO if I am thinking at low levels of db specific column types, why do I need DDL library at all. I believe the point of DDL, or any abstraction library, is to assist initial development without thinking of low level internals. Then override as needed for optimization. So here I agree with @froschdesign about not including the type itself, and take @Ocramius suggestion of adding the type internally.

alextech commented 7 years ago

Sorry for always doing extra PS or edits.

For case of Ddl\Column\Blob. It should get similar treatment. SQLServer equivalent is VARBINARY. So question is, what definition come first. SQLServer developer making app compatible with mysql will think of varbinary and cause bugs in mysql. Mysql developer will think of blob and cause bugs in postgresql/sqlserver. even though they are both correct and do exact same thing. Curious dilemma.

alextech commented 7 years ago

I propose we stick to most common naming convention. For example in forums a typical question is "what is equivalent of X in this DB engine". So in documentation we specify "for binary data use DDL class BLOB (or whatever else)" and provide matrix of what it will output to per db engine, so searching can quickly reveal what developer is looking for based on any search term.

ezimuel commented 7 years ago

@alextech I don't like the idea to have some internal magic stuff that translare VARCHAR in NVARCHAR or whatever. I think the original idea of zend-db should remain the same, a basic abstraction layer for SQL databases, without additional features more relevant for ORM projects like Doctrine. If we don't want to add vendor data types I'm fine with that. I'm closing this issue at the moment, we'll see in the future.

alextech commented 7 years ago

If we don't want to add vendor data types I'm fine with that.

Between having magic support and not having it at all, I definitely vote for having it, at least namespaced. My workflow has so far been:

Refactoring from varchar to nvarchar, blob to varbinary, whatever else, is minor in comparison. I really do not want to do the former again :weary: I can get behind using vendor specific column types. If I want fully cross platform migrations for myself, I can easily manage with strategy or overrides much better than I do now, especially in a non-ORM project like you point out.

As a compromise between others opinion and loss of functionality, maybe namespace it? (I did not realize how much favoritism mysql got, even in this issue)