Closed — nandaccio closed this issue 2 years ago.
I had a look at the SQL code generated by MergeAllAsync, would it be possible that the qualifier (Users_ID) is not populated correctly under some circumstances?
It might be that the script you showed is for Merge, not MergeAll. The MergeAll should accept batch scripts unless you pass a single entry in the IEnumerable<T>.
Question: Is the MergeAll operation failing as well?
The original issue is on the BulkMerge, and I think it might be in the post-SQL script that we execute after bulk-inserting into the intermediary table.
This is what the tracer captured:
So yes, it's failing for both BulkMergeAsync and MergeAllAsync. It's weird, because I am using merge operations from RepoDB in scenarios that are much more complex.
It is failing for the same reason also with:
USE [Test]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[UserLookup](
[id] [int] NOT NULL,
[rn_user_id] [int] NOT NULL,
[tl_user_id] [int] NOT NULL,
[status] [bit] NOT NULL,
[comment] [varchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
[Table("[dbo].UserLookup")]
public class UserLookup
{
[Map("id")]
public int Id { get; set; }
[Map("rn_user_id")]
public int InternalUserID { get; set; }
[Map("tl_user_id")]
public int ExternalUserID { get; set; }
[Map("status")]
public bool Status { get; set; }
}
The BulkMerge will create such a script if the data in the users argument contains only one entry. The one you showed is correct then.
In relation to the issue, this is a very unique one, which we have never experienced before. Can we ask you to remove the qualifiers and test whether it is still failing? (See below)
count = await connection.MergeAllAsync(users, hints: <YourHints>, trace: <YourTracer>, transaction: <YourTransaction>, cancellationToken: <YourCancellationToken>);
By default, the library knows that the Users_Id column is the one to be used as the qualifier, as it is defined as the PRIMARY KEY CLUSTERED.
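For illustration, here is a minimal sketch of how qualifiers affect the generated MERGE, as far as I understand the MergeAllAsync API. The helper method, model shape, and connection handling are assumptions for the example, not taken from the attached project:

```csharp
// Hypothetical model; the real User class has many more columns.
public class User
{
    public int Users_ID { get; set; }
    public string Users_USERNAME { get; set; }
}

public static async Task<int> MergeUsersAsync(SqlConnection connection, List<User> users)
{
    // With no qualifiers, RepoDB falls back to the primary key (Users_ID here,
    // since it is defined as the PRIMARY KEY CLUSTERED):
    // return await connection.MergeAllAsync(users);

    // With explicit qualifiers, the MERGE ... ON clause is built from these fields instead:
    return await connection.MergeAllAsync(users,
        qualifiers: Field.From(nameof(User.Users_ID)));
}
```

Either way, the qualifier fields are what the generated MERGE statement matches on, so a wrong or missing qualifier shows up directly in the ON clause of the traced SQL.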
I am trying to eliminate different scenarios, and thanks for reporting this issue.
I removed the qualifier:
This is the SQL code generated by MergeAllAsync:
And this is the error message:
The only reference to Users_ID in both the WHEN NOT MATCHED THEN and WHEN MATCHED branches is:
OUTPUT INSERTED.[Users_ID] AS [Id], @__RepoDb_OrderColumn_0 AS [OrderColumn];
So is it possible that no value is actually passed for Users_ID (hence it's seen as "NULL")?
NOTE: from the error message, it seems that the UPDATE fails. The table does not contain a field named Id (PK = Users_ID, but Users_ID was not defined as an identity by whoever designed the schema); could this be the problem?
I think the cause of the problem here is a column projection (elimination) issue. The column Users_ID is not an identity, but it has been eliminated by RepoDB on insertion. That is definitely a bug in this regard.
I could not see why the library removed Users_ID from the insertable fields (see here), since it is not an identity column. The code says it is not removing it, but we need to investigate.
Thank you for your support, as always!
Take your time, because it actually looks like a bug; Users_ID should be among the fields inserted/updated by the merge methods. You might want to try to reproduce the issue on your side.
I also tried replacing the merge-based logic with direct updates and inserts, despite the fact that Users_ID is available:
I am still getting an error on INSERT:
Microsoft.Data.SqlClient.SqlException: 'Cannot insert the value NULL into column 'Users_ID', table 'dbo.Users'; column does not allow nulls. INSERT fails. The statement has been terminated.'
So I enabled tracing via await target.InsertAsync(u, trace: LogFactory.CreateTracer()); and this is what I can see:
"INSERT INTO [dbo].[Users] ( [Users_USERNAME], [Users_PASSWORD], [Users_NAME], [Users_SURNAME], [Users_MAIL], [Users_EXTENSION], [Users_SITE_ID], [Users_EXTERNAL]) VALUES ( @Users_USERNAME, @Users_PASSWORD, @Users_NAME, @Users_SURNAME, @Users_MAIL, @Users_EXTENSION, @Users_SITE_ID, @Users_EXTERNAL ) ; SELECT CONVERT(INT, SCOPE_IDENTITY()) AS [Result] ;"
It's not clear why the parameters are null, while it's trying to insert a valid record (with Users_ID = 1089, as shown above).
NOTES:
Exactly, that's what I was trying to say earlier. When I skimmed the code earlier (via GH), it was not eliminating the PK (only the identity column). I do not know why, in your case, the Users_ID is being eliminated from the insertable fields even though it is not tagged as an identity column. This needs an investigation on our side.
In relation to the log.Parameter, it is not always a collection. It can be any of the following (ParameterValue, TEntity, Dictionary<string, object>, ExpandoObject, dynamic and/or the equivalent IEnumerables), depending on the passed parameter. The reason why it is null in your case is that you cast it to ICollection while the parameter is not a collection, but instead a single instance of an object (.NET CLR type).
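A defensive way to log it, then, is to branch on the runtime type instead of casting straight to ICollection. A rough sketch of such a helper for the tracer (my own illustration, not part of RepoDB's API):

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: renders log.Parameter whatever shape it has
// (single entity, dictionary/ExpandoObject, or an IEnumerable of those).
static string DescribeParameter(object parameter)
{
    switch (parameter)
    {
        case null:
            return "(null)";
        case string s:                         // strings are IEnumerable; handle them first
            return s;
        case IDictionary<string, object> map:  // Dictionary<string, object>, ExpandoObject
            return string.Join(", ", map.Select(kv => $"{kv.Key}={kv.Value}"));
        case IEnumerable items:                // any IEnumerable of the shapes above
            return string.Join("; ", items.Cast<object>().Select(DescribeParameter));
        default:                               // single entity (.NET CLR type)
            return parameter.ToString();
    }
}
```

With this, the tracer never null-refs on a single-entity parameter, and collections are expanded element by element via the recursive IEnumerable branch.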
Thank you Mike,
I was confirming that USERS_ID is actually missing from the INSERT statement (and also from MergeAllAsync/BulkMergeAsync), exactly as you suggested earlier. On the other hand, "await target.UpdateAsync(u);" is working as expected.
Let me know if you need any details from my side during your investigation! I will change the tracer class to support more types for log.Parameter.
The information is enough for us to proceed with the fix. We will provide an update to you soon.
@nandaccio Could you create a small test project based on your example that reproduces the error?
Hi,
I created a simple test project, and I also included the SQL CREATE script for the DB tables. But I would prefer not to attach it here; can I send it either via e-mail or a private message on GitHub?
@nandaccio yes, of course
Thanks Sergei,
I just sent an e-mail to Michael because I can't find your e-mail address.
First, I apologize for being absent for the last 2 weeks. I had my laptop repaired, and it took exactly 2 weeks before I had it at hand again. I just got it back as of this writing and it is currently ramping up. It is a different story why I did not proceed with buying my desired HP Spectre; I am currently using the powerful 4K version of the Dell XPS 15, that's why 😄
Anyway, I am attaching the project @nandaccio created here.
Thank you,
I don't know if it might be useful to know, but I tested it on Windows 10 20H2 (build 19042.1348) and SQL Server v11.0.7507.2. The project targets .NET Framework 4.7.2 (after being "downgraded" from .NET Framework 4.8).
Can you reproduce this issue on your end after re-creating the DB tables using the attached script (RepoDBTest\Tables.sql)?
This is the SQL statement generated by await target.InsertAsync(entity: u, trace: SimpleLogFactory.CreateTracer()) :
INSERT INTO [dbo].[Users] ( [Users_USERNAME], [Users_PASSWORD], [Users_NAME], [Users_SURNAME], [Users_MAIL], [Users_EXTENSION], [Users_SITE_ID], [Users_EXTERNAL], [Users_USERTYPE_ID], [Users_WEEKLY_GROSS_HOURS], [Users_Status], [Users_Reg_Date], [Users_Notes], [Users_PRIMARY_DEPARTMENT_ID], [Users_FirstWeekDay], [Users_GCSDebug], [Users_LastLogon], [Users_GEMS], [Users_Localcode], [Users_LastLogon_app], [Users_GROSS_IsCalculated], [Users_LastPswMod] ) VALUES ( @Users_USERNAME, @Users_PASSWORD, @Users_NAME, @Users_SURNAME, @Users_MAIL, @Users_EXTENSION, @Users_SITE_ID, @Users_EXTERNAL, @Users_USERTYPE_ID, @Users_WEEKLY_GROSS_HOURS, @Users_Status, @Users_Reg_Date, @Users_Notes, @Users_PRIMARY_DEPARTMENT_ID, @Users_FirstWeekDay, @Users_GCSDebug, @Users_LastLogon, @Users_GEMS, @Users_Localcode, @Users_LastLogon_app, @Users_GROSS_IsCalculated, @Users_LastPswMod ) ; SELECT CONVERT(INT, SCOPE_IDENTITY()) AS [Result] ;
Note: I also tried to pass ALL the fields using Field.Parse and the fields: argument, but the "ID" fields are still missing from the INSERT statements.
GetCommandText:
private static string GetCommandText()
{
return @"
SELECT C.COLUMN_NAME AS ColumnName
, CONVERT(BIT, COALESCE(TC.is_primary, 0)) AS IsPrimary
, CONVERT(BIT, COALESCE(TMP.is_identity, 1)) AS IsIdentity
, CONVERT(BIT, COALESCE(TMP.is_nullable, 1)) AS IsNullable
, C.DATA_TYPE AS DataType
, CONVERT(INT, COALESCE(TMP.max_length, 1)) AS Size
, CONVERT(TINYINT, COALESCE(TMP.precision, 1)) AS Precision
, CONVERT(TINYINT, COALESCE(TMP.scale, 1)) AS Scale
FROM INFORMATION_SCHEMA.COLUMNS C
OUTER APPLY
(
SELECT 1 AS is_primary
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU
LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC
ON TC.TABLE_SCHEMA = C.TABLE_SCHEMA
AND TC.TABLE_NAME = C.TABLE_NAME
AND TC.CONSTRAINT_NAME = KCU.CONSTRAINT_NAME
WHERE KCU.TABLE_SCHEMA = C.TABLE_SCHEMA
AND KCU.TABLE_NAME = C.TABLE_NAME
AND KCU.COLUMN_NAME = C.COLUMN_NAME
AND TC.CONSTRAINT_TYPE = 'PRIMARY KEY'
) TC
OUTER APPLY
(
SELECT SC.name
, SC.is_identity
, SC.is_nullable
, SC.max_length
, SC.scale
, SC.precision
FROM [sys].[columns] SC
INNER JOIN [sys].[tables] ST ON ST.object_id = SC.object_id
WHERE SC.name = C.COLUMN_NAME
AND ST.name = C.TABLE_NAME
) TMP
WHERE
C.TABLE_SCHEMA = @Schema
AND C.TABLE_NAME = @TableName;";
}
is returning:
and:
In both cases the first field of the list is not included.
@SergerGood - are you currently working on this? Otherwise, I am now open to start working on this.
@mikependon I tried looking at the attached example, but I can't reproduce it even on the specified SQL Server version.
I noticed that in the attached example the field Users_ID is nullable. Maybe it has something to do with this
public int? Users_ID { get; set; }
Thanks @SergerGood. Interesting! I also tried to explore the code (here) and it is really not eliminating the PK (Users_Id). The code is only eliminating the identity. But no worries, I will look into this by deep code debugging.
Maybe it has something to do with this
public int? Users_ID { get; set; }
That should not be the cause of the issue; it was declared as int, and I changed it to int? only because (during the initial troubleshooting phase) I added the "if (users.Count > 0 && users.All(i => i.Users_ID != null))" check. This check is completely redundant, because Users_ID is never null, and reverting to public int Users_ID { get; set; } does not help anyway.
The code is only eliminating the identity. But no worries, I will look into this by deep code debugging.
I re-compiled the RepoDB libraries from source with embedded debug symbols, and over the weekend I found out how RepoDB sees those fields:
https://github.com/mikependon/RepoDB/issues/987#issuecomment-982428131
select * FROM INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA = 'dbo' and TABLE_NAME = 'Users';
FROM INFORMATION_SCHEMA.COLUMNS.xlsx
SELECT *
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU,
INFORMATION_SCHEMA.COLUMNS C
LEFT JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC
ON TC.TABLE_SCHEMA = C.TABLE_SCHEMA
AND TC.TABLE_NAME = C.TABLE_NAME
AND TC.CONSTRAINT_NAME = KCU.CONSTRAINT_NAME
WHERE KCU.TABLE_SCHEMA = C.TABLE_SCHEMA
AND KCU.TABLE_NAME = C.TABLE_NAME
AND KCU.COLUMN_NAME = C.COLUMN_NAME
AND TC.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND C.TABLE_SCHEMA = 'dbo' and C.TABLE_NAME = 'Users';
This is what happens in GetAsyncInternal:
but "soon after" the Users_ID field is gone:
If I look at the table above, Users_ID isPrimary = 1 and isIdentity = 0, but:
@nandaccio - interesting; why is the Users_ID seen as an identity field? We need to investigate this. We would be happy if you could contribute a PR for this.
I found the root cause of this issue: DB schemas were supposed to be fully aligned, but it seems that there are slight differences.
I ran the "SELECT C.COLUMN_NAME AS ColumnName , CONVERT(BIT, COALESCE(TC.is_primary, 0)) AS IsPrimary , CONVERT(BIT, COALESCE(TMP.is_identity, 1)) AS IsIdentity <...omitted...>" query on both environments and discovered that:
Environment 1:
Environment 2:
It looks like GetAsyncInternal was invoked only once, so I added:
// IdentityCache.Flush();
DbFieldCache.Flush();
PropertyCache.Flush();
between reading and writing, and that solved the issue!
Note: IdentityCache.Flush(); would not change the faulty behaviour; instead, flushing both DbFieldCache and PropertyCache worked. I am not sure whether having both calls is redundant.
We are now simulating this, but we are glad that you found the root cause of the problem (on your side). It sounds to us that this is not an issue with the library, then.
Pertaining to the cache objects you mentioned, IdentityCache would help if you were to refresh the already cached identity of the model. It (together with the PrimaryCache class) would definitely affect the SQL generation, but only in some cases. The probable reason why flushing the IdentityCache does not help with your scenario is that you have not defined an identity mapping and/or an Identity attribute on the model, and you do not have an identity on the table either. So flushing it will not really affect the behaviour of the operations.
Flushing both the DbFieldCache and PropertyCache has a much bigger effect, as it regenerates the hashcodes of the objects, and that entirely affects SQL command text building and parameter passing.
Hope this helps explain!
Exactly, it is not an issue with the library. RepoDB is behaving as expected, considering that USERS_ID is an identity in Environment #1 but not in Environment #2.
Considering that the source schema and destination schema are very similar, but not completely identical (see the identity "qualifier"), it is a very peculiar scenario.
On the other hand, there might be similar scenarios where the same object is used by different connections to databases (which might therefore differ in identity/primary keys); in that case, are there any other solutions to avoid conflicts between slightly different DB schemas associated with the same model?
In my case, flushing the DbField/Property caches after reading the source data allows me to write the corresponding data to the destination schema without exceptions, because the fields are read again, as expected, so RepoDB uses the correct schema and is able to generate the correct statements.
PS. Thank you for the explanation!
var users = new List<User>();
if (string.IsNullOrEmpty(sourceconnectionstring) || string.IsNullOrEmpty(targetconnectionstring))
{
    throw new ArgumentException("sourceconnectionstring and targetconnectionstring must be valid connection strings");
}
using (var source = new SqlConnection(sourceconnectionstring))
{
    users = (await source.QueryAllAsync<User>().ConfigureAwait(false)).ToList();
}
// FLUSH THE CACHE BEFORE WRITING TO THE TARGET DB DATA READ FROM SOURCE DB:
DbFieldCache.Flush();
PropertyCache.Flush();
if (users.Count > 0 && users.All(i => i.Users_ID != null))
{
    using (var target = new SqlConnection(targetconnectionstring))
    {
        count = await target.BulkMergeAsync(users, qualifiers: p => new { p.Users_ID }, hints: SqlServerTableHints.TabLock, bulkCopyTimeout: 0).ConfigureAwait(false);
    }
}
Thanks for helping us identify the root cause. We will not take any further action on this.
For a model that is used against multiple database providers, it is recommended to flush the caches (like what you did in your scenario). However, this solution will not work if the data types of the source and destination are not coercible for the same property/field. It is only recommended for users who do not want to create separate models for the source and destination RDBMS.
Thank you, I closed this issue, as the problem is not caused by RepoDB, and flushing the cache(s) makes everything work as expected.
NOTE: I don't know if it makes sense to add some details to the RepoDB documentation about the fact that it is up to the developer to either clear the caches or use different models in case of differences (even very slight ones) when the same model is used for multiple database providers (in my case there was an exception, but I don't know if subtle bugs could be introduced in similar scenarios).
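To illustrate the separate-model alternative that was mentioned as an option, here is a hedged sketch (my own illustration, not an official RepoDB recommendation; whether it fully bypasses the shared caches depends on how they are keyed in your RepoDB version, so verify before relying on it):

```csharp
// Hypothetical: one model per environment, both mapped to the same table,
// so each side keeps its own cached property metadata even when the schemas
// differ slightly (e.g. Users_ID is an identity in one environment only).
[Table("[dbo].[Users]")]
public class SourceUser
{
    public int Users_ID { get; set; }          // identity in Environment 1
    public string Users_USERNAME { get; set; }
}

[Table("[dbo].[Users]")]
public class TargetUser
{
    public int Users_ID { get; set; }          // plain PK in Environment 2
    public string Users_USERNAME { get; set; }
}
```

The cost is mapping data between the two types when copying, which is why cache flushing is the lighter option when a single model is preferred.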
Bug Description
I am writing a quick&dirty method to import data from one instance of the database to another (same schema), but an exception is raised by target.BulkMergeAsync.
I am sure that there are no NULL values for Users_ID in the source data (see also the check below), but I am still getting the "Cannot insert the value NULL into column 'Users_ID'" error.
Exception Message:
Microsoft.Data.SqlClient.SqlException: 'Cannot insert the value NULL into column 'Users_ID', table 'dbo.Users'; column does not allow nulls. UPDATE fails. The statement has been terminated.'
Code:
Schema and Model:
Please share with us the schema of the table (not the actual one) that could help us replicate the issue if necessary.
And also the model that corresponds to the schema.
Library Version:
RepoDb version="1.12.9" RepoDb.MySql version="1.1.5" RepoDb.SqlServer version="1.1.5-beta1" RepoDb.SqlServer.BulkOperations version="1.1.6-beta1"
targetFramework="net48"