ontodev / valve.rs

A lightweight validation engine written in Rust.
BSD 3-Clause "New" or "Revised" License

Be smarter about the distinction between conflict and normal rows #58

Closed by lmcmicu 9 months ago

lmcmicu commented 11 months ago

In the make_inserts() function, the algorithm looks only at whether a column with a primary, unique, or tree constraint has an error. It doesn't distinguish between errors in those columns that are actually caused by primary/unique/tree constraint violations and other errors in the same columns. Since we have access to the cell messages in this function, we should in principle be able to change the algorithm so that only actual primary/unique/tree violations go to the conflict table, rather than any error in one of those columns. Since we would like to minimize NULL values in the normal tables, this issue is to change the algorithm as specified above. (See the Slack exchange from August 22, 2023 between James and Mike for the initial motivation for this issue.)
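To make the proposal concrete, here is a minimal sketch of the routing rule described above. It is not valve's actual code: the `Violation`, `Target`, and `Cell` types and the `target_for_row` and `conflict_columns` functions are hypothetical names invented for illustration, and the real make_inserts() works with valve's own row and message structures.

```rust
/// Hypothetical classification of a message attached to a cell.
/// (Assumption: valve's real messages carry rule names, not this enum.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Violation {
    PrimaryKey,
    UniqueKey,
    Tree,
    /// Any other error, e.g. a datatype or rule violation.
    Other,
}

/// Which table a row should be inserted into.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Normal,
    Conflict,
}

/// Hypothetical stand-in for a validated cell and its messages.
struct Cell {
    column: String,
    messages: Vec<Violation>,
}

/// True only for the violations that force a row into the conflict table.
fn is_conflict(v: Violation) -> bool {
    matches!(
        v,
        Violation::PrimaryKey | Violation::UniqueKey | Violation::Tree
    )
}

/// Route a row by inspecting the cell messages themselves, rather than
/// by checking whether a constrained column has any error at all.
fn target_for_row(row: &[Cell]) -> Target {
    let conflict = row
        .iter()
        .any(|cell| cell.messages.iter().any(|m| is_conflict(*m)));
    if conflict {
        Target::Conflict
    } else {
        Target::Normal
    }
}

/// Names of the columns that caused the row to be routed to conflict.
fn conflict_columns(row: &[Cell]) -> Vec<&str> {
    row.iter()
        .filter(|cell| cell.messages.iter().any(|m| is_conflict(*m)))
        .map(|cell| cell.column.as_str())
        .collect()
}
```

Under this rule, a row whose primary key column has only an `Other` error stays in the normal table, while a row with a genuine `PrimaryKey`, `UniqueKey`, or `Tree` violation goes to the conflict table.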

lmcmicu commented 10 months ago

I've had some success creating a new, more user-friendly view called my_table_user_view, which refers to my_table_view and seems to be almost as efficient. This is the content of PR #60.

Another option is to change valve's underlying logic a bit so that values are inserted into the conflict table whenever possible. Effectively, this means that they will always be inserted except when the error involves a primary key, a unique key (which subsumes foreign keys), or a datatype. If we do this, we could possibly get away without a special my_table_user_view. Further, I've had a look at valve's code: although I was worried at first that changing this logic would break things deep inside valve, after investigating the code more closely I no longer think so. I think the change could be done relatively easily.
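As a rough illustration of that alternative, the per-cell decision might look like the sketch below. The `ErrorKind` enum and the `stored_value` function are hypothetical and do not correspond to anything in valve's code. The sketch only encodes the rule stated above: a value is written to the table unless one of its errors involves a primary key, a unique key, or a datatype, in which case the column gets NULL.

```rust
/// Hypothetical error categories for a single cell.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    PrimaryKey,
    /// Assumed here to cover foreign keys as well, as in the comment above.
    UniqueKey,
    Datatype,
    /// Any error that does not prevent the value from being stored.
    Other,
}

/// Returns the value to write into the table column, or None (i.e. NULL)
/// when one of the cell's errors is of a kind that blocks insertion.
fn stored_value<'a>(value: &'a str, errors: &[ErrorKind]) -> Option<&'a str> {
    let blocked = errors.iter().any(|e| {
        matches!(
            e,
            ErrorKind::PrimaryKey | ErrorKind::UniqueKey | ErrorKind::Datatype
        )
    });
    if blocked {
        None
    } else {
        Some(value)
    }
}
```

With this rule, a cell that has only `Other` errors keeps its value in the table, so NULLs appear only for the three blocking kinds.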

However, the problem which motivated this issue (confusion about where to find the values for a given column) would still remain for the three types of violation just indicated, and in fact there is an argument to be made that it would be more, not less, confusing to the user. The reason is that a NULL in a column whose actual value is held in the message table, because it can't be shown in the normal/conflict table, would be encountered much less often, so the user might not be on the lookout for it.

Assuming that the performance of adding a special my_table_user_view is good enough, I would be inclined to solve the problem this way rather than change valve.

lmcmicu commented 10 months ago

This is a follow-up to my last comment. Regarding the user_view: in #60 I have implemented this so that it includes both the history and message columns. However, if we don't need these columns in the user view, then performance can be improved significantly.