gristlabs / grist-core

Grist is the evolution of spreadsheets.
https://www.getgrist.com/
Apache License 2.0
6.98k stars 308 forks source link

Blank values in matching columns of an imported CSV should always match and update existing records with same blank values #1162

Open jeffix2000 opened 1 month ago

jeffix2000 commented 1 month ago

Describe the current behavior

When trying to update an existing table from a CSV file, blank values in the file are ignored so that, in case of a record update, existing values of the record are not erased by the update (making the update capability incremental in both dimensions) - as per the doc https://support.getgrist.com/imports/#updating-existing-records.

But blank values seem to be ignored even for those columns selected as "matching fields" for merging with existing records of the target table. In that situation, the imported line in wrongly interpreted as a new record and added to the table.

Steps to reproduce

  1. Create a table with 3 columns : some label text, some random property number or text, and a trace attribute (e.g. the line $id, to be converted to static data before the import).
  2. Create some records with random label and property. Make sure to have one record duplicating the label of the first onen, but leaving the property blank; let's call it the test record.
  3. Create a CSV file including the 3 field headers and at least 1 new record : existing label, blank property, and a trace value very different from that of the existing test record.
  4. Import the CSV file. The 3 columns will be auto-mapped; define label & property as the matching condition.

Expected: the test record should be updated with the imported trace value.

Observed: the imported line is added as a brand new record.

Bonus:

  1. repeat importing the same CSV file. The same new line is added again and again.

Describe the expected behavior

Ignoring blank values for imported lines should be applied except for defined "matching fields".

An imported line with blank values in matching fields should match an existing record with the same values in matching field, including the blank ones (in this case, blank should be a positive information == None).

An imported line with blank values in non-matching fields should add or update an existing record without tampering with the occasional existing values in those fields, as is already the case.

Where have you encountered this bug?

Instance information (when self-hosting only)

Sample table to reproduce : https://docs.getgrist.com/fvzX9oY6ki8e/Test-structures/p/1.

Browser: Firefox 115.14.0esr.

dsagal commented 1 month ago

Thanks for the report! I can reproduce.

To add some details, this seems to do with treating empty values from a CSV as NULL values (rather than blanks). That is visible if you add a formula like type($A) -- if column A has a blank string value, that formula would return <class 'str'>; if it has a NULL (or None in Python), it would return <class 'NoneType'>. Storing nulls into a text column seems wrong, that may be a bug. For other columns (numbers, dates), it is correct.

The issue with matching comes from this because of SQL behavior that NULL is not considered to be equal to NULL. A potential way around it would be to construct the query as something like LEFT JOIN b ON a.field = b.field OR (a.field IS NULL AND b.field IS NULL).