pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.76k stars 297 forks source link

When a Uppercase was entered in column, the value could not be entered. (1.2.0) #310

Closed t-niijima-adglobe-co-jp closed 4 years ago

t-niijima-adglobe-co-jp commented 4 years ago

Hi, I'm not good at English So this is google translated content. I'm sorry if there is a wrong place.

Created an uppercase column. For version 1.1.2. I can inserted in the columns. However, no value was inserted in version 1.2.0. Is this a bug?

Below is the example code.

db = dataset.connect(DATA_DB_DSN,ensure_schema=False)
db.query("CREATE TABLE test(id int, Test varchar(10),test_two varchar(10));")
table = db["test"]
table.insert({"id":1,"Test":"test","test_two":"test"})

ver 1.1.2

MySQL [test]> select * from test;
+------+------+----------+
| id   | Test | test_two |
+------+------+----------+
|    1 | test | test     |
+------+------+----------+

ver 1.2.0

MySQL [test]> select * from test;
+------+------+----------+
| id   | Test | test_two |
+------+------+----------+
|    1 | NULL | test     |
+------+------+----------+
Macuyiko commented 4 years ago

Same issue when using SQLite

angelhodar commented 4 years ago

After 1 hour testing why this could happen, i realized the problem is in the function _sync_columns in the dataset/table.py module @t-niijima-adglobe-co-jp @Macuyiko. This function has the following code:

def _sync_columns(self, row, ensure, types=None):
        """Create missing columns (or the table) prior to writes.
        If automatic schema generation is disabled (``ensure`` is ``False``),
        this will remove any keys from the ``row`` for which there is no
        matching column.
        """
        columns = [c.lower() for c in self.columns]
        ensure = self._check_ensure(ensure)
        types = types or {}
        types = {normalize_column_name(k): v for (k, v) in types.items()}
        out = {}
        sync_columns = []
        for name, value in row.items():
            name = normalize_column_name(name)
            if ensure and name.lower() not in columns:
                _type = types.get(name)
                if _type is None:
                    _type = self.db.types.guess(value)
                sync_columns.append(Column(name, _type))
                columns.append(name)
            if name in columns:
                out[name] = value
        self._sync_table(sync_columns)
        return out

If you notice, the first line makes this:

columns = [c.lower() for c in self.columns]

If you insert a row with uppercases in the keys for the column names, it wont match the table columns as they have been converted to lowecase by this function. This is an important bug and has a very easy solution, like for example make the lowecase for the input row keys and always keep the column names lowecase, or just make the match not case sensitive. What do you think @pudo?

Hope this helps everybody :)

aborghesi commented 4 years ago

Using Dataset to convert dbf to Sqlite. Dataset creates all columns with uppercase names as that's how they appear in the dbf file. The first row of data gets inserted correctly but all subsequent rows are NULL for every column.

pudo commented 4 years ago

I've now re-worked the code for making column names seem case insensitive in dataset. This should fix the incorrect behaviour reported here and also allow operations like these:

tbl = self.db['cased_column_names']
tbl.insert({'place': 'Berlin',})
tbl.insert({'Place': 'Berlin',})
tbl.insert({'PLACE ': 'Berlin',})
assert len(tbl.columns) == 2, tbl.columns  # with `id`
assert len(list(tbl.find(Place='Berlin'))) == 3
assert len(list(tbl.find(place='Berlin'))) == 3
assert len(list(tbl.find(PLACE='Berlin'))) == 3

Going to release this as 1.2.1, please let me know if any aspect of the issue continues to occur.