pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.78k stars 298 forks source link

Dataset doesn't accept colum names with '.' or '-' #160

Closed mbcheikh closed 8 years ago

mbcheikh commented 8 years ago

Hi, I was refactoring an application changing calls to SQLA with ones to dataset. I noticed that dataset doesn’t accept column names like 'Sous-Famille' and all names with '.' or '-' as seen in dataset/persistence/util.py line 36:

def normalize_column_name(name):
    if not isinstance(name, string_types):
        raise ValueError('%r is not a valid column name.' % name)
    name = name.lower().strip()
    if not len(name) or '.' in name or '-' in name:
        raise ValueError('%r is not a valid column name.' % name)_
    return name

Is there a reason for this ? Majority of DBMS and ORM's accept them as quoted names include SQLA. Is it a "dataset" limitation ? Could I safely remove these 2 lines from my copy ? Thank you for your response and for your cool and usefull module. Mourad

pudo commented 8 years ago

Hi Mourad. This is indeed a limitation imposed by dataset, basically to prevent people from shooting themselves in the foot a bit (i.e. querying fields with a minus in their name requires that you know what you're doing a bit). It may be too harsh, though, and it's definitely not needed internally. You can either comment it out, or I'd also gladly accept a pull request which makes this optional (e.g. via an env var).

pudo commented 8 years ago

I've thought about this a bit, and I'd rather not allow "creative" column names. Not only would it break the table.find() function, it would also generate a database schema that is much harder to query manually.

mbcheikh commented 8 years ago

Hi Friedrich Sorry for my silence, but i was also thinking for the best solution. You're right limiting the use of special characters, but, to be consistent why not forbid +"'*/: ? In my case I have to manage a lot of already made MSAccess DB's and Dataset is great in this because i haven't to make a schema for each one. The problem is that these DB's were made by non technical french users, and "-" is a usual character in many french words (sous-section, sous-programme,..). For my immediate need I commented out these 2 lines, but an ideal solution would be to forbid all special characters at the creation of a new table or a new column (also usefull to avoid creating new columns when making typo errors). I am extensivly using Dataset in production, and table.find() is working well. I haven't reviewed enough the code to see how it could be broken.

2016-04-10 8:26 GMT+01:00 Friedrich Lindenberg notifications@github.com:

I've thought about this a bit, and I'd rather not allow "creative" column names. Not only would it break the table.find() function, it would also generate a database schema that is much harder to query manually.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/pudo/dataset/issues/160#issuecomment-207939403

entron commented 7 years ago

This limitation is really annoying. To have "." in a column name is very common, for example pandas json_normalize uses "." to indicate nested item.