pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License

provide option for special characters in column/table names #362

Open gvoysey opened 3 years ago

gvoysey commented 3 years ago

#160 is quite old, but I wanted to bring it up again because I just ran up against it.

I'm hacking around with a bibtex-to-sqlite converter to ingest Mendeley- and Zotero-managed BibTeX files of collections of citations, and emit a database from them. Mendeley, in the great wisdom common to all the works of Elsevier, has injected fields in the BibTeX entries whose names are things like mendeley-tags. dataset currently guards against names like that, so we get:

File "/home/gvoysey/.cache/pypoetry/virtualenvs/bibtex-to-sqlite-uwnieG41-py3.8/lib/python3.8/site-packages/dataset/util.py", line 75, in normalize_column_name
    raise ValueError("%r is not a valid column name." % name)
ValueError: 'mendeley-tags' is not a valid column name.

I'd like to be able to pass a no_i_really_mean_it_ruin_my_schema_give_me_the_footgun flag in this case to allow creating columns whose names will subsequently need to be quoted in SELECTs etc. I'm not sure where I'd want to pass it, but possibly in .create_table().
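For context on what "required to be quoted" means in practice, here is a minimal sketch using only the standard-library sqlite3 module (not dataset). It shows that SQLite itself accepts a hyphenated column name as long as the identifier is double-quoted. The table name `entries`, the helper name, and the sample value are made up for illustration.

```python
import sqlite3


def roundtrip_hyphenated_column(column="mendeley-tags", value="ml, nlp"):
    """Create a column with a hyphen in its name, write to it, read it back.

    Hypothetical helper for illustration only; it bypasses dataset entirely.
    """
    conn = sqlite3.connect(":memory:")
    # Double-quote the identifier, doubling any embedded quotes (standard SQL).
    quoted = '"' + column.replace('"', '""') + '"'
    conn.execute(f"CREATE TABLE entries (id INTEGER PRIMARY KEY, {quoted} TEXT)")
    conn.execute(f"INSERT INTO entries ({quoted}) VALUES (?)", (value,))
    row = conn.execute(f"SELECT {quoted} FROM entries").fetchone()
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    names = [info[1] for info in conn.execute("PRAGMA table_info(entries)")]
    conn.close()
    return row[0], names


if __name__ == "__main__":
    print(roundtrip_hyphenated_column())
```

So the limitation is on dataset's side rather than the database's; every later query against such a column has to quote the name the same way.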

In my limited case, permitting weird column names is fine w/r/t dataset's table.find() method because the database I'm making will never see Python again, but I understand the reluctance to expose the footgun given in #160. Still, for parity with the original data, I think it's worthwhile to have it passable. Right now I'm doing the extremely hacky thing of:

val = mendeley_entry.pop('mendeley-tags')
mendeley_entry['mendeleytags'] = val

and that's a little distasteful.
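A slightly more general version of that workaround, as a rough sketch: rename every offending key in one pass before handing the entry to dataset. The helper name `sanitize_keys` and its parameters are hypothetical, and the default set of rejected characters (`-` and `.`) is my assumption about what dataset refuses, not something confirmed from its documentation.

```python
def sanitize_keys(entry, bad_chars="-.", replacement=""):
    """Return a copy of `entry` with `bad_chars` stripped from every key.

    Hypothetical helper. `bad_chars` is an assumption about which characters
    get rejected; adjust it to match the errors you actually see.
    """
    renamed = {}
    for key, value in entry.items():
        new_key = key
        for ch in bad_chars:
            new_key = new_key.replace(ch, replacement)
        if not new_key:
            raise ValueError(f"{key!r} is empty after sanitizing")
        if new_key in renamed:
            # Two original keys would map to one column; refuse to lose data.
            raise ValueError(f"{key!r} collides with another key as {new_key!r}")
        renamed[new_key] = value
    return renamed


if __name__ == "__main__":
    print(sanitize_keys({"mendeley-tags": "ml, nlp", "title": "A paper"}))
```

This still loses parity with the original field names, which is the whole complaint; it only makes the hack less repetitive.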