pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License

inconsistent Upsert documentation #404

Closed matecsaj closed 1 year ago

matecsaj commented 1 year ago

Upserts are a feature here: https://dataset.readthedocs.io/en/latest/

And then a limitation here: https://dataset.readthedocs.io/en/latest/quickstart.html#limitations-of-dataset

Which is it? Please make the documentation consistent.

RedContritio commented 1 year ago

The latter says "Database-native UPSERT semantics", and the current version combines an insert and an update to implement upsert, so it is a feature, just not a database-native one.
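As an illustration of the "insert plus update" approach described above, here is a minimal sketch using only Python's standard `sqlite3` module. The helper name `emulated_upsert`, the `users` table, and its columns are made up for this example; this is not dataset's actual implementation, only the general pattern.

```python
import sqlite3


def emulated_upsert(conn, table, row, keys):
    """Hypothetical helper: update the row matching `keys`, or insert it.

    Table and column names are interpolated into the SQL, so they are
    assumed to come from trusted code, not from user input.
    """
    where = " AND ".join(f"{k} = ?" for k in keys)
    key_values = [row[k] for k in keys]

    # Step 1: check whether a row with these key values already exists.
    exists = conn.execute(
        f"SELECT 1 FROM {table} WHERE {where} LIMIT 1", key_values
    ).fetchone()

    if exists:
        # Step 2a: a match was found, so issue a plain UPDATE.
        assignments = ", ".join(f"{c} = ?" for c in row)
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {where}",
            list(row.values()) + key_values,
        )
        return "updated"

    # Step 2b: no match, so issue a plain INSERT.
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        list(row.values()),
    )
    return "inserted"


conn = sqlite3.connect(":memory:")
# Note: no unique index on "name"; the emulation does not need one.
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

first = emulated_upsert(conn, "users", {"name": "alice", "age": 30}, ["name"])
second = emulated_upsert(conn, "users", {"name": "alice", "age": 31}, ["name"])
rows = conn.execute("SELECT name, age FROM users").fetchall()
```

Because the existence check and the write are separate statements, two concurrent writers could both decide to insert; a database-native upsert avoids that race, which is one reason the distinction matters.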

pudo commented 1 year ago

The problem is that we never really enforced creating a unique index on upsert key sets in the dataset Python API, which is needed for doing a native upsert in Postgres (and SQLite?). To migrate from the emulated approach to the native one, we'd end up implicitly creating constraints on a database, which feels like a really icky thing to do.

It's left me personally choosing to do Postgres upserts manually in a lot of my newer apps rather than using dataset :/ (e.g. https://github.com/opensanctions/storyweb/blob/main/storyweb/logic/articles.py#L156-L164)
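To make the unique-index requirement concrete, here is a small sketch of a database-native upsert, again using the standard `sqlite3` module (this assumes the bundled SQLite is 3.24 or newer, which added `ON CONFLICT ... DO UPDATE`). The `users` table and the index name `users_name_key` are invented for the example; this is not the code linked above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

# Native upsert: insert, or update "age" if "name" already exists.
upsert_sql = (
    "INSERT INTO users (name, age) VALUES (?, ?) "
    "ON CONFLICT (name) DO UPDATE SET age = excluded.age"
)

# Without a unique index on the conflict target, SQLite rejects the statement.
try:
    conn.execute(upsert_sql, ("alice", 30))
    failed_without_index = False
except sqlite3.OperationalError:
    failed_without_index = True

# This is the constraint a library would have to create implicitly
# in order to offer native upserts on arbitrary key sets.
conn.execute("CREATE UNIQUE INDEX users_name_key ON users (name)")

conn.execute(upsert_sql, ("alice", 30))  # inserts the row
conn.execute(upsert_sql, ("alice", 31))  # updates the existing row
rows = conn.execute("SELECT name, age FROM users").fetchall()
```

Postgres uses a very similar `INSERT ... ON CONFLICT` syntax and likewise needs a unique index or constraint on the conflict columns.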

matecsaj commented 1 year ago

Thanks all for remedying my confusion and making this great library available.

I'd like to suggest two documentation changes.

  1. Link all limitations to their relevant item in the issue list.

Why? Readers can learn more without bothering you with a question that already has an answer.

  2. Have two limitation lists, one functional and one performance.

Why? Readers who don't require high performance will be more likely to use your library.