pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License

`.distinct` should return distinct values #337

Closed: jfilter closed this issue 3 years ago

jfilter commented 4 years ago

Hello @pudo, great work, all cool, but: I stumbled over how `.distinct` behaves. I expected dataset to return values, not rows. I don't really see a use for the 'raw' output of db.query. So my proposal: return an iterator over the values. Right now I have to unwrap each row myself:

subs = tab_incidents.distinct('subdivisions')
list(map(lambda x: x['subdivisions'], subs))

pudo commented 4 years ago

How would this handle .distinct('divisions', 'subdivisions')?

jfilter commented 4 years ago

Option 1

Only return the raw values for single columns. So don't change anything for multiple columns.

Option 2

Return tuples

Option 3

New method distinct_values that returns raw values (or tuples for multiple columns)

pudo commented 3 years ago

I feel it's icky to break the API in this way. If you want to submit a PR to add a second function, e.g. distinct_items or so, it's very welcome!
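
For reference, the second function discussed here could look roughly like the sketch below. It is only an illustration, not code from dataset: `distinct_values` is written as a hypothetical standalone helper rather than a table method, and it assumes only that `table.distinct(*columns, **filters)` yields dict-like rows, as in the snippet at the top of the thread. `FakeTable` is a made-up stand-in so the example runs without a database.

```python
def distinct_values(table, *columns, **filters):
    """Hypothetical helper: unwrap the rows yielded by table.distinct().

    One column  -> iterator over bare values.
    Many columns -> iterator over tuples, in the order the columns were given.
    """
    rows = table.distinct(*columns, **filters)
    if len(columns) == 1:
        column = columns[0]
        return (row[column] for row in rows)
    return (tuple(row[c] for c in columns) for row in rows)


class FakeTable:
    """Stand-in for a dataset table (assumption: it only mimics the row
    shape of distinct(), it does not talk to a database)."""

    def __init__(self, rows):
        self._rows = rows

    def distinct(self, *columns, **filters):
        seen = set()
        for row in self._rows:
            # Skip rows that do not match the equality filters.
            if any(row.get(k) != v for k, v in filters.items()):
                continue
            key = tuple(row[c] for c in columns)
            if key not in seen:
                seen.add(key)
                yield dict(zip(columns, key))


tab_incidents = FakeTable([
    {"divisions": "A", "subdivisions": "a1"},
    {"divisions": "A", "subdivisions": "a1"},
    {"divisions": "A", "subdivisions": "a2"},
    {"divisions": "B", "subdivisions": "b1"},
])

single = list(distinct_values(tab_incidents, "subdivisions"))
pairs = list(distinct_values(tab_incidents, "divisions", "subdivisions"))
filtered = list(distinct_values(tab_incidents, "subdivisions", divisions="B"))
```

Keeping this separate from `.distinct` leaves the existing return type untouched, which is the concern raised about breaking the API. The single-column case returns bare values and the multi-column case returns tuples, matching Option 3.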