pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.76k stars, 297 forks

How to specify UUID index [SOLVED] #361

Closed by ohld 3 years ago

ohld commented 3 years ago

Hey 👋 and big thanks for the library. The coolest feature is schema autogeneration.

I had a problem with your library. Since I figured out how to solve it, I don't think this is really an issue, but I'd like to share my solution here in case someone else needs it too.

I'm building an ETL script that just ingests data from a .csv file into Postgres. To be more precise, and to help people find this issue from Google: I'm talking about Crunchbase data.

Crunchbase uses a UUID as the index for its data (which also can't be auto-incremented). So I wanted to specify this in dataset so that the required index column is created automatically in my Postgres.

I didn't find anything regarding how to specify the index column that should be used for table creation and population. This is the solution I have; maybe it will be useful for someone. And maybe a little more documentation and some customisation methods will be added afterwards.

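import dataset
from sqlalchemy.dialects.postgresql import UUID

db = dataset.connect('postgresql://hey-hacker:pass@fff.com:3333/ohld')
table = db['crunchbase_raw_organizations']

# Override the default auto-incrementing integer primary key.
# These are private attributes; they only take effect if the table does not exist yet.
table._primary_id = 'uuid'
table._primary_type = UUID
table._primary_increment = False

# data_to_upload is a list of dicts, each containing a 'uuid' key
table.insert_many(data_to_upload)
# table.upsert_many(data_to_upload, ['uuid'])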

What happened: a new table was created with a UUID-typed index column, and the uuid values from the data were used to fill it.
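
For anyone wondering what `data_to_upload` might look like: below is a minimal sketch, using only the standard library, of reading a .csv file into a list of dicts and keeping only the rows whose `uuid` value parses as a UUID. The `load_rows` helper, the `uuid,name` column layout and the sample rows are hypothetical and only for illustration; a real Crunchbase export will have different columns.

```python
import csv
import io
import uuid


def load_rows(csv_file):
    """Read CSV rows into dicts, keeping only rows whose 'uuid' parses as a UUID."""
    rows = []
    for row in csv.DictReader(csv_file):
        try:
            # Normalize to the canonical lowercase, hyphenated string form.
            row["uuid"] = str(uuid.UUID(row["uuid"]))
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or malformed uuid
        rows.append(row)
    return rows


# Hypothetical sample standing in for a real .csv export (assumed columns).
sample = io.StringIO(
    "uuid,name\n"
    "6ACF0B9E-1C5D-4B2A-9F3E-2D1A7C8B9E10,Acme\n"
    "not-a-uuid,Broken\n"
)
data_to_upload = load_rows(sample)
```

With a real file, `load_rows(open(path, newline=""))` would be the equivalent call, and the resulting list can be passed to `table.insert_many` as in the snippet above.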