kennethreitz / records

SQL for Humans™
https://pypi.python.org/pypi/records/
ISC License
7.16k stars 572 forks

Import from csv, excel, etc #7

Closed mlissner closed 8 years ago

mlissner commented 8 years ago

I do a lot of data work which requires importing and exporting CSVs. This library looks extremely useful, but it doesn't import from CSV. If it did, it'd probably be a tool I used every day.

Logistically, I'm not even sure what this would look like, but if it's something possible, it'd be great!

Thanks for another great tool.

kennethreitz commented 8 years ago

Very interesting! Tablib does provide that functionality, but I'm not sure how translating that into your database would work. Sounds like a bad idea to me.

timofurrer commented 8 years ago

@mlissner How would you suggest mapping the columns, etc.? It probably isn't as easy as it sounds ...

ojengwa commented 8 years ago

Django already provides a similar tool (loaddata) via the management tools. I think this would be a nice feature to have, especially since records already does data migration from the DB... @timofurrer Borrowing from Django, attribute keys would map to columns or table names. Here's a sample schema dump to illustrate my point:

```json
{
    "fields": {
        "expire_date": "2016-02-11T23:04:52.643Z",
        "session_data": "MDYzY2EyM2Q1Njk0YTJmYjgwNDkyZmMyMzQ5ZTQ4ZTJlYzYyY2FjMDp7Il9hdXRoX3VzZXJfaGFzaCI6IjhjMmM2ZTliZGU5NTc1YjdmMjlkMGM2ZTdiMDFlYjBkZDQ1MjEzYzciLCJfYXV0aF91c2VyX2JhY2tlbmQiOiJkamFuZ28uY29udHJpYi5hdXRoLmJhY2tlbmRzLk1vZGVsQmFja2VuZCIsIl9hdXRoX3VzZXJfaWQiOiIyNyJ9"
    },
    "model": "sessions.session",
    "pk": "i0c5yiqxwr5n28em34ytkanknqm3omnj"
}
```

It might imply a major rethink of the implementation of the data export format.
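To make the fixture-to-table mapping above concrete, here is a minimal stdlib-only sketch (not part of records or Django) of how a Django-style dump could land in a database: `"model"` names the table, and `"pk"` plus the keys of `"fields"` become columns. The sqlite3 backing and the base64 truncation are assumptions for illustration.

```python
import json
import sqlite3

# A Django-style fixture entry (session_data shortened here for illustration).
dump = '''[{
    "fields": {
        "expire_date": "2016-02-11T23:04:52.643Z",
        "session_data": "MDYzY2EyM2Q1Njk0YTJmYjgwNDkyZmMy"
    },
    "model": "sessions.session",
    "pk": "i0c5yiqxwr5n28em34ytkanknqm3omnj"
}]'''

conn = sqlite3.connect(":memory:")
for entry in json.loads(dump):
    # "sessions.session" -> table name "sessions_session"
    table = entry["model"].replace(".", "_")
    columns = ["pk"] + list(entry["fields"])
    # Create the table on first sight; everything is varchar, as discussed below.
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)'
                 % (table, ", ".join('"%s" varchar' % c for c in columns)))
    values = [entry["pk"]] + list(entry["fields"].values())
    conn.execute('INSERT INTO "%s" VALUES (%s)'
                 % (table, ", ".join("?" for _ in values)), values)

pk = conn.execute("SELECT pk FROM sessions_session").fetchone()[0]
```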

Regards

mlissner commented 8 years ago

Well, the way I'd want this to work would be something like this:

```python
# Import a CSV, create a SQL table with the same name as the file, and
# create columns using the headers in the first row of the CSV
rows.dataset.import_csv(file_handle)

# Import a CSV into a specific table; headers in the CSV are used to map column names
rows.dataset.import_csv(file_handle, table="my_table")

# Import a CSV into a specific table; the first row of the CSV is not a header
rows.dataset.import_csv(file_handle, table="my_table", headers=['columnA', 'columnB'])
```

We'd have to make some assumptions when creating tables, like making every column varchar when the table doesn't already exist, but I think that'd be a very efficient way of importing a CSV with literally just one line of code.
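The proposal above can be sketched with the stdlib alone. This is not part of records: `import_csv`, its parameters, and the sqlite3 connection here are all assumptions made to illustrate the "create-if-missing, all columns varchar" behavior being described.

```python
import csv
import io
import sqlite3

def import_csv(conn, file_handle, table, headers=None):
    """Hypothetical helper: load a CSV into `table`, creating the table
    (all columns varchar) if it doesn't already exist."""
    reader = csv.reader(file_handle)
    if headers is None:
        headers = next(reader)  # first row holds the column names
    cols = ", ".join('"%s" varchar' % h for h in headers)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
    placeholders = ", ".join("?" for _ in headers)
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
                     reader)

# One line (plus setup) to import a CSV:
conn = sqlite3.connect(":memory:")
import_csv(conn, io.StringIO("name,age\nada,36\ngrace,45\n"), table="people")
result = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
```

Note that because every column is created as varchar, the imported values stay strings (`"36"`, not `36`); type inference would be a separate design decision.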

ojengwa commented 8 years ago

My thought is not so much about the API design pattern as it is about the structure of the data dumps.

mlissner commented 8 years ago

@ojengwa I'm fine with the Django-style dumps as an export option, but this ticket is about importing CSVs. Not sure what the dump style has to do with it.

kennethreitz commented 8 years ago

This is definitely out of scope for the project, as it stands. That may change in the future, but I think that is unlikely.