jamesturk opened 8 years ago
Maybe! Just to make sure I understand, the idea is to insert data into an existing table managed by a Django model?
Yep, the way I have it working is that you instantiate a bunch of objects of the model type. It'd look something like:

```python
entries = []
for data in data_from_some_source():
    # by instantiating Entry here we get a few things like proper defaults
    entries.append(Entry(**data))

# slow way
Entry.objects.bulk_create(entries)

# fast way
CopyMapping(model=Entry, data=entries).save()
```

& internally it'd essentially create a temporary CSV file w/ all of the data, then load it via the same COPY mechanism.
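Concretely, the temporary-CSV step could be sketched like this (a minimal sketch with made-up field names, not the real `Entry` model):

```python
import csv
from io import StringIO

# Hypothetical rows, standing in for attribute dicts pulled off Entry instances.
rows = [
    {"name": "alpha", "value": 1},
    {"name": "beta", "value": 2},
]

buf = StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "value"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)  # rewind so COPY can read from the start

# buf is now a file-like object suitable for something like
# cursor.copy_expert("COPY ... FROM STDIN WITH CSV HEADER", buf)
```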
Let's do it!
Is there any progress?
I thought about this when I first came across this project a few weeks ago. If I remember correctly, the in-memory object would need read() and readline() methods, based on psycopg2's copy_expert() (see: https://github.com/psycopg/psycopg2/blob/6da3e7ee69971cd6cb692765a4d66a5ce405f104/psycopg/cursor_type.c#L1366)
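One consequence of that read()/readline() requirement is that the file-like object doesn't have to be a fully materialized buffer; a small adapter can stream CSV lines from a generator instead. A sketch (the class name and shape are made up for illustration, not part of psycopg2 or this library):

```python
class IterFile:
    """Minimal file-like object exposing read() and readline(),
    pulling CSV text from an iterator of lines on demand instead of
    buffering the whole file in memory."""

    def __init__(self, lines):
        self._iter = iter(lines)
        self._buf = ""

    def read(self, size=-1):
        # Pull lines until we have `size` characters (or everything, if size < 0).
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._iter)
            except StopIteration:
                break
        if size < 0:
            data, self._buf = self._buf, ""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self):
        # Pull lines until the buffer holds a newline, then hand back one line.
        while "\n" not in self._buf:
            try:
                self._buf += next(self._iter)
            except StopIteration:
                break
        i = self._buf.find("\n")
        if i < 0:
            line, self._buf = self._buf, ""
        else:
            line, self._buf = self._buf[:i + 1], self._buf[i + 1:]
        return line
```

Something like `cursor.copy_expert(sql, IterFile(generate_csv_lines()))` could then avoid holding the whole temporary CSV at once.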
I never got a chance to come back & contribute what I had, but here it is as a POC, in case someone else wants to pick this up.
```python
import csv
from io import StringIO

from django.db import connection


def postgres_bulk_copy(objects, ModelCls=None,
                       table_name=None,
                       fields=None, not_null=None, null=None,
                       include_id=False,
                       ):
    _fields = []
    _not_null = []
    _null = []

    # if a model was passed in, use it to get defaults
    if ModelCls:
        for f in ModelCls._meta.get_fields():
            if (not f.auto_created and
                    not (f.is_relation and f.many_to_many) and
                    f.column):
                _fields.append(f.column)
                if f.null:
                    _null.append(f.column)
                else:
                    _not_null.append(f.column)

    # prefer passed-in overrides if they exist
    table_name = table_name or ModelCls._meta.db_table
    fields = fields or _fields
    not_null = not_null or _not_null
    null = null or _null

    if include_id:
        fields.append('id')

    objects = [
        {k: getattr(o, k) for k in fields} for o in objects
    ]

    tmp = StringIO()
    w = csv.DictWriter(tmp, fieldnames=fields)
    w.writeheader()
    w.writerows(objects)

    # flush and seek to start
    tmp.flush()
    tmp.seek(0)

    cursor = connection.cursor()

    # FORCE_NULL / FORCE_NOT_NULL are only accepted in the
    # option-list form of COPY, so build the options that way
    options = ['FORMAT csv', 'HEADER true']
    if null:
        options.append('FORCE_NULL ({})'.format(', '.join(null)))
    if not_null:
        options.append('FORCE_NOT_NULL ({})'.format(', '.join(not_null)))
    sql = 'COPY {} ({}) FROM STDIN WITH ({})'.format(table_name,
                                                     ', '.join(fields),
                                                     ', '.join(options))
    cursor.copy_expert(sql, tmp)
    # need this commit here so lookups immediately after will work
    cursor.connection.commit()
```
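For reference, the statement that POC assembles can be factored into a small helper, which also makes the expected SQL easy to see without a database (pure string-building; the names are illustrative):

```python
def build_copy_sql(table_name, fields, null=(), not_null=()):
    """Assemble a COPY ... FROM STDIN statement using PostgreSQL's
    option-list syntax, which FORCE_NULL / FORCE_NOT_NULL require."""
    options = ['FORMAT csv', 'HEADER true']
    if null:
        options.append('FORCE_NULL ({})'.format(', '.join(null)))
    if not_null:
        options.append('FORCE_NOT_NULL ({})'.format(', '.join(not_null)))
    return 'COPY {} ({}) FROM STDIN WITH ({})'.format(
        table_name, ', '.join(fields), ', '.join(options)
    )
```

For example, `build_copy_sql('entry', ['name', 'value'], null=['value'])` produces `COPY entry (name, value) FROM STDIN WITH (FORMAT csv, HEADER true, FORCE_NULL (value))`.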
Thanks for sharing your work, @jamesturk. If anybody wants to take a run at a pull request it would be greatly appreciated.
I think it's only partial progress towards the goal here, but I just merged #68, which adds support for submitting file objects in addition to file paths.
I wanted to check if a patch to support something like:

```python
CopyMapping(model=Entry, data=huge_list_of_unsaved_entries)
```

would be a welcome patch. I have some code I was considering making into a library that uses pg COPY to replace Django's bulk_create, in the form of a function with the signature above. I was considering abstracting this into a generic library to use Postgres' COPY TO/FROM when I found this library & figured I'd check if you'd be open to expanding the scope for this use case.