Open hottwaj opened 1 year ago
Here's a bit of code that will do that. I had considered adding this into my PR #156 but that PR seems to have stalled.
If there's an interest in adding this functionality, I don't mind submitting a separate PR, @palewire
First, need to subclass CopyMapping:
from postgres_copy import CopyMapping
class ResultsCopyMapping(CopyMapping):
def insert_suffix(self) -> str:
"""Add `RETURNING` clause to get newly created/updated ids."""
suffix = super().insert_suffix()
suffix = suffix.split(';')[0] + ' RETURNING id;'
return suffix
def post_insert(self, cursor) -> None:
"""Extend to retrieve results from `RETURNING` clause."""
self.obj_ids = [r[0] for r in cursor.fetchall()]
Unfortunately, I don't believe it's possible to implement this "ResultsCopyMapping" the "normal way" (e.g., by putting objects = CopyManager()
on your model), so it has to be called explicitly by subclassing CopyQuerySet
:
from postgres_copy import CopyQuerySet
class ResultsCopyQuerySet(CopyQuerySet):
def from_csv(self, csv_path_or_obj, mapping=None, **kwargs):
mapping = ResultsCopyMapping(self.model, csv_path_or_obj, mapping=None, **kwargs)
count = mapping.save(silent=True)
objs = self.model.objects.filter(id__in=mapping.obj_ids)
return objs, count
Then, finally, on your model you do
class MyModel(models.Model):
[...]
objects = ResultsCopyQuerySet.as_manager()
This code is slightly different than the implementation I have that's working for me, so let me know if you have any issues with it!
thanks! That's really helpful :)
If an API is added for this, my vote would be for that to be able to return either i) raw IDs (i.e. avoid the SELECT in your code above) or ii) full ORM instances (as in your example)
Also I think option ii) could be made more efficient by selecting the last N rows in the table, where N is the number of rows inserted, rather than doing the selection by id__in
as in your code? would need the SELECT to be within same transaction and ordering to be by PK for that to work...
Definitely! In my case, just grabbing the last N
rows in the table wouldn't work because I'm using code that allows "upsert," so it might not always be the case that the rows imported via CSV match the most recent rows in the db.
If you just want to get the ids, just change from_csv
to
def from_csv(self, csv_path_or_obj, mapping=None, **kwargs):
mapping = ResultsCopyMapping(self.model, csv_path_or_obj, mapping=None, **kwargs)
count = mapping.save(silent=True)
return mapping.obj_ids
aha thanks I did not consider how the "last N" would interact with "upsert" :)
hope some progress on that PR can made too
What would the PR be? Adding a post_insert hook?
What would the PR be? Adding a post_insert hook?
I'm not 100% on the best implementation API-wise. The necessary hooks are all already there (insert_suffix
and post_insert
), but I suppose we could add a kwarg to from_csv
, something like returning
, that would allow the user to specify count
(default), ids
, or objs
(although I'm not sure if it's frowned upon to add kwargs that accept a limited number of expected string values).
Gotcha. If this is simply a common extra bit to slap into the hook, I don't think it needs to be integrated into the repo. Sharing the snippet seems like enough. I'd be happy to include the example in the docs as a cookbook style snippet.
Yeah, I agree that's probably the best approach. I don't mind starting a PR on it.
What do you think about appending it to the "Extending with hooks" section?
I understand if you would prefer not to maintain this in the repo (e.g. due to time constraints, not being core functionality etc).
my opinion (which you are welcome to ignore :wink: ) is that as this functionality is provided by django out of the box it would make this repo stronger by doing the same instead of providing a harder-to-find code snippet containing SQL manipulation...
as I said just my opinion - obviously just trying to change your mind - and understand if you decide against. Thanks!
Can you point me to where Django implements it? If we can include something that 100% matches a public API in Django I would reconsider.
django's QuerySet.bulk_create
does something similar to the snippet @geoffrey-eisenbarth first provided: it returns the inserted objects with their PK set to the value in the database. See https://docs.djangoproject.com/en/4.2/ref/models/querysets/#bulk-create
Actually the django docs are not very clear that the PK is set on the returned objects, but it is :)
Relevant django ticket for this is here: https://code.djangoproject.com/ticket/19527
Thanks!
Would it be possible to provide an option to return PKs of inserted objects?
Given that INSERT is used to put data into the destination table after COPY to the temporary table, it looks like it should be possible via a "RETURNING id" (or whatever the PK field name is)?
Thanks! :)