Open LLyaudet opened 1 year ago
Some solution is for example:
def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
for some_key in keys:
if objects_dict1.get(some_key) is None:
two_only_list.append(objects_dict2[some_key])
elif objects_dict2.get(some_key) is None:
one_only_list.append(objects_dict1[some_key])
else:
both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
return one_only_list, both_list, two_only_list
I did not test this code. I just coded it directly in the issue comment. But you get the idea :)
Until now I did some adhoc code every time I had to do this, but clearly it can be factorized. I would be very happy if you find time to improve my function above, and even more happy if you would consider to add this kind of function in toolz :)
To complete the example, you can do:
wanted_state_objects = MyModel.objects.filter(site_id=1)
existing_state_objects = MyModel.objects.filter(site_id=2)
to_create_objects, to_update_objects, to_delete_objects = get_modification_lists(
wanted_state_objects,
existing_state_objects,
"name",
)
for some_object in to_create_objects:
some_object.site_id = 2
some_object.id = None
MyModel.objects.bulk_create(to_create_objects)
MyModel.objects.bulk_delete(to_delete_objects)
true_updates = []
for wanted_object, existing_object in to_update_objects:
# The code in this loop should go in a method of the model
is_modified = False
if existing_object.some_parameter_interesting_field1 != wanted_object.some_parameter_interesting_field1:
existing_object.some_parameter_interesting_field1 = wanted_object.some_parameter_interesting_field1
is_modified = True
if existing_object.some_parameter_interesting_field2 != wanted_object.some_parameter_interesting_field2:
existing_object.some_parameter_interesting_field2 = wanted_object.some_parameter_interesting_field2
is_modified = True
# ...
if is_modified:
true_updates.append(existing_object)
# You can enhance this with updated_fields
MyModel.objects.bulk_update(true_updates)
This way you can truly minimize database load :)
This looks a lot like different types of joins. pytoolz has a join method (https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.join)
If I understand your description, doing a full outer join and inspecting the result will tell you which operation should be performed.
from toolz import join, identity
a = list(range(10))
b = list(range(5, 15))
for el1, el2 in join(identity, a, identity, b, left_default=None, right_default=None):
if el1 is None:
# These are elements that appear in b, but not in a
elif el2 is None:
# These are elements that appear in a, but not in b
else:
# These are elements that appear in both a and b
You would want to replace the identity function with attrgetter to get the appropriate attribute off each object for comparison.
Thank you very much for your answer @groutr . I checked the doc and corrected a typo: https://github.com/pytoolz/toolz/pull/568 It is interesting to see it as a full outer join :) Nevertheless, it doesn't factorize the code enough since you want the three lists for bulk database modification. With full outer join, you replace:
def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
for some_key in keys:
if objects_dict1.get(some_key) is None:
two_only_list.append(objects_dict2[some_key])
elif objects_dict2.get(some_key) is None:
one_only_list.append(objects_dict1[some_key])
else:
both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
return one_only_list, both_list, two_only_list
with
def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
for object1, object2 in join(
lambda x: getattr(x, pairing_field),
objects_list1,
lambda x: getattr(x, pairing_field),
objects_list2,
left_default=None,
right_default=None,
):
if object1 is None:
two_only_list.append(object2)
elif object2 is None:
one_only_list.append(object1)
else:
both_list.append((object1, object2))
return one_only_list, both_list, two_only_list
But I think join is the right track for naming a new function. both_list corresponds to inner join. one_only_list corresponds to left outer join "minus" inner join. two_only_list corresponds to right outer join "minus" inner join. Something like
LEFT_OUTER_JOIN_MINUS_INNER_JOIN = (1,)
INNER_JOIN = (2,)
RIGHT_OUTER_JOIN_MINUS_INNER_JOIN = (3,)
LEFT_OUTER_JOIN = (1,2)
RIGHT_OUTER_JOIN = (2,3)
FULL_OUTER_JOIN = (1,2,3)
ANTI_JOIN = (1, 3)
def join_synthesis(
leftkey,
leftseq,
rightkey,
rightseq,
joins_seq,
):
....
And my use case could be:
one_only_list, both_list, two_only_list = join_synthesis(
lambda x: getattr(x, pairing_field),
objects_list1,
lambda x: getattr(x, pairing_field),
objects_list2,
joins_seq=(LEFT_OUTER_JOIN_MINUS_INNER_JOIN, INNER_JOIN, RIGHT_OUTER_JOIN_MINUS_INNER_JOIN),
)
It would be nice :) Thanks :)
Hello, I'm trying to find the best way to modify the database when I want to copy some object parametering from one site to one site in Django. Let's assume I have a model
The ids and site_ids are distinct between the two sites. However, I would like to pair objects by name and do only the database modifications that are required. What is the best way to do this using toolz API, please ? I will detail some solutions I have found later. Best regards, Laurent Lyaudet