Drapersniper opened this issue 1 year ago
@Drapersniper @dantownsend I've been experimenting with preventing duplicate entries in M2M and I'm interested in your thoughts on this. I changed the _run() method to the following, and everything works fine and the unit tests pass.
async def _run(self):
    rows = self.rows
    unsaved = [i for i in rows if not i._exists_in_db]
    if unsaved:
        await self.rows[0].__class__.insert(*unsaved).run()

    joining_table = self.m2m._meta.resolved_joining_table
    joining_table_rows = []

    # Fetch the existing (primary FK, secondary FK) pairs once, before the
    # loop. Selecting all rows can be problematic as the amount of data in
    # the joining table increases.
    unique_joining_table_rows = set(
        (
            i[self.m2m._meta.primary_foreign_key._meta.name],
            i[self.m2m._meta.secondary_foreign_key._meta.name],
        )
        for i in await joining_table.select(
            self.m2m._meta.primary_foreign_key,
            self.m2m._meta.secondary_foreign_key,
        )
    )

    for row in rows:
        joining_table_row = joining_table(**self.extra_column_values)
        setattr(
            joining_table_row,
            self.m2m._meta.primary_foreign_key._meta.name,
            getattr(
                self.target_row,
                self.target_row._meta.primary_key._meta.name,
            ),
        )
        setattr(
            joining_table_row,
            self.m2m._meta.secondary_foreign_key._meta.name,
            getattr(
                row,
                row._meta.primary_key._meta.name,
            ),
        )
        # The new joining table row we want to add.
        insert_row = (
            getattr(
                joining_table_row,
                self.m2m._meta.primary_foreign_key._meta.name,
            ),
            getattr(
                joining_table_row,
                self.m2m._meta.secondary_foreign_key._meta.name,
            ),
        )
        # Only add the row if it isn't already in the joining table;
        # otherwise skip it, to prevent duplicate rows in the db.
        if insert_row not in unique_joining_table_rows:
            joining_table_rows.append(joining_table_row)

    # Avoid issuing an empty insert if every row was a duplicate.
    if joining_table_rows:
        return await joining_table.insert(*joining_table_rows).run()
I also think the docs should change the way we add rows, using the get_or_create method to prevent a UNIQUE constraint error. Something like this:
>>> band = await Band.objects().get(Band.name == "Pythonistas")
>>> genre = await Genre.objects().get_or_create(Genre.name == "Punk rock")
>>> await band.add_m2m(
... genre,
... m2m=Band.genres
... )
[{'id': 1}]
Thanks.
@sinisaos Thanks for looking into it. I haven't had a chance yet to fully wrap my head around it, but hope to look into it soon.
Can we solve this by adding composite unique constraints to Piccolo? Then, on the joining table, we could make the foreign keys unique together.
@dantownsend That might work. I tested it by adding something like this:
unique_a = "CREATE UNIQUE INDEX unique_a ON track_to_playlist(tracks, playlists)"
await TrackToPlaylist.raw(unique_a)
unique_b = "CREATE UNIQUE INDEX unique_b ON track_to_album(tracks, albums)"
await TrackToAlbum.raw(unique_b)
but if you run the script a second time, the UNIQUE constraint failed error appears.
Also, if we want to add already existing tracks to the first playlist again, like this:
playlist = await Playlist.objects().get_or_create(Playlist.id == 1) # already exists
tracks6 = []
for i in range(5):
t = await Track.objects().get_or_create(Track.track_id == i)
tracks6.append(t) # already exists as tracks 0,1,2,3,4
await playlist.add_m2m(*tracks6, m2m=Playlist.tracks)
the same error appears.
I don't know if it is the correct behavior, but this is not the case in Django (or with the changes I made). We can run the script as many times as we want, and if duplicates appear they are ignored and no error occurs. The biggest problem with my changes is that we scan the entire joining table looking for duplicates, which could hurt performance as the table grows over time. I hope I managed to explain what I mean.
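One way around the full-table scan mentioned above is to restrict the duplicate check to rows involving the target row's primary key. This is a minimal sketch using plain sqlite3 rather than Piccolo, and the track_to_playlist table and column names are illustrative, not Piccolo's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_to_playlist "
    "(id INTEGER PRIMARY KEY, playlists INTEGER, tracks INTEGER)"
)
conn.executemany(
    "INSERT INTO track_to_playlist (playlists, tracks) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

def existing_pairs(conn, playlist_id):
    # The WHERE clause limits the scan to the rows for this one playlist,
    # so the check scales with the playlist size, not the whole table.
    rows = conn.execute(
        "SELECT playlists, tracks FROM track_to_playlist WHERE playlists = ?",
        (playlist_id,),
    )
    return set(rows)

# (playlist, track) pairs we want to add; one is already present.
candidates = [(1, 10), (1, 12)]
seen = existing_pairs(conn, 1)
to_insert = [pair for pair in candidates if pair not in seen]
print(to_insert)  # → [(1, 12)]
```

The same idea would apply inside _run(): filter the joining-table select by the target row's primary key instead of selecting every row.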
unsaved = [i for i in rows if not i._exists_in_db]
This would still cause it to error on insert for manually created rows without calling get_or_create, right?
Because _exists_in_db would always be false for rows created manually.
If creating rows manually is not supported then that's fine. For my use case, playlists and albums will have up to 10,000 tracks, and manually calling get_or_create for each of these would be awfully slow and inefficient. I usually have the whole track data before I add it to the tracks table, so I create the rows manually, i.e. Track(id=...), and then add them to the playlists using add_m2m.
I was expecting an on conflict update for my use case, so that if the track metadata changes when the same track gets added to a new playlist/album, it updates the existing tracks table.
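The "on conflict update" behaviour described above can be sketched with SQLite's upsert syntax via plain sqlite3 (the track table and column names here are illustrative, not Piccolo's API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (track_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO track (track_id, title) VALUES (1, 'Old title')")

# Re-adding the same track with changed metadata updates the existing row
# instead of raising a UNIQUE constraint error.
conn.execute(
    """
    INSERT INTO track (track_id, title) VALUES (1, 'New title')
    ON CONFLICT (track_id) DO UPDATE SET title = excluded.title
    """
)
print(conn.execute("SELECT title FROM track WHERE track_id = 1").fetchone())
# → ('New title',)
```

Note that ON CONFLICT ... DO UPDATE requires SQLite 3.24 or later.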
@Drapersniper First, thank you for your reply.
unsaved = [i for i in rows if not i._exists_in_db]
This would still cause it to error on insert for manually created rows without calling get_or_create, right?
Because _exists_in_db would always be false for rows created manually.
Yes, that is correct. It's the same as Django or Tortoise ORM, where we first need to save the record to the db before we can add it to the M2M.
I was expecting an on conflict update for my use case, so that if the track metadata changes when the same track gets added to a new playlist/album, it updates the existing tracks table.
I'm sorry if I don't understand the problem well, but if we use a conflict update on the Track table (for SQLite, something like INSERT OR IGNORE INTO track(track_id) VALUES ('new_value');), the number of rows in the joining table (which uses an autoincrementing serial primary key) will increase anyway, and already after step 2 the results will be wrong, unless we somehow prevent the entry of duplicate records in the joining table as well. I'm not very familiar with SQLAlchemy, and sorry if I'm wrong, but I think for your use case it would be the best solution, because you can manually add records and nothing is saved to the database until you call session.commit().
OK, we're closer to fixing this now that we have the on_conflict clause.
Sorry for the delay on this.
We probably also need composite unique constraints, for both foreign keys in the M2M table.
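Combining the two ideas above, once the joining table has a composite unique index over both foreign keys, a conflict-ignoring insert makes repeated add_m2m calls idempotent. A minimal sketch with plain sqlite3, using illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_to_playlist "
    "(id INTEGER PRIMARY KEY, playlists INTEGER, tracks INTEGER)"
)
# Composite unique index: the FK pair must be unique together.
# IF NOT EXISTS keeps the DDL itself safe to re-run.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS unique_pair "
    "ON track_to_playlist (playlists, tracks)"
)

for _ in range(2):  # simulate running the same script twice
    conn.executemany(
        "INSERT OR IGNORE INTO track_to_playlist (playlists, tracks) "
        "VALUES (?, ?)",
        [(1, 10), (1, 11)],
    )

count = conn.execute("SELECT COUNT(*) FROM track_to_playlist").fetchone()[0]
print(count)  # → 2: the second run's duplicates are silently skipped
```

On PostgreSQL the equivalent would be ON CONFLICT DO NOTHING rather than INSERT OR IGNORE.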
Discussed in https://github.com/piccolo-orm/piccolo/discussions/683
Currently it raises an exception.