mmorey / MDMHPCoreData

Source code for High Performance Core Data talk
http://highperformancecoredata.com
65 stars, 8 forks

Duplicate insert #1

Open · HarrisonJackson opened this issue 9 years ago

HarrisonJackson commented 9 years ago

I am getting duplicate inserts when I import a data set like this:

[
    {"guid": "a"},
    {"guid": "c"}
]

Followed by an import of a data set like this:

[
    {"guid": "a"},
    {"guid": "b"},
    {"guid": "c"}
]

After the first data set (just a and c) is inserted, we import a new JSON set with a, b, c:

try to fetch objects a,b,c from managedObjectContext
return objects a,c

compare a==a
update a

compare b==c
insert b

compare c==b
insert c

Object c is then duplicated.
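One way to get exactly this result is a lockstep loop that advances the cursor into the fetched objects on every incoming record, whether or not it matched. The sketch below is a language-agnostic model in Python, not the repository's Objective-C code: the store is assumed to be a plain list of guid strings standing in for managed objects, and `fetch_matching`, `import_lockstep` and `import_sorted_merge` are hypothetical names. It contrasts that suspected lockstep loop with a sorted-merge find-or-create, where the cursor only moves when a fetched object actually matches.

```python
def fetch_matching(store, guids):
    # Models a fetch with a "guid IN incoming" predicate, sorted by guid.
    wanted = set(guids)
    return sorted(g for g in store if g in wanted)


def import_lockstep(store, incoming):
    # Hypothetical buggy loop: index i walks BOTH lists, so one missing
    # object (b) shifts every later comparison out of alignment.
    store = list(store)
    fetched = fetch_matching(store, incoming)
    for i, guid in enumerate(sorted(incoming)):
        match = fetched[i] if i < len(fetched) else None
        if match != guid:
            store.append(guid)  # "insert", even if the object already exists
    return store


def import_sorted_merge(store, incoming):
    # Sorted merge: only advance the fetched cursor on a match.
    store = list(store)
    fetched = fetch_matching(store, incoming)
    cursor = 0
    for guid in sorted(incoming):
        if cursor < len(fetched) and fetched[cursor] == guid:
            cursor += 1  # found: update the existing object in place
        else:
            store.append(guid)  # not found: insert a new object
    return store


if __name__ == "__main__":
    first = ["a", "c"]
    print(sorted(import_lockstep(first, ["a", "b", "c"])))      # ['a', 'b', 'c', 'c']
    print(sorted(import_sorted_merge(first, ["a", "b", "c"])))  # ['a', 'b', 'c']
```

Because the fetch is restricted to the incoming guids, every fetched object has a partner in the sorted incoming list, so a single forward pass with a match-only cursor never skips or duplicates a record.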

HarrisonJackson commented 9 years ago

Updated with the actual UFO JSON:

First, import this in ufo.json:

[
    {"description": "Man repts. witnessing \"flash, followed by a classic UFO, w/ a tailfin at back.\" Red color on top half of tailfin. Became triangular.", "reported_at": "19951009", "shape": "", "location": "Iowa City, IA", "duration": "", "sighted_at": "19951009", "guid": "a"},
    {"description": "Telephoned Report:CA woman visiting daughter witness discs and triangular ships over Squaxin Island in Puget Sound. Dramatic.  Written report, with illustrations, submitted to NUFORC.", "reported_at": "19950103", "shape": "", "location": "Shelton, WA", "duration": "", "sighted_at": "19950101", "guid": "c"}
]

Stop the app, then update ufo.json to the following and import again:

[
    {"description": "Man repts. witnessing \"flash, followed by a classic UFO, w/ a tailfin at back.\" Red color on top half of tailfin. Became triangular.", "reported_at": "19951009", "shape": "", "location": "Iowa City, IA", "duration": "", "sighted_at": "19951009", "guid": "a"},
    {"description": "Man  on Hwy 43 SW of Milwaukee sees large, bright blue light streak by his car, descend, turn, cross road ahead, strobe. Bizarre!", "reported_at": "19951011", "shape": "", "location": "Milwaukee, WI", "duration": "2 min.", "sighted_at": "19951010", "guid": "b"},
    {"description": "Telephoned Report:CA woman visiting daughter witness discs and triangular ships over Squaxin Island in Puget Sound. Dramatic.  Written report, with illustrations, submitted to NUFORC.", "reported_at": "19950103", "shape": "", "location": "Shelton, WA", "duration": "", "sighted_at": "19950101", "guid": "c"}
]

mmorey commented 9 years ago

@HarrisonJackson I don't understand the problem. Are you saying the import code is broken because it allows duplicates? Pull requests are welcome.

HarrisonJackson commented 9 years ago

Yes, that's what I am saying. The find-or-create method seems intended to find an existing record so that a duplicate is not inserted, but as it is written there are cases where a duplicate can still be created.

I was actually reporting the issue because I hoped you had an idea of how to fix it, haha. The fastest and definitely dirtiest fix is to change MDM_BATCH_SIZE_IMPORT from 5000 to 1. Obviously that throws away many of the other optimizations: after fiddling with it a bit, I found it turns an import that took ~2 seconds into one that takes ~5 seconds. Not ideal, but in my case it was better than allowing duplicates. If I find a cleaner way I will submit a PR.
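A batch size of 1 avoids the bug because each batch then fetches and compares a single record, so nothing can fall out of alignment. An alternative is to keep large batches and fix the merge itself. The sketch below is again a Python model rather than the project's code: the store is assumed to be a list of guid strings, and `merge_batch`, `import_in_batches` and `batch_size` are hypothetical stand-ins for the batching that MDM_BATCH_SIZE_IMPORT controls. With a match-only cursor the result is the same for any batch size, while the number of fetches stays at one per batch.

```python
def merge_batch(store, batch):
    # One fetch per batch, modelled as "guid IN batch" sorted by guid.
    wanted = set(batch)
    fetched = sorted(g for g in store if g in wanted)
    cursor = 0
    for guid in sorted(batch):
        if cursor < len(fetched) and fetched[cursor] == guid:
            cursor += 1  # existing object: update in place
        else:
            store.append(guid)  # missing object: insert


def import_in_batches(store, records, batch_size):
    # Returns how many fetches were issued (one per batch).
    fetches = 0
    for start in range(0, len(records), batch_size):
        merge_batch(store, records[start:start + batch_size])
        fetches += 1
    return fetches


if __name__ == "__main__":
    for size in (5000, 1):
        store = ["a", "c"]
        fetches = import_in_batches(store, ["a", "b", "c"], size)
        print(size, fetches, sorted(store))
```

This model assumes guids are unique within a batch; if the JSON itself can repeat a guid, the batch would need de-duplicating before the merge.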

The project and associated write-up are excellent; thanks for putting them out there.