pelias / model

Pelias data models
6 stars 17 forks source link

remove phrase field in Document model #148

Closed missinglink closed 2 years ago

missinglink commented 2 years ago

remove phrase field in Document in favour of duplicating name during serialisation.

Screenshot 2022-04-19 at 14 11 26

historically we've had two separate fields named name and phrase which correspond to fields containing prefix grams and those which don't, respectively.

the two fields contain exactly the same strings, and I've tried over the years to figure out a clean way of combining them into a single field, a trivial task but one which seems impossible to do without introducing breaking changes 😿

this PR makes a change to the Document model such that there aren't two copies of all the strings being passed around in memory and operated on.

a quick look over the diff in this PR shows that we were doing a bunch of CPU work which was duplicatory and unnecessary, plus we're using up more memory storing all the strings twice 🤷‍♂️ .

@orangejulius could you please sanity check this for me?

orangejulius commented 2 years ago

Cool, I think this looks good. I have a vague memory that we have occasionally not had the same values in the name and phrase fields, though that might have been unintentional and something we eventually fixed. Do you remember?

In any case, this looks like a good step forward, and hopefully someday we can actually finish off https://github.com/pelias/schema/issues/285

missinglink commented 2 years ago

I also had a vague memory of discussing it, but it would have been 5+ years back, I had a quick look through the blame and found comments mentioning moving from quattroshapes to whosonfirst, so that's how long ago it was 😆

I don't recall the two fields ever being different, maybe in the schema analysis, but not in the model.

ref: https://github.com/pelias/model/commit/4cc36768a4222b935e05cf0a5dd244892265b640

orangejulius commented 2 years ago

I think I was thinking of https://github.com/pelias/model/pull/132, which was indeed a bug. So we should be all good.