pelias / model

Pelias data models
6 stars 17 forks source link

[do not merge] do not throw on name/address regex matching errors #122

Open missinglink opened 5 years ago

missinglink commented 5 years ago

this is one solution to the verbose regex matching errors.

we've had a lot of users worried and confused about the verbose errors we return when the source data for a name or address erroneously contains a URL.

there are a few ways of dealing with it, I considered returning a PeliasModelWarning error but solutions like that require that each importer is updated and it's not nice code because throwing will terminate the current scope, necessitating that each setter is wrapped in its own catch block 💩

this solution is fairly clean, I'm still outputting the error message via console.warn and considered using pelias-logger although I'd prefer to reduce the coupling to other modules to this one to avoid the 'bump cascades'.

missinglink commented 5 years ago

also worth adding that the existing workflow is error-prone, in the case where a record contains multiple names in multiple languages, if one of those names is a URL then an Error is thrown and the whole record is discarded.

there are solutions for this which importers can elect to implement (such as the per-setter try/catch 💩 I mentioned above) although to-date none of the importers have adopted this approach and it's unlikely custom importer authors will have the insight to do so.

orangejulius commented 5 years ago

Cool, it will be good to finally have this fixed. I have two suggestions:

1.) Definitely do use pelias-logger. It's super helpful to have all the logging formats controlled in a single place, and to keep the log output consistent. 2.) Lets try to think of a warning message that reassures users what is going on. We have learned that even the most "gentle" messages that are explicitly logged as warnings will still lead to some users thinking they are doing something wrong or that the entire import is invalid. invalid regex test, ${val} should not match ${regex} is pretty terse and absolute-sounding. How about something along the lines of "skipping record #{record_gid} because its name contains the url ${val}"? Having the GID in there would also be very helpful as we could then parse the logs and create a list of all records that did not pass the validation.

missinglink commented 5 years ago

Yeah that sounds good, although there is an issue with the chain-of-command now, since the error is not being passed back to the caller we can't say things like "skipping record" as this library has no control over that behaviour ☹️

orangejulius commented 5 years ago

Oh, hm. So what will the behavior be with this PR for a record that has a name that contains a URL?

missinglink commented 5 years ago

If the record has a name with a URL it will console.warn and not throw an Error. Prior to this PR it would throw an Error and it would be up to the importer to handle that, most would just reject the whole record.

orangejulius commented 5 years ago

Got it, and with this PR, what document would end up being stored in Elasticsearch? Would it have a property containing the URL?

missinglink commented 5 years ago

Hmm yes, I just had the same thought, and the tests don't properly cover checking that.

I think we need to have a rethink about how the validation works in general.

It makes sense to throw on mandatory properties in the constructor but in reality there are also additional properties which are mandatory such as a name.default and center_point.

One option is that we stop throwing on all setter methods (and ensure that the property is not set) along with an appropriate log message and log level.

I think this would be simpler for importers to handle due to less catch statements. The 'chaining' API complicated things as setters can't return values.

In order to ensure that the minimum viable set of properties is set on the object we can introduce a new method called isValid() or isValidOrThrow() which gets called just before the document is pushed downstream.

missinglink commented 4 years ago

Closes https://github.com/pelias/openstreetmap/issues/534

missinglink commented 4 years ago

related https://github.com/pelias/model/issues/116 related https://github.com/pelias/polylines/issues/216 related https://github.com/pelias/polylines/pull/225