Open missinglink opened 5 years ago
also worth adding that the existing workflow is error-prone, in the case where a record contains multiple names in multiple languages, if one of those names is a URL then an Error is thrown and the whole record is discarded.
there are solutions for this which importers can elect to implement (such as the per-setter try/catch 💩 I mentioned above) although to-date none of the importers have adopted this approach and it's unlikely custom importer authors will have the insight to do so.
Cool, it will be good to finally have this fixed. I have two suggestions:
1.) Definitely do use pelias-logger
. It's super helpful to have all the logging formats controlled in a single place, and to keep the log output consistent.
2.) Lets try to think of a warning message that reassures users what is going on. We have learned that even the most "gentle" messages that are explicitly logged as warnings will still lead to some users thinking they are doing something wrong or that the entire import is invalid. invalid regex test, ${val} should not match ${regex}
is pretty terse and absolute-sounding. How about something along the lines of "skipping record #{record_gid} because its name contains the url ${val}"? Having the GID in there would also be very helpful as we could then parse the logs and create a list of all records that did not pass the validation.
Yeah that sounds good, although there is an issue with the chain-of-command now, since the error is not being passed back to the caller we can't say things like "skipping record" as this library has no control over that behaviour ☹️
Oh, hm. So what will the behavior be with this PR for a record that has a name that contains a URL?
If the record has a name with a URL it will console.warn
and not throw an Error
.
Prior to this PR it would throw an Error
and it would be up to the importer to handle that, most would just reject the whole record.
Got it, and with this PR, what document would end up being stored in Elasticsearch? Would it have a property containing the URL?
Hmm yes, I just had the same thought, and the tests don't properly cover checking that.
I think we need to have a rethink about how the validation works in general.
It makes sense to throw on mandatory properties in the constructor but in reality there are also additional properties which are mandatory such as a name.default
and center_point
.
One option is that we stop throwing on all setter methods (and ensure that the property is not set) along with an appropriate log message and log level.
I think this would be simpler for importers to handle due to less catch
statements. The 'chaining' API complicated things as setters can't return values.
In order to ensure that the minimum viable set of properties is set on the object we can introduce a new method called isValid()
or isValidOrThrow()
which gets called just before the document is pushed downstream.
this is one solution to the verbose regex matching errors.
we've had a lot of users worried and confused about the verbose errors we return when the source data for a name or address erroneously contains a URL.
there are a few ways of dealing with it, I considered returning a
PeliasModelWarning
error but solutions like that require that each importer is updated and it's not nice code because throwing will terminate the current scope, necessitating that each setter is wrapped in its owncatch
block 💩this solution is fairly clean, I'm still outputting the error message via
console.warn
and considered usingpelias-logger
although I'd prefer to reduce the coupling to other modules to this one to avoid the 'bump cascades'.