Closed missinglink closed 8 years ago
http://pelias.mapzen.com/doc?id=osmnode:2235377518
has opening quote for it's outlandish claim, no closing quote.
"text": "Nobis \"the best pizza, Monte di Procida, NAPOLI"
had a think about this, we could easily add a transform.trim()
function to all the string getter/setter
functions but my feeling is that the importers should take responsibility for performing string operations on their data, it's not really the responsibility of this repo to do so; although we do in some cases lowercase
or uppercase
for consistency (eg. for alpha3).
if we did decide to do it here, would it only be whitespace? trimming additional chars like "
sounds dangerous, as per the example above: foo "bar"
would become foo "bar
.
thoughts?
Don't the ES analyzers we employ trim the tokens?
@trescube yes, but then the documents returned by ES would still have whitespace in them.
The openaddresses importer will, as of pelias/openaddresses#31, have whitespace trimming fully taken care of. This is mostly so that it can guarantee all records that are inserted into Elasticsearch are valid, and don't have fields that consist entirely of whitespace. So it seems like whitespace trimming is important for importers to do. That said, just trimming whitespace is pretty easy, so doing it everwhere isn't the worst. I definitely don't think we should trim any other characters.
this issue is old and seems to be a low priority.
as discussed above, it would be best to fix data in the source importer rather than in the model itself.
migrating issue https://github.com/pelias/suggester-pipeline/issues/5 to this repo