Open karussell opened 10 years ago
Yohan told me this is something we should try out to see which option has the best tradeoff between performance and storage size.
Yes, sure. Maybe there is even a better, less hacky way of doing this, e.g. something like the cross_fields approach while still using edge ngram, where it would somehow just boost berlin erlanger* more than berlin* erlangen.
These docs seem to be more current and have a better example.
Btw, the cross_fields approach might make the collector field obsolete. We introduced it to get equal idf across all fields, but I wasn't aware of this feature until now...
@yohanboniface, we could even give different scores to each field, not only distinguish between name and collector. And, much more importantly, we would no longer be forced to copy the default fields into each language-specific collector. This opens the door to multilingual support for all languages in OSM, as we would save a lot of storage.
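To make the idea concrete, here is a rough sketch of what a cross_fields query with per-field boosts could look like. It only builds the Elasticsearch request body as a Python dict. The field names and boost values are made up for illustration and are not Photon's actual mapping.

```python
import json


def cross_fields_query(text, boosted_fields):
    """Build a multi_match request body of type cross_fields.

    boosted_fields maps a field name to its boost
    (hypothetical names and values, not Photon's mapping).
    """
    # Elasticsearch accepts per-field boosts written as "field^boost".
    fields = [f"{name}^{boost}" for name, boost in boosted_fields.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                # Every term must be found in at least one of the fields.
                "operator": "and",
                "fields": fields,
            }
        }
    }


# Hypothetical fields and boosts, for illustration only.
body = cross_fields_query(
    "berlin erlangen",
    {"name.default": 3, "city.default": 2, "street.default": 1},
)
print(json.dumps(body, indent=2))
```

Whether the scoring this produces is good enough to replace the collector fields would still have to be tested against real data.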
Yes, it's a fairly recent feature, but we'll have to try it to see whether it solves our problem.
cross_fields can't work with fuzzy atm.
Btw, wordending is not the hottest topic if you have time to spend on search logic. Two things we are on:
On the search logic part, the more up to date branch is https://github.com/komoot/photon/tree/positivescoring
Also: add tests! :)
Probably we also need a mailing list. Should I create a google group or one at openstreetmap?
Re tests: do you mean creating a Java test suite (master) or adding others? I could go ahead and create the Java tests.
> Probably we also need a mailing list. Should I create a google group or one at openstreetmap?
I'd go for geocoding@openstreetmap.org, to keep the discussion open instead of having one mailing list dedicated to photon, then one for pelias, etc.
> Re tests: do you mean creating a Java test suite (master) or adding others? I could go ahead and create the Java tests.
I was referring to search tests, like those, but all tests are good ;) BTW, Christoph already started on the Java side I think.
Hmmh, 'geocoding' mainly sends issues. I would prefer a list dedicated to discussion where nominatim and photon would be okay but there are similar projects like e.g. GraphHopper and OSRM which have separate lists ;)
> I was referring to search tests, like those, but all tests are good ;) BTW, Christoph already started on the Java side I think.
Ok, I think we still need some more lightweight test cases in Java. I've created a PR for that. See e.g. this
I like the idea of a mailing list and would go for a photon specific mailing list as geocoding is super generic. Do you know who we can approach for setting up a new osm mailing list?
Great commit, Peter!
@christophlingg I'll give you the mail via mail ;)
Using wordending is kind of a workaround for edge ngram searches like 'berlin erlange', which would match 'berlinerstraße erlangen' but should really only match things like 'berlin erlange*'. When this workaround is used, why not avoid edge ngram altogether and tokenize the query, plus do a prefix query for the last term? This would save space and memory at the same quality. The only problem could be performance, but my simple tests on small data don't show any problems there.
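A minimal sketch of that idea, assuming the query is split on whitespace and lowercased on the client side: all complete terms become exact term queries and only the last term becomes a prefix query. The field name collector.default is just a placeholder, and the real tokenization would have to match whatever analyzer the index uses.

```python
def prefix_last_term_query(text, field="collector.default"):
    """Build a bool query: exact match on all terms except the last,
    prefix match on the last one.

    `field` is a placeholder name, not necessarily Photon's real field.
    """
    # Assumption: the index analyzer lowercases and splits on whitespace.
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("query text must contain at least one term")

    *complete, last = tokens
    # Terms the user has finished typing must match whole indexed tokens.
    must = [{"term": {field: token}} for token in complete]
    # The term still being typed may be the beginning of an indexed token.
    must.append({"prefix": {field: last}})
    return {"query": {"bool": {"must": must}}}


query = prefix_last_term_query("berlin erlange")
print(query)
```

With this shape, 'berlin' only matches documents containing the whole token 'berlin', so 'berlinerstraße erlangen' would not match, while 'erlange' still matches 'erlangen'. No edge ngram tokens need to be stored for it.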