pelias / api

HTTP API for Pelias Geocoder
http://pelias.io
MIT License

autocomplete balance #328

Closed missinglink closed 8 years ago

missinglink commented 9 years ago

this is a good example for tweaking the balance between linguistics vs. geography vs. exact matches for focus.point:

/v1/autocomplete?focus.point.lat=40.74686681162143&focus.point.lon=-73.98983001708986&text=katzs deli

the query is for the famous http://katzsdelicatessen.com/ and the focus is on NYC
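For anyone reproducing this, here is a minimal sketch of building the request above. The host in `BASE_URL` is a placeholder assumption (the thread only gives the path and query string), and `autocomplete_url` is a hypothetical helper, not part of the Pelias API.

```python
from urllib.parse import urlencode

# Placeholder host (assumption); substitute your own Pelias API endpoint.
BASE_URL = "http://localhost:3100"


def autocomplete_url(text, focus_lat=None, focus_lon=None):
    """Build a /v1/autocomplete URL, optionally biased toward a focus point."""
    params = {"text": text}
    if focus_lat is not None and focus_lon is not None:
        params["focus.point.lat"] = focus_lat
        params["focus.point.lon"] = focus_lon
    # urlencode escapes the space in the query text.
    return f"{BASE_URL}/v1/autocomplete?{urlencode(params)}"


url = autocomplete_url("katzs deli", 40.74686681162143, -73.98983001708986)
print(url)
```

Comparing the output with and without the focus arguments is a quick way to see how much the focus point changes the ordering.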

for this query /autocomplete returns the following for "katzs deli":

 1) Katz Deli, Monroe County, TN
 2) Katz Deli, Montréal, Quebec
 3) Katz's Deli Express, Shenandoah, TX
 4) Katz's New York Deli, Houston, TX
 5) Kaipara District, Northland Region
 6) Kota Depok, Jawa Barat
 7) Demir Kapija, Macedonia
 8) Kawerau District, Bay of Plenty Region
 9) Kaikoura District, Canterbury Region
10) Derby-West Kimberley, Western Australia

.. but when fully specifying the name as "katzs delicatessen" then we get the correct item first:

 1) Katz's Delicatessen, Manhattan, NY
 2) Katzinger's Delicatessen, Columbus, OH
 3) Katzinger's Delicatessen, Columbus, OH
 4) Kairós Delicatessen, FEIRA DE SANTANA, BAHIA
 5) Karl's Delicatessen, Centennial, CO
 6) Kairós Delicatessen, FEIRA DE SANTANA, BAHIA
 7) Kairós Delicatessen, FEIRA DE SANTANA, BAHIA
 8) Delicatessen, Manhattan, NY
 9) A 1 Delicatessen, Jersey City, NJ
10) De Groot Kaas & Delicatessen, Binnenmaas, Zuid-Holland

it would be a good idea to play with this linguistics/geography balance and the boost applied to exact matches in order to try and get this better, while at the same time trying to avoid breaking other behaviour...

the admin boosting is also playing a part here, maybe we can discuss in a project meeting because there are lots of different analyses playing a part in what gets returned here.

dianashk commented 9 years ago

Additionally there are issues with addresses in autocomplete:

@randymeech reported:

I started the in eastern Long Island (on the move), searched for my address. Still took too long to find with many, many far-flung results in autocomplete given that I was 100 miles away from Brooklyn.

@souperneon also reported:

I have the same problem with search every time. The address is there and I can find it if I am within (I'm guestimating) 10-15miles of it. But I can't find it even if I type in the full exact address if I am further than that.

souperneon commented 9 years ago

Thanks for adding this one in, @dianashk. I'm not sure if this is related, but autocomplete stops autocompleting after a point and just shows results from Africa?

riordan commented 9 years ago

@souperneon Could you send an example?

riordan commented 9 years ago

EXAMPLE: 55 Stratford Av Greenlawn, New York (Focus point from Queens, NY)

55 Stratford Av

 1) 55 Stratford Avenue, Huntington, NY :white_check_mark:
 2) 5 Stratford Road, Brooklyn, NY
 3) 5 Stratford Road, Brooklyn, NY
 4) 2 Stratford Road, Brooklyn, NY
 5) 2 Stratford Road, Brooklyn, NY
 6) 7 Stratford Avenue, Staten Island, NY
 7) 7 Stratford Avenue, Staten Island, NY
 8) 6 Stratford Avenue, Staten Island, NY
 9) 6 Stratford Avenue, Staten Island, NY
10) 4 Stratford Avenue, Staten Island, NY

{
  "type": "Feature",
  "properties": {
    "id": "de032ef29a8841beaa6cbf0a82a95cc1",
    "gid": "oa:address:de032ef29a8841beaa6cbf0a82a95cc1",
    "layer": "address",
    "source": "oa",
    "name": "55 Stratford Avenue",
    "housenumber": "55",
    "street": "Stratford Avenue",
    "postalcode": "11740",
    "country_a": "USA",
    "country": "United States",
    "region": "New York",
    "region_a": "NY",
    "county": "Suffolk County",
    "localadmin": "Huntington",
    "locality": "Greenlawn",
    "neighbourhood": "Little Plains",
    "confidence": 0.882,
    "distance": 49.695,
    "label": "55 Stratford Avenue, Huntington, NY"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [
      -73.34405,
      40.868223
    ]
  }
}

55 Stratford Av Greenlawn

 1) 55 Greenlawn Terrace, Babylon, NY
 2) Greenlawn Station, Huntington, NY
 3) 5 Stratford Road, Brooklyn, NY
 4) 5 Stratford Road, Brooklyn, NY
 5) 2 Stratford Road, Brooklyn, NY
 6) 2 Stratford Road, Brooklyn, NY
 7) 7 Stratford Avenue, Staten Island, NY
 8) 7 Stratford Avenue, Staten Island, NY
 9) 6 Stratford Avenue, Staten Island, NY
10) 6 Stratford Avenue, Staten Island, NY

55 Stratford Av Greenlawn NY

 1) 5 Stratford Road, Brooklyn, NY
 2) 5 Stratford Road, Brooklyn, NY
 3) 2 Stratford Road, Brooklyn, NY
 4) 2 Stratford Road, Brooklyn, NY
 5) 55 Greenlawn Terrace, Babylon, NY
 6) Greenlawn Station, Huntington, NY
 7) 7 Stratford Avenue, Staten Island, NY
 8) 7 Stratford Avenue, Staten Island, NY
 9) 6 Stratford Avenue, Staten Island, NY
10) 6 Stratford Avenue, Staten Island, NY

55 Stratford Av Greenlawn New York

 1) New York County, NY
 2) Greater New York Academy, Queens, NY
 3) Greater New York Academy, Queens, NY
 4) Gracie Station New York Post Office, Manhattan, NY
 5) Gracie Station New York Post Office, Manhattan, NY
 6) New York Structural Biology Center, Manhattan, NY
 7) New York Structural Biology Center, Manhattan, NY
 8) Hamilton Grange Station New York Post Office, Manhattan, NY
 9) Hamilton Grange Station New York Post Office, Manhattan, NY
10) Grand Hyatt New York, Manhattan, NY


Appears to be a few things in play:

  1. Autocomplete is not taking the locality into account (though to be honest, it's not taking the localadmin, Huntington, into account either)
  2. We don't have state abbreviations [#329]
  3. We're bad at keeping already-matching results in the list as the query gets longer

Search is successfully completing this query.

riordan commented 9 years ago

pelias/pelias#45 has some ideas of how to handle this, but generally we need a strong definition of what the autocomplete balance should look like.

Closing pelias/pelias#45 but look there for additional notes.

riordan commented 9 years ago

Going to take a stab at defining what we're looking for here, before moving it up and into the wiki:

There are a few behaviors we're looking to model:

  1. The "General prioritization" order by which things should show up in an autocomplete search
  2. The "Stickiness" of places once people see the result they've intended in the list, but continue to type.

We can't talk about one without the other. And it's super important to talk about this in the context of the deduplication work [#339], which should limit cruft in the results.

General Prioritization

TKTK

This is something we'll continue to argue about and should come from a further conversation w/ our users. It's also challenging to define given focus bias points.

Generally, we should be aware of some of the content of the queries as they're coming in, allowing the detection of a leading numeric (likely then an address) or letters (likely then a locality, a POI, a region, or a street).
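As a rough illustration of the leading-numeric idea, a sketch along these lines could tag incoming queries. `guess_query_kind` and its labels are hypothetical names for discussion only, and the rule is deliberately naive (for example, it would misclassify a POI name that starts with a digit).

```python
import re


def guess_query_kind(text):
    """Naive guess at what the user is typing, based on the first character."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    if re.match(r"\d", stripped):
        # Leading digit: likely a house number, so probably an address.
        return "address"
    # Leading letter: a locality, a POI, a region, or a street.
    return "name"


for query in ("55 Stratford Av", "katzs deli", ""):
    print(repr(query), "->", guess_query_kind(query))
```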

Stickiness of results

The Problem

We know that users will often continue to type their query, even if the place they want is already in the result list. We all do this. Perhaps it's to drive the matching place higher in the results, or because the user hasn't realized that their place is there (or hasn't yet processed it visually), or because they're conditioned to keep typing and then hit "Search". Either way, they keep typing. Perhaps this is part of why FSTs work well for autocomplete (when they can be used).

Examples:

Expected Behavior

When a user begins typing, their intended place should (eventually) show up in the top 5 places of the result list. Once it does show up, as the user specifies further, the place should continue to match, and its placement should increase, eventually bringing it toward the top of the results (unless there are other, equally scoring places).
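One way to make this expectation testable is a small check over the result lists returned for successively longer queries. This is only a sketch: `rank_of` and `is_sticky` are hypothetical helpers, and the sample lists are the first three labels from the "55 Stratford Av" results quoted earlier in this thread.

```python
def rank_of(results, target):
    """Return the 1-based rank of target in results, or None if absent."""
    for position, label in enumerate(results, start=1):
        if label == target:
            return position
    return None


def is_sticky(result_lists, target, top_n=5):
    """True if target reaches the top_n and never drops out or slips afterwards.

    result_lists holds one list of labels per successively longer query.
    """
    previous = None
    for results in result_lists:
        rank = rank_of(results, target)
        if previous is None:
            # Not yet "seen" by the user: wait until it enters the top_n.
            if rank is not None and rank <= top_n:
                previous = rank
            continue
        if rank is None or rank > previous:
            return False
        previous = rank
    return previous is not None


target = "55 Stratford Avenue, Huntington, NY"
after_street = [target, "5 Stratford Road, Brooklyn, NY", "5 Stratford Road, Brooklyn, NY"]
after_locality = ["55 Greenlawn Terrace, Babylon, NY", "Greenlawn Station, Huntington, NY",
                  "5 Stratford Road, Brooklyn, NY"]

print(is_sticky([after_street, after_locality], target))
```

With the lists above the check fails, which matches the behaviour reported for this address.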

QUESTION: How should focus.point affect this? At what point should a linguistic match overwhelm closeness? And how should distance from the point affect ranking?
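To make the trade-off concrete, here is a toy scoring sketch for experimenting with the balance. It is not how Pelias scores results: `blended_score`, `geo_weight` and `scale_km` are made-up names and values, the text score is assumed to be normalised to 0..1, and the Queens focus point is an assumption. The destination coordinates are those of the Greenlawn address quoted earlier in this thread.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def blended_score(text_score, distance_km, geo_weight=0.3, scale_km=50.0):
    """Mix a 0..1 text score with a proximity term that decays with distance."""
    proximity = math.exp(-distance_km / scale_km)
    return (1 - geo_weight) * text_score + geo_weight * proximity


# Assumed focus point in Queens, NY -> 55 Stratford Avenue, Greenlawn.
distance = haversine_km(40.769073, -73.918458, 40.868223, -73.34405)
print(round(distance, 1), round(blended_score(0.9, distance), 3))
```

With these placeholder values, a perfect text match far away (score 1.0 at 1000 km) still outranks a mediocre match at the focus point (score 0.5 at 0 km); raising `geo_weight` shifts that balance toward closeness.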

Complicating Factors

Our expansion / compression logic maps street -> st, but partial tokens pass through unchanged (str -> str, stre -> stre), so the additional scoring boost the user saw earlier isn't applied to those partial tokens.
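The effect can be sketched with a toy token normaliser. The `CONTRACTIONS` table and `normalize_tokens` are hypothetical stand-ins for the real analysis chain; only the street -> st mapping comes from the discussion above.

```python
# Hypothetical contraction table; only street -> st is taken from this thread.
CONTRACTIONS = {"street": "st"}


def normalize_tokens(text):
    """Lowercase, split on whitespace, and contract whole tokens only."""
    return [CONTRACTIONS.get(token, token) for token in text.lower().split()]


# Simulate a user typing "main street" one keystroke at a time.
for typed in ("main st", "main str", "main stre", "main stree", "main street"):
    print(f"{typed!r:15} -> {normalize_tokens(typed)}")
```

Only the first and last inputs end in the token `st`; the three in between produce `str`, `stre` and `stree`, which is where a result that matched on `st` can lose its boost.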

missinglink commented 9 years ago

for the sake of brevity, if there are more issues regarding autocomplete (i'm sure there are) could you please file a separate ticket and add the orange autocomplete label: https://github.com/pelias/api/labels/autocomplete

thanks! this will make it easier to read everyone's feedback, categorise & prioritise better, and start fixing it!

missinglink commented 8 years ago

Update on this issue:

/v1/autocomplete?focus.point.lat=40.74686681162143&focus.point.lon=-73.98983001708986&text=katzs deli

1)  Katz's Delicatessen, Manhattan, NY, USA
2)  Katz Deli, Fort Loudon, TN, USA
3)  Katz Deli, Montréal, Canada
4)  Katz's Deli Express, Shenandoah, TX, USA
5)  Katz's New York Deli, Houston, TX, USA

^ this seems to be a massive improvement

leaving this open for now as I'm pretty confident all these issues are already being addressed in other tickets, I will use these cases for the regression test suite once the work is complete.

missinglink commented 8 years ago

Moving ticket to 'in review'

missinglink commented 8 years ago

all the issues noted above have been resolved in the production environment, at time of writing:

/v1/autocomplete?focus.point.lat=40.74686681162143&focus.point.lon=-73.98983001708986&text=katzs deli

1)  Katz's Delicatessen, Manhattan, New York, NY, USA
2)  Katz Deli, TN, USA
3)  Katz Deli, Montréal, Quebec, Canada
4)  Katz's Deli Express, Shenandoah, TX, USA
5)  Katz's New York Deli, Houston, TX, USA

/v1/autocomplete?focus.point.lat=40.769073&focus.point.lon=-73.918458&text=55 stratford av greenlawn new york

1)  55 Stratford Avenue, Greenlawn, NY, USA
2)  55 Stratford Avenue, Pittsfield, MA, USA

the "katzs deli" issue is covered by an existing acceptance test and the "stratford av" functionality is covered in other existing tests in autocomplete_streets.json

souperneon commented 8 years ago

@missinglink \o/