immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Reverse geocoding is frequently incorrect #8941

Closed: hermesespinola closed this issue 6 months ago

hermesespinola commented 6 months ago

The bug

I have a bunch of photos from one place that get incorrect location tags. This makes it difficult to search for photos from one particular city. Most of the time, neighborhood names get classified as cities and cities as states.

Here are some examples: [screenshot 1] [screenshot 2]

The OS that Immich Server is running on

Linux raspberrypi 6.1.0-rpi7-rpi-2712

Version of Immich Server

v1.101.0

Version of Immich Mobile App

v1.101.0 build.147

Platform with the issue

Your docker-compose.yml content

Not relevant?

Your .env content

Not relevant?

Reproduction steps

1. Open up place search.
2. Search for a place that is not a city under "City", or open a photo and check its location tag.

Relevant log output

No response

Additional information

No response

mertalev commented 6 months ago

This is not so much an incorrect classification as the fact that we're trying to squeeze four fields (name, admin1Name, admin2Name, country) into three (city, state, country). The "name" field is inconsistent: sometimes it's a city, sometimes a neighborhood. Similarly, admin2Name is sometimes a county and sometimes a city.

Concatenating admin1Name and admin2Name as the "state" is what we've gone with for now, but it isn't perfect, as you've noticed. Maybe concatenating the name and admin2Name fields would make more sense since they're more "city-like"; admin1Name is generally consistent in being a state/province.
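A minimal Python sketch of the squeeze described above, assuming a record with the four fields named in the comment. The class and function names, the admin2-before-admin1 order, and the ", " separator are hypothetical and not taken from Immich's code:

```python
from dataclasses import dataclass


@dataclass
class GeoRecord:
    # The four GeoNames-derived fields named in the thread.
    name: str
    admin1_name: str
    admin2_name: str
    country: str


def to_city_state_country(rec: GeoRecord) -> tuple[str, str, str]:
    """Current approach (as described): admin2 and admin1 are joined into 'state'.

    The admin2-before-admin1 order and the ', ' separator are assumptions.
    Empty fields are skipped so no stray separators appear.
    """
    state = ", ".join(part for part in (rec.admin2_name, rec.admin1_name) if part)
    return rec.name, state, rec.country


def name_and_admin2_as_city(rec: GeoRecord) -> tuple[str, str, str]:
    """Suggested alternative: join name and admin2 as 'city', keep admin1 as 'state'."""
    city = ", ".join(part for part in (rec.name, rec.admin2_name) if part)
    return city, rec.admin1_name, rec.country


brooklyn_heights = GeoRecord("Brooklyn Heights", "New York", "Kings County", "US")
print(to_city_state_country(brooklyn_heights))
# ('Brooklyn Heights', 'Kings County, New York', 'US')
print(name_and_admin2_as_city(brooklyn_heights))
# ('Brooklyn Heights, Kings County', 'New York', 'US')
```

With the current mapping, the neighborhood ends up as the "city", which matches what the reporter sees.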

mertalev commented 6 months ago

For example, in Brooklyn, admin1Name is New York (the state), admin2Name is Kings County, and name is one of many neighborhoods such as Brooklyn Heights. Say we change it to show admin2Name, admin1Name and the country, giving Kings County, New York, US. Now if you search for the city "Marseille", you'll get no results, because Marseille is stored in the name field and admin2Name in that case is a department.
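The trade-off in this example can be sketched as follows. The two records are illustrative and written by hand (Marseille's department is Bouches-du-Rhône), the exact-match search is a simplification of whatever Immich's search actually does, and all names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class GeoRecord:
    name: str
    admin1_name: str
    admin2_name: str
    country: str


def admin2_as_city(rec: GeoRecord) -> tuple[str, str, str]:
    # Alternative mapping from the comment: city = admin2Name, state = admin1Name.
    return rec.admin2_name, rec.admin1_name, rec.country


def search_by_city(records: list[GeoRecord], city: str) -> list[GeoRecord]:
    # Exact match on the displayed "city" value (simplified search).
    return [r for r in records if admin2_as_city(r)[0] == city]


records = [
    GeoRecord("Brooklyn Heights", "New York", "Kings County", "US"),
    GeoRecord("Marseille", "Provence-Alpes-Côte d'Azur", "Bouches-du-Rhône", "FR"),
]
print(admin2_as_city(records[0]))            # ('Kings County', 'New York', 'US')
print(search_by_city(records, "Marseille"))  # []
```

The Brooklyn case improves, but Marseille disappears from city search because its city name only lives in the name field.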

hermesespinola commented 6 months ago

I wasn't sure how Immich does the reverse geocoding (and I'm still not 100% sure), but I see you download the txt files described in https://download.geonames.org/export/dump/readme.txt. Does name come from cities500.txt?

First off, I only have context on how this works in the US and Mexico, so I don't know whether other countries' geographical areas are organized the same way.

Reading the GeoNames docs, it seems admin1Name is the first-level division of a country, so, at least in the US and Mexico, it's always a state:

grep "US\..*" admin1CodesASCII.txt 
US.AR   Arkansas    Arkansas    4099753
...
US.VA   Virginia    Virginia    6254928
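A sketch of reading admin1CodesASCII.txt into a lookup table, assuming the tab-separated layout described in the GeoNames readme (code, name, ASCII name, geonameid). admin2Codes.txt uses the same four columns, so the same loader should work for it. The function names are hypothetical, and the sample input reuses the two lines from the grep output above:

```python
import io
from typing import Optional, TextIO


def load_admin_codes(f: TextIO) -> dict[str, str]:
    """Map codes such as 'US.VA' to names from a GeoNames admin-code file.

    Assumes four tab-separated columns: code, name, ascii name, geonameid.
    A line with a different column count raises ValueError.
    """
    names: dict[str, str] = {}
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        code, name, _ascii_name, _geoname_id = line.split("\t")
        names[code] = name
    return names


def state_for(country_code: str, admin1_code: str, admin1: dict[str, str]) -> Optional[str]:
    # Admin1 keys are "<country code>.<admin1 code>", e.g. "US.VA".
    return admin1.get(f"{country_code}.{admin1_code}")


sample = io.StringIO(
    "US.AR\tArkansas\tArkansas\t4099753\n"
    "US.VA\tVirginia\tVirginia\t6254928\n"
)
admin1 = load_admin_codes(sample)
print(state_for("US", "VA", admin1))  # Virginia
```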

As for admin2: in Mexico and the US, second-level divisions are counties or municipalities, not cities, although the two sometimes coincide or share a name. Using NY as an example:

grep "US\.NY\..*" admin2Codes.txt 
US.NY.001   Albany County   Albany County   5106841
...
US.NY.047   Kings County    Kings County    6941775

What's more, they all end in "County". So admin2 doesn't seem useful for Immich, at least not within the three location levels that Immich uses.

cities500.txt seems to contain other subdivisions as well, including neighborhoods, residential areas, etc. In my examples, "Brooklyn Heights" has the feature code "PPLX", which according to the GeoNames docs is a "section of populated place". In the case of NYC, New York City has the PPL code, and each of the boroughs (Brooklyn, the Bronx, etc.) has the PPLA2 code.
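To act on these feature codes, they would have to be read from the dump. A sketch of parsing one cities500.txt row, assuming the 19-column tab-separated layout documented in the GeoNames readme (feature class in column 7, feature code in column 8, counting from 1). The Place type and the function name are hypothetical, and the sample row is hand-written with placeholder values (ID, coordinates, population), not copied from the file:

```python
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    latitude: float
    longitude: float
    feature_class: str  # e.g. "P" for populated places
    feature_code: str   # e.g. "PPL", "PPLA2", "PPLX"
    country_code: str
    admin1_code: str
    admin2_code: str


def parse_geonames_row(line: str) -> Place:
    """Parse one row of a GeoNames dump such as cities500.txt (only some columns kept)."""
    cols = line.rstrip("\n").split("\t")
    return Place(
        name=cols[1],
        latitude=float(cols[4]),
        longitude=float(cols[5]),
        feature_class=cols[6],
        feature_code=cols[7],
        country_code=cols[8],
        admin1_code=cols[10],
        admin2_code=cols[11],
    )


# Hand-written placeholder row with 19 columns.
row = "\t".join([
    "1", "Brooklyn Heights", "Brooklyn Heights", "", "40.7", "-74.0",
    "P", "PPLX", "US", "", "NY", "047", "", "", "0", "", "0",
    "America/New_York", "2024-01-01",
])
place = parse_geonames_row(row)
print(place.name, place.feature_code)  # Brooklyn Heights PPLX
```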

As for the Mexico example, "Paseos del Valle [Fraccionamiento]" is listed with "PPL". Then "Tlajomulco de Zuñiga" (which Immich got from admin2, according to what you described) is the municipality, not the city; that would be wrong whenever the municipality's name differs from the city's. There's also a "Tlajomulco de Zuñiga" entry in cities500.txt with the code "PPLA2", which refers to the actual city :)

hermesespinola commented 6 months ago

TL;DR: For "City", it seems to me it'd be good to always exclude PPLX places, since they don't seem to refer to cities, prefer any codes more specific than PPL, and fall back to PPL otherwise. For "State", admin1CodesASCII.txt looks perfect. I wonder whether this generalizes well to other countries, though. I might be interested in contributing to improve this functionality if possible.
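The proposed rule could look roughly like this. It assumes the candidates are nearby cities500 entries with a precomputed distance, and it reads "more specific than PPL" as the capital and admin-seat codes (PPLC, PPLA to PPLA4); that code set, the distances, and all names are assumptions for illustration:

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    feature_code: str
    distance_km: float


# Assumption: "more specific than PPL" = capital / administrative-seat codes.
PREFERRED_CODES = {"PPLC", "PPLA", "PPLA2", "PPLA3", "PPLA4"}
# "Section of populated place", e.g. neighborhoods like Brooklyn Heights.
EXCLUDED_CODES = {"PPLX"}


def pick_city(candidates: list[Candidate]) -> Optional[Candidate]:
    """Skip PPLX, take the nearest preferred code, else the nearest plain PPL."""
    usable = sorted(
        (c for c in candidates if c.feature_code not in EXCLUDED_CODES),
        key=lambda c: c.distance_km,
    )
    for c in usable:
        if c.feature_code in PREFERRED_CODES:
            return c
    for c in usable:
        if c.feature_code == "PPL":
            return c
    return None


# Hypothetical distances; feature codes as described in the thread.
nyc = [
    Candidate("Brooklyn Heights", "PPLX", 0.2),
    Candidate("Brooklyn", "PPLA2", 1.5),
    Candidate("New York City", "PPL", 3.0),
]
mx = [
    Candidate("Paseos del Valle [Fraccionamiento]", "PPL", 0.3),
    Candidate("Tlajomulco de Zuñiga", "PPLA2", 4.0),
]
print(pick_city(nyc).name)  # Brooklyn
print(pick_city(mx).name)   # Tlajomulco de Zuñiga
```

Because a preferred code wins regardless of distance, a real implementation would probably need a distance cap so that a far-away PPLA2 doesn't beat a nearby PPL; that limit is not part of the proposal above.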

mertalev commented 6 months ago

That's a great idea! We don't even ingest these tags into the database, much less use them. For reference, we ingest the geodata here and do the reverse geocoding here.

bo0tzz commented 6 months ago

cc @zackpollard, you had ideas about improving the geocoder detail level, right?

zackpollard commented 6 months ago


Yeah, we do have much more detailed data available, but it needs a lot of sanitisation to be usable for what we want. It should offer some significant improvements to accuracy, though, especially in built-up areas.

zackpollard commented 6 months ago

As for the comments above, yeah, I've also noticed this behaviour before. If there's a quick improvement we can make here with that data, I think it would be good to do it, but yes, let's also check a few other countries to see whether it's the same for them.