photoprism / photoprism

AI-Powered Photos App for the Decentralized Web 🌈💎✨
https://www.photoprism.app

Infrastructure: Use PhotoPrism without cloud services #180

Open thielepaul opened 4 years ago

thielepaul commented 4 years ago

As a user I'd like to use PhotoPrism without cloud dependencies such that no private data is leaked.

Right now PhotoPrism depends on cloud services (openstreetmap.org) for displaying the map and for reverse geocoding. As a result, user data such as photo coordinates and IP addresses is leaked.

Possible steps to resolve this issue are:

thielepaul commented 4 years ago

as hosting OSM data is not a viable option for most users, lightweight alternatives are of interest:

lastzero commented 4 years ago

I had a look at many potential solutions and found none of them feasible regarding required compute resources and data quality. If you only want country and region it gets easier and a local reverse lookup should be possible, but then you still want maps to display the location.

The experimental service we're currently providing does not log anything and I think it's already a major step forward not to put your photos on other people's server (cloud). Coordinates don't tell you who was there and when, so it's way less of an issue in practice. An important question is how much users are willing to pay (in hardware & admin time) to avoid that completely.

lastzero commented 4 years ago

Would be great if you can have a second look and share your findings with the community!

lastzero commented 4 years ago

Using a VPN / Proxy might also solve privacy concerns for those that don't enjoy managing a big stack of local services and keep them up-to-date / secure (edit: although you then need to trust your VPN provider... so it's not really a solution).

lastzero commented 4 years ago

One last (long) remark regarding this issue today:

We had a look at other home solutions for personal photo management like Monument and they use Google Maps, although they promise "complete privacy". What mapping solution we're going to ship in the end is unclear, I'm still doing technical evaluation. In general, OSM seems best for reverse lookups, much better than Google.

We stand ready to provide a service that at least doesn't store logs (there are some in memory for debugging during development, yes) and that also gives you a list of previous public events at the given location so that you can automatically create albums of music festivals etc. That is real value and you don't get that otherwise.

Adding a flag to disable reverse lookups is simple. I would still tag my photos though. These reverse lookups are really the last thing I'm worried about given the fact that I'm a Google customer and use Android as well as iOS with Google Maps and a ton of other apps (like the Huawei weather app!) that use my location.

To just display a world map and show where you have been, it's probably easiest to download a static GeoJSON map from https://geojson-maps.ash.ms/ and use Leaflet. They even provide example code to copy & paste.

lastzero commented 4 years ago

I've just added a config parameter for you:

    cli.StringFlag{
        Name:   "geocoding-api, g",
        Usage:  "geocoding api (none, osm or places)",
        Value:  "places",
        EnvVar: "PHOTOPRISM_GEOCODING_API",
    },

Note that it's largely untested as I've worked enough for today. Maybe you can test it and give feedback. Thank you!
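For anyone testing this, here is a minimal sketch of how such a setting could be validated before use. It only assumes the three values named in the usage string and the default shown above; `parseGeocodingAPI` and the constants are hypothetical names for illustration, not PhotoPrism's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Accepted values, taken from the flag's usage string.
const (
	GeocodingNone   = "none"
	GeocodingOSM    = "osm"
	GeocodingPlaces = "places"
)

// parseGeocodingAPI normalizes a raw setting and falls back to the
// flag's default ("places") when the value is empty.
// Hypothetical helper, not part of PhotoPrism.
func parseGeocodingAPI(raw string) (string, error) {
	v := strings.ToLower(strings.TrimSpace(raw))
	switch v {
	case "":
		return GeocodingPlaces, nil
	case GeocodingNone, GeocodingOSM, GeocodingPlaces:
		return v, nil
	default:
		return "", fmt.Errorf("unknown geocoding api %q", raw)
	}
}

func main() {
	api, err := parseGeocodingAPI(os.Getenv("PHOTOPRISM_GEOCODING_API"))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	if api == GeocodingNone {
		fmt.Println("reverse lookups disabled")
		return
	}
	fmt.Println("reverse lookups via", api)
}
```

Rejecting unknown values early makes a typo in the environment variable visible instead of silently falling back to an external service.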

thielepaul commented 4 years ago

Thank you for the detailed answer! Regarding self-hostable alternatives, I will hopefully have time in February.

The config parameter works in principle (tested with docker and Wireshark):

However, when I set it to none, importing a photo with location metadata completes with a warning, but the photo is then not visible in the photos view (photos without location metadata work fine). The same behavior occurs if the option is set to places and requests to external servers are blocked by a firewall, so this may be an error-handling problem in the import process and thus a separate issue.

lastzero commented 4 years ago

@thielepaul Thank you for testing this so quickly! I'll take a look at error handling. There are some errors we can ignore when indexing, like this one. Others are fatal, like when we can't read the file at all.
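To illustrate the distinction, here is a rough sketch of how ignorable and fatal indexing errors could be separated. The sentinel errors and the `indexPhoto` function are hypothetical and only show the idea; the real indexer is structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, for illustration only.
var (
	ErrGeocoding = errors.New("reverse geocoding failed")
	ErrReadFile  = errors.New("cannot read file")
)

// isFatal reports whether indexing must stop for this file.
// A failed location lookup is not fatal: the photo can still be
// indexed without place data. Every other error is treated as fatal.
func isFatal(err error) bool {
	return err != nil && !errors.Is(err, ErrGeocoding)
}

// indexPhoto runs the read step and then the locate step. It returns
// a warning text for ignorable errors and a non-nil error for fatal ones.
func indexPhoto(read, locate func() error) (warning string, err error) {
	if err := read(); err != nil {
		return "", fmt.Errorf("index: %w", err)
	}
	if err := locate(); err != nil {
		if isFatal(err) {
			return "", fmt.Errorf("index: %w", err)
		}
		warning = err.Error()
	}
	return warning, nil
}

func main() {
	ok := func() error { return nil }
	blocked := func() error { return fmt.Errorf("places: %w", ErrGeocoding) }

	warning, err := indexPhoto(ok, blocked)
	fmt.Println("warning:", warning, "fatal:", err)
}
```

With this split, a blocked or disabled lookup produces a warning while the photo still ends up in the index.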

lastzero commented 4 years ago

Fixed and refactored, you can test again 👍

thielepaul commented 4 years ago

> Fixed and refactored, you can test again 👍

👍 As far as I tested it, this works as intended now. Thank you!

lastzero commented 4 years ago

Just launched the next version of our Places API, see Geocoding for details. It approximates locations using S2 cell IDs and only contacts external services if the internal database fails.

My impression is that this should be good enough in terms of performance and privacy for most users, but I'd love to hear your opinion as you seem to be especially concerned :)

thielepaul commented 4 years ago

> Just launched the next version of our Places API, see Geocoding for details. It approximates locations using S2 cell IDs and only contacts external services if the internal database fails.
>
> My impression is that this should be good enough in terms of performance and privacy for most users, but I'd love to hear your opinion as you seem to be especially concerned :)

I agree that this is a good solution for most users, thank you!

Anyway, I would like to leave this issue open until I have had time to look further into the possibilities for offline reverse geocoding. In particular, I want to explore whether the same methods that are used for finding the timezone based on the location (https://github.com/evansiroky/timezone-boundary-builder#lookup-libraries) can also be used to get the country, state and bigger cities, as this level of accuracy would be sufficient for me (and maybe others).
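As a rough sketch of what I have in mind: lookups against boundary data usually come down to a point-in-polygon test against locally stored polygons. The example below uses plain ray casting on a single ring; the `Region` type and the rectangle are made up for illustration, and real boundary data would need multiple rings, holes and a spatial index.

```go
package main

import "fmt"

// Point is a longitude/latitude pair in degrees.
type Point struct{ Lon, Lat float64 }

// Region is a named boundary given as a single ring of vertices.
// Simplified on purpose: real boundaries have several rings and holes.
type Region struct {
	Name string
	Ring []Point
}

// contains reports whether p lies inside ring, using ray casting
// with the even-odd rule.
func contains(ring []Point, p Point) bool {
	inside := false
	for i, j := 0, len(ring)-1; i < len(ring); j, i = i, i+1 {
		a, b := ring[i], ring[j]
		// Toggle when a ray running from p toward larger longitudes
		// crosses the edge between a and b.
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) &&
			p.Lon < (b.Lon-a.Lon)*(p.Lat-a.Lat)/(b.Lat-a.Lat)+a.Lon {
			inside = !inside
		}
	}
	return inside
}

// lookup returns the name of the first region containing p,
// or an empty string if there is none.
func lookup(regions []Region, p Point) string {
	for _, r := range regions {
		if contains(r.Ring, p) {
			return r.Name
		}
	}
	return ""
}

func main() {
	// Made-up rectangle, not real boundary data.
	regions := []Region{{
		Name: "Example Region",
		Ring: []Point{{0, 0}, {10, 0}, {10, 10}, {0, 10}},
	}}
	fmt.Println(lookup(regions, Point{Lon: 5, Lat: 5}))
	fmt.Println(lookup(regions, Point{Lon: 20, Lat: 5}) == "")
}
```

Everything runs locally, so no coordinates leave the machine; the cost is storing and maintaining the boundary data.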

lastzero commented 4 years ago

Sure, we also do this to learn :)

Dustin did something like that and our solution was inspired by it, see https://github.com/photoprism/photoprism/issues/21#issuecomment-568562593

Repositories:

lastzero commented 4 years ago

For tiles, you could use https://github.com/maptiler/tileserver-gl and download the tiles from https://openmaptiles.com/downloads/planet/ (~70 GB).

Note that those free tiles are from 2017, the latest version is $1024 for one year plus you can choose from several styles. So we've chosen to pay for hosting as that's much cheaper than paying for tiles and servers.

noaho commented 3 years ago

I would also love to use PhotoPrism without exposing location data to a third party.

I totally get that it is low risk, and for most people the alternative is Google/iCloud anyway, and I understand that you don't log, although that requires a lot of trust in you and your hosting provider.

But I can think of examples where reverse geocoding could be a privacy leak. For example, over time you can build a list of frequent locations visited by an IP address, based on frequent locations that they take photos, and if the user has automatic sync setup you can also build a list of travel history and possibly make distinctions between home/work..

I see the downloading of map tiles as less of a risk, as you don't know where in the map tile the user is looking, and it's not automatic data that is going to build up a statistical pattern, but opportunistic caching and random downloading of adjacent tiles like Apple Maps does for privacy would help.

Would it be possible to support the OSM Nominatim API? Then we could use our own self-hosted geocoder, or at least have a choice of which service we use.
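For reference, a reverse lookup against a Nominatim instance is a single GET request to its `/reverse` endpoint with `lat`, `lon` and `format` parameters. The sketch below only builds the request URL; `reverseURL` is a hypothetical helper and the base URL is a placeholder for wherever a self-hosted instance would run.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// reverseURL builds a Nominatim reverse-geocoding URL for the given
// instance. Hypothetical helper; base is assumed to be the root URL
// of a Nominatim server.
func reverseURL(base string, lat, lon float64) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/reverse"

	q := url.Values{}
	q.Set("format", "jsonv2")
	q.Set("lat", strconv.FormatFloat(lat, 'f', 6, 64))
	q.Set("lon", strconv.FormatFloat(lon, 'f', 6, 64))
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	// Placeholder address for a self-hosted instance.
	s, err := reverseURL("http://localhost:8080", 52.52, 13.405)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Making the base URL configurable would be the only change needed on the client side to switch between a public and a self-hosted instance, although the response still has to be mapped onto whatever metadata model the application expects.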

https://geocoder.readthedocs.io/providers/OpenStreetMap.html

lastzero commented 3 years ago

Our backend doesn't know when pictures were taken and only works with S2 cell IDs, not the exact coordinates. Also results will be cached locally, so there are no additional requests when other photos were taken at a similar position. Should be safe enough for 99% of users, especially when you're using any other geodata-enabled site / app like Google Maps, Facebook or even Twitter. Much easier to create a profile there than trying to gain information from our privacy-friendly API.
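To make the idea concrete, here is a simplified sketch of a cache-first lookup keyed by a coarse cell. It is not our actual implementation: `cellKey` rounds coordinates onto a grid as a stand-in for a real S2 cell ID, and `Cache` and `Locator` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// cellKey maps a coordinate to a coarse grid cell by rounding to
// hundredths of a degree. This is a stand-in for an S2 cell ID,
// NOT the S2 algorithm.
func cellKey(lat, lon float64) string {
	return fmt.Sprintf("%d,%d", int(math.Round(lat*100)), int(math.Round(lon*100)))
}

// Locator resolves a cell key to a place label via a remote service.
// It receives only the cell, never exact coordinates or timestamps.
type Locator func(cell string) (string, error)

// Cache remembers place labels per cell so that each cell is
// requested from the remote service at most once.
type Cache struct {
	places map[string]string
	remote Locator
	Calls  int // number of remote requests made so far
}

// NewCache returns an empty cache backed by the given remote locator.
func NewCache(remote Locator) *Cache {
	return &Cache{places: make(map[string]string), remote: remote}
}

// Lookup returns the place for a coordinate, asking the remote
// service only when the cell has not been seen before.
func (c *Cache) Lookup(lat, lon float64) (string, error) {
	key := cellKey(lat, lon)
	if place, ok := c.places[key]; ok {
		return place, nil
	}
	c.Calls++
	place, err := c.remote(key)
	if err != nil {
		return "", err
	}
	c.places[key] = place
	return place, nil
}

func main() {
	// Fake remote service that just echoes the cell it was asked about.
	c := NewCache(func(cell string) (string, error) {
		return "place@" + cell, nil
	})
	c.Lookup(52.5201, 13.4049)
	c.Lookup(52.5199, 13.4041) // nearby photo, served from the cache
	fmt.Println("remote requests:", c.Calls)
}
```

The remote side sees one request per cell rather than one per photo, which limits what can be inferred from request frequency.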

When it comes to supporting other APIs, we found that their metadata models & categorization vary, so it needs additional work. We don't have the resources to do this right now, or even to document all the differences in detail so that users can make an informed choice and don't just file bug reports because it doesn't work as expected.

If there were an easy way to self-host this, we would have done it. Once we reach our funding goal and have more resources, we will continue to improve and add more options.

Note that there are offline maps already, of course with less detail. We don't own the satellite maps, so we can't give them away to host them locally - even if you had enough storage.

To disable geolocation features completely, set PHOTOPRISM_DISABLE_PLACES to "true".

noaho commented 3 years ago

Fair enough. I understand that it's not a priority, and will try to work around this by making my PhotoPrism instance access the APIs through a VPN. Thanks for responding; the differences in metadata are something that I'll think about. It's probably out of scope for PhotoPrism, and I wonder how other teams are working around it (if at all), for example the microG location providers.

G2G2G2G commented 3 years ago

@noaho

> over time you can build a list of frequent locations visited by an IP address, based on frequent locations that they take photos, and if the user has automatic sync setup you can also build a list of travel history and possibly make distinctions between home/work.. APIs through a VPN..

With that method you can still be targeted via the same methods you listed above. Very easy to do so if you're just watching one person.

@lastzero Thanks for the privacy advances this last year. As a back-burner item, I think grabbing the openmaptiler files and setting them up automatically is a good idea if you'd want something like an "I want full privacy" check box. It's far more than the 70 GB you posted though. More like 200 GB for what most would call basic and 300+ GB for everything lol