@reichert621 ^^ As discussed, some thoughts on geocoding implementations 👍
Consolidated discussion from the Slack thread (https://papercups-io.slack.com/archives/C0189MJHKMJ/p1599057225003600?thread_ts=1599005083.000900&cid=C0189MJHKMJ):
Closing this issue. Using browser information is good enough.
Following up on https://github.com/papercups-io/papercups/issues/57 which fixes capturing the end-customer's IP address correctly:
Now that the `customer`'s IP address is being captured correctly, it's possible to do some geocoding to find out the country & city of the customer and store that data on the `customer` record.

A few ideas I wanted to kick around regarding implementation:
Option 1 (self-host free MaxMind data + ETS or Redis Cache):
Use `geolix` and `geolix_adapter_mmdb2`. When someone is self-hosting, they would then be required to register for their own MaxMind account and obtain an API key to download a copy of the database: https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
The database is updated on Tuesday each week, which means that the version the developer downloads today and starts using will be outdated the following Tuesday when a new version of the database is released. Whether data "freshness" matters much for the purposes here, I cannot say for certain. I would assume that the database doesn't change that frequently -- perhaps especially so for IPv4 addresses, maybe less so for IPv6 addresses, which are still coming online.
It should be fairly trivial to set up an Oban job to grab the latest version of the database each week, but this first implementation pass should probably be more narrowly scoped/naive and assume that the MaxMind database downloaded by the developer on day 1 is already present on app boot-up.
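For what it's worth, a rough sketch of what that weekly refresh job could eventually look like -- the module name, env vars, and download/extract details below are all hypothetical and would need to be checked against MaxMind's actual download endpoint:

```elixir
defmodule ChatApi.Workers.RefreshGeolocationDb do
  @moduledoc """
  Hypothetical weekly job that re-downloads the MaxMind GeoLite2 database.
  Sketch only: assumes HTTPoison is available and a MAXMIND_LICENSE_KEY env var
  is set; the real download/untar logic should follow MaxMind's docs.
  """
  use Oban.Worker, queue: :default, max_attempts: 3

  @impl Oban.Worker
  def perform(_job) do
    license_key = System.fetch_env!("MAXMIND_LICENSE_KEY")
    db_path = System.get_env("MAXMIND_DB_PATH", "priv/GeoLite2-City.mmdb")

    url =
      "https://download.maxmind.com/app/geoip_download" <>
        "?edition_id=GeoLite2-City&license_key=#{license_key}&suffix=tar.gz"

    with {:ok, %{status_code: 200, body: tarball}} <-
           HTTPoison.get(url, [], follow_redirect: true),
         :ok <- extract_mmdb(tarball, db_path) do
      # The geocoding library (e.g. geolix) would then need to reload/re-read
      # the database file -- left out of this sketch.
      :ok
    end
  end

  # Placeholder: untar the archive and copy the .mmdb file into place.
  defp extract_mmdb(_tarball, _db_path), do: :ok
end
```

Scheduling could then be handled via Oban's cron support (the exact config differs between Oban versions), but again, the naive first pass can skip all of this.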
`geolix` appears to take the MaxMind DB file and load it into an `ets` table, so lookups will be fast once the app is booted up. The potential issue with this: the DB is 125MB+ zipped before loading that data into an `ets` table (in-memory cache). I'm not certain how much memory will be taken up once the full dataset is loaded into memory locally for each pod/node of Elixir running, but I'd imagine that the RAM usage per pod may increase by 50-150MB+. It's possible that the `.mmdb` file format is less efficient than `ets` storage, so the on-disk size may not reflect the in-memory size, but I'd have to dig in further and discuss this here before spending more time on it. Startup time for the pods/nodes will also increase if every node has to build a local cached `ets` table of the data -- by how much, I couldn't say for certain until this is prototyped & tested.

Another option would be to introduce a centralized cache, i.e. Redis. This way, the Elixir pods don't have to store the data locally and there's a central 'source of truth' for the lookup data. The downsides are: (a) another moving piece of infrastructure to set up and maintain; (b) cost: Redis is expensive to run; (c) lower performance than ETS (the Redis instance is typically running on a separate server/managed service, necessitating a network call).
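For reference, wiring up `geolix` with the MMDB2 adapter looks roughly like this (based on my reading of the geolix README, so treat the details as assumptions; the database id and path are placeholders):

```elixir
# config/config.exs (or runtime config) -- point geolix at a local GeoLite2 City file
config :geolix,
  databases: [
    %{
      id: :city,
      adapter: Geolix.Adapter.MMDB2,
      source: System.get_env("MAXMIND_DB_PATH", "priv/GeoLite2-City.mmdb")
    }
  ]
```

A lookup would then be something like `Geolix.lookup({8, 8, 8, 8}, where: :city)` (a string IP would need to be parsed first, e.g. with `:inet.parse_address/1`).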
Option 2 (call an external service):
Look into implementing something like https://github.com/navinpeiris/geoip where an external service is called (providing an API key) and then layer in some caching. The caching wouldn't necessarily be beneficial (if at all) in a multi-node setup unless it's distributed via `libcluster`, i.e. a customer could hit node A on the first request and the result is cached locally on node A, but their second request is served by node B, which doesn't have that data locally cached.

`ets` lookups in Option 1 are, in all likelihood, going to be at least 10x faster than making an external HTTP request (~10ms vs ~100ms+). External services also charge "per lookup", with limits on cache duration in their terms of service (if they permit caching at all -- this seems to vary by service). Option 2 could end up being a lot more costly and significantly less performant (unless the determination and settlement of the country/city lookup data is made eventually consistent by kicking off an async process with a `Task` or simple `GenServer` that deals with the business logic of doing the lookup and saving the customer's country/city off of the request path, so failure/latency concerns are less of an issue).
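To make that last point concrete, a minimal sketch of pushing the lookup off of the request path -- the module names, `lookup_ip/1`, and the customer fields are hypothetical and would need to match the real schema/supervision tree:

```elixir
defmodule ChatApi.Customers.Geocode do
  @moduledoc """
  Sketch of kicking the country/city lookup off of the request path.
  `lookup_ip/1`, `ChatApi.TaskSupervisor`, and the customer fields are
  hypothetical; the lookup itself could be geolix (Option 1) or an external
  service (Option 2) behind the same function.
  """
  alias ChatApi.Customers

  def enqueue(customer) do
    # Fire-and-forget: failures/latency here never block the HTTP request.
    Task.Supervisor.start_child(ChatApi.TaskSupervisor, fn ->
      with {:ok, %{country: country, city: city}} <- lookup_ip(customer.ip_address) do
        Customers.update_customer(customer, %{country: country, city: city})
      end
    end)
  end

  # Placeholder for whichever lookup strategy (Option 1 or 2) wins out.
  defp lookup_ip(_ip), do: {:error, :not_implemented}
end
```

A `GenServer` with a small queue would work just as well if we want retries/backoff; the key point is that the HTTP request never waits on the lookup.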
Other considerations:

Whichever approach is chosen, should we make geocoding optional through a runtime environment flag? i.e. optional for those who want geocoding, but not mandatory for the app to run?
In the event that a customer changes locations/IPs: currently, the IP address appears to be saved when the record is initially created and not updated when/if the customer changes IP address (correct me if I'm wrong). How important is it to maintain or change this behavior? A customer who is traveling/moving between timezones should probably have their current IP address stored in the DB at any point in time?
Does the `ip_inferred_country` (or whatever the property is called) need to be saved in the database, or can it be an Ecto virtual field that only the frontend cares about and that is generated dynamically on-the-fly? One benefit of storing the field in the DB and making a call to `ets` or an external service is that, if Option 2 is viable, a call would only have to be made when (1) the customer is initially created and (2) on subsequent requests, only if the `ip_address` stored on the customer record doesn't match that request's current IP address (rough sketch of this at the bottom).

Keen to hear your thoughts!
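For that last consideration, the "only look up when needed" check could be as simple as something like the following (a sketch; `lookup_and_store/1` and the field names are hypothetical):

```elixir
defmodule ChatApi.Customers.MaybeGeocode do
  @moduledoc "Sketch: only geocode on create, or when the request IP differs from the stored one."

  # `lookup_and_store/1` is a hypothetical helper that does the ets/external
  # lookup and persists country/city (see the async sketch under Option 2).
  def call(customer, request_ip) do
    if is_nil(customer.ip_address) or customer.ip_address != request_ip do
      lookup_and_store(%{customer | ip_address: request_ip})
    else
      {:ok, customer}
    end
  end

  defp lookup_and_store(customer), do: {:ok, customer}
end
```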