tilezen / tilequeue

Queue operations to manage the processes surrounding tile rendering.
MIT License

Commands related questions #198

Closed. ambientlight closed this issue 7 years ago.

ambientlight commented 7 years ago

I am going through the tilequeue/command.py to understand the usage of TOI related things.

In tilequeue intersect, based on #186, I initially thought you could feed it the output of osm2pgsql, but when I looked at it, it actually takes the expired_tiles_location folder and derives tile IDs from the files there for the intersection.

  1. How do you actually move tiles to this expired_tiles_location? I thought it definitely makes sense to move expired tiles out as fast as possible once an update invalidates them, so that tileserver recomputes them and users see the updates faster instead of waiting for a long queued pre-computation. But since you use S3 buckets, it doesn't seem like you would move these tiles over to the EC2 instance running tilequeue. Do you also keep a local file store for all tiles on the same EC2 instance where tilequeue runs? In that case I would imagine a shell script parsing tile IDs from the osm2pgsql expired-tiles output and moving them to expired_tiles_location.
  2. tilequeue enqueue seems to queue up the entire TOI set for regeneration. When do we actually need to regenerate the full TOI set?

I am working on a few tweaks to make the whole set of tilequeue update-related tasks work out of the box in tilequeue/tileserver. I just wanted to get some of your thoughts about these changes:

  1. Disabling metatiles: setting both to 0 or null results in exceptions in different spots. Semantically, null seems preferable to 0, but then default_queue_buffer_size fails on arithmetic with NoneType. In that case, is 128 the intended default size for a non-metatile queue? default_queue_buffer_size = max(1, 128 >> (2 * (cfg.metatile_size or 0)))

  2. Adding the ability to feed the osm2pgsql expired-tiles output file directly to tilequeue intersect

  3. Removing the database-uri from toi-prune: redshift: and instead connecting to the database specified in postgresql:

  4. Having tileserver dump tile traffic to postgres' tile_traffic_v4 table for toi-prune purposes, which can be enabled from the tileserver config. (You seem to be using something outside of the tilezen stack to dump the tile traffic to redshift, and I think this should also be available out of the box in tilequeue/tileserver under a minimal initial setup.)

rmarianski commented 7 years ago

In tilequeue intersect, based on #186, I initially thought you could feed it the output of osm2pgsql, but when I looked at it, it actually takes the expired_tiles_location folder and derives tile IDs from the files there for the intersection.

We used to have the osm2pgsql update and tilequeue intersect processes run in separate cron jobs, but now they are run from the same shell script. The intersect takes a path and just scans for any files in that location, assuming that each is a list of expired tiles. We no longer need it that way, but it was easier to keep the configuration the same.
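
For illustration, here is a rough sketch of what that scan could look like. It is not the actual tilequeue code; it assumes each file in the location holds one z/x/y tile per line, which is the format osm2pgsql writes to its expiry output.

```python
import os

def coords_from_expired_tile_dir(expired_tiles_location):
    # Collect (z, x, y) tuples from every expiry file found in the directory.
    coords = set()
    for name in os.listdir(expired_tiles_location):
        path = os.path.join(expired_tiles_location, name)
        if not os.path.isfile(path):
            continue
        with open(path) as fh:
            for line in fh:
                z, x, y = (int(part) for part in line.strip().split('/'))
                coords.add((z, x, y))
    return coords
```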

How do you actually move tiles to this expired_tiles_location

The osm2pgsql output directly writes to this location. The command that we actually run is:

$OSM2PGSQL $PGOPTS --append --slim --hstore-all --cache 2048 --merc --prefix planet_osm --flat-nodes $FLATNODES --style $BASEDIR/bin/my.style --verbose --expire-tiles $EXPIRE_MIN_ZOOM-$EXPIRE_MAX_ZOOM --expire-output $EXPIRE_LOG --number-processes $NUM_PROCESSES "$CURRENT" 1>&2 2>> "$PSQLLOG"

I thought it definitely makes sense to move expired tiles out as fast as possible once an update invalidates them, so that tileserver recomputes them and users see the updates faster instead of waiting for a long queued pre-computation

We make the trade-off of serving stale tiles to users with lower latency, as opposed to always serving the latest, but on-demand and higher latency. What would you be thinking of doing? Eliminating the pre-generated tiles when they show up in the expiry list?

Disabling metatiles: setting both to 0 or null results in exceptions in different spots. Semantically, null seems preferable to 0, but then default_queue_buffer_size fails on arithmetic with NoneType. In that case, is 128 the intended default size for a non-metatile queue? default_queue_buffer_size = max(1, 128 >> (2 * (cfg.metatile_size or 0)))

We always run with a metatile size now, so that's quite possible. Patches to make it work without a metatile size are welcome. And you're right, the 128 number is meant to be a default size. Generally, the bigger the buffer size, the better off you are at picking up slack as bottlenecks shift in the system, but it takes more memory.
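
To make that concrete, here is the quoted expression evaluated for a few metatile sizes; the numbers just follow from the formula itself, with the buffer shrinking as the metatile size grows.

```python
def default_queue_buffer_size(metatile_size):
    # The formula quoted above: halve the buffer twice per metatile-size step.
    return max(1, 128 >> (2 * (metatile_size or 0)))

print(default_queue_buffer_size(None))  # 128 (no metatiling)
print(default_queue_buffer_size(1))     # 32
print(default_queue_buffer_size(2))     # 8
```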

Adding the ability to feed the osm2pgsql expired-tiles output file directly to tilequeue intersect

Sure, I'd expect to have to change the intersect code to read from just a single file instead of listing a directory and iterating through each.

Removing the database-uri from toi-prune: redshift: and instead connecting to the database specified in postgresql:

We use this for our pruning process, and store logs in redshift. Are you thinking of taking a similar approach to limit the scope of the tiles that get generated? If so, we can probably make multiple configurations work.

Having tileserver dump tile traffic to postgres' tile_traffic_v4 table for toi-prune purposes, which can be enabled from the tileserver config. (You seem to be using something outside of the tilezen stack to dump the tile traffic to redshift, and I think this should also be available out of the box in tilequeue/tileserver under a minimal initial setup.)

How are you thinking of doing this? Would you have tileserver write out an entry for each request? We have a separate system that reads from log files and writes to that table. I'd initially suggest you take a similar approach, since it seems like keeping that concern outside of tileserver would keep it more stable.

ambientlight commented 7 years ago

@rmarianski thanks for the feedback!

The intersect takes a path and just scans for any files in that location, assuming that each is a list of expired tiles.

Great, I got it. I was confused because I assumed intersect would derive tile IDs from actual tile paths, i.e. that the /all/z/x/y tile structure gets replicated there, and thus that something actually needs to move the tiles out there. I just hadn't looked into coord_ints_from_paths(expired_tile_paths) and took the name at face value; it's really more like coord_ints_from_files_at_paths.

We make the trade-off of serving stale tiles to users with lower latency, as opposed to always serving the latest, but on-demand and higher latency. What would you be thinking of doing? Eliminating the pre-generated tiles when they show up in the expiry list?

I was just trying to guess how you handle it. Since I thought tiles somehow get moved out to expired-tiles, the most reasonable explanation I had was that you move out or delete the tiles to make sure users see the updates faster. I can imagine an OSM update that changes a tag on some boundary and triggers a re-render of 10,000+ tiles, so that subsequent map updates get queued up and are actually re-rendered much later. For example, a user adds new buildings in iD and only sees the tile updated after, say, 10 minutes. Deleting the invalidated tiles straight away would probably let the user see it within 1-2 minutes with high certainty, but I guess this needs to be tailored to the application scenario the maps are used in. For the common scenario, having the lowest latency possible is definitely the way to go.

We use this for our pruning process, and store logs in redshift. Are you thinking of taking a similar approach to limit the scope of the tiles that get generated? If so, we can probably make multiple configurations work.

My rationale is that I want tilequeue/vector-datasource/tileserver to have TOI pruning available out of the box with minimal configuration too. People could clone the github repos, follow the README, and set up a minimal development stack in minutes without going through the longer process of setting up the AWS infrastructure the tilezen stack is oriented toward. Essentially, it would be a nice fallback where a standalone minimal tileserver/tilequeue handles the updates and the TOI properly by itself. For this, I am just having the toi-prune code fall back to the postgresql configuration if redshift-url is not specified.
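
As a rough sketch of that fallback (the key names mirror the config sections mentioned in this thread, but the exact shape of the parsed config is hypothetical):

```python
def prune_database_uri(cfg):
    # Prefer an explicit redshift database-uri under toi-prune; otherwise fall
    # back to the regular postgresql connection settings.
    redshift_uri = cfg.get('toi-prune', {}).get('redshift', {}).get('database-uri')
    if redshift_uri:
        return redshift_uri
    pg = cfg['postgresql']
    return 'postgresql://%s:%s@%s:%s/%s' % (
        pg['user'], pg['password'], pg['host'], pg.get('port', 5432), pg['dbname'])
```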

How are you thinking of doing this? Would you have tileserver write out an entry for each request? We have a separate system that reads from log files and writes to that table. I'd initially suggest you take a similar approach, since it seems like keeping that concern outside of tileserver would keep it more stable.

But tileserver runs standalone under the initial setup, so as you suggest, should I add a new config option like tile-traffic-log-path:? Once it is configured, tileserver would log the traffic there, and something else, like a shell script or tilequeue consume-tile-traffic, would then parse this log and insert it into tile_traffic_v4. That way the entire TOI/updates setup is possible within the initial tilequeue/mapbox-vector-tile/vector-datasource/tileserver stack.

rmarianski commented 7 years ago

I just hadn't looked into coord_ints_from_paths(expired_tile_paths) and took the name at face value; it's really more like coord_ints_from_files_at_paths.

The coord_ints are just an integer encoding that captures a z/x/y coordinate and packs it into a single 64-bit integer. This is purely for efficiency, so if you see that throughout the codebase, that's all that's going on there.
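
For illustration, one possible way to pack and unpack such a coordinate; the bit layout here (6 bits of zoom, 29 bits each for x and y) is an assumption for the example and not necessarily the layout tilequeue actually uses.

```python
def coord_to_int(z, x, y):
    # 6 bits of zoom, 29 bits each for x and y: enough for zooms up to 29.
    assert 0 <= z <= 29 and 0 <= x < 2 ** z and 0 <= y < 2 ** z
    return (z << 58) | (x << 29) | y

def int_to_coord(coord_int):
    y = coord_int & ((1 << 29) - 1)
    x = (coord_int >> 29) & ((1 << 29) - 1)
    z = coord_int >> 58
    return z, x, y

assert int_to_coord(coord_to_int(14, 4823, 6160)) == (14, 4823, 6160)
```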

I was just trying to guess how you handle it. Since I thought tiles somehow get moved out to expired-tiles, the most reasonable explanation I had was that you move out or delete the tiles to make sure users see the updates faster. I can imagine an OSM update that changes a tag on some boundary and triggers a re-render of 10,000+ tiles, so that subsequent map updates get queued up and are actually re-rendered much later. For example, a user adds new buildings in iD and only sees the tile updated after, say, 10 minutes. Deleting the invalidated tiles straight away would probably let the user see it within 1-2 minutes with high certainty, but I guess this needs to be tailored to the application scenario the maps are used in. For the common scenario, having the lowest latency possible is definitely the way to go.

Ok. The idea is just to take the expiry list from osm2pgsql, trim that down via the TOI set, and enqueue those onto AWS SQS. That's what the intersect process does. The queue implementation is meant to be pluggable, so you could create your own to handle whatever behavior you need. If your goal is just a self-contained environment with all the parts communicating, there should be a file-based queue implementation.
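
In sketch form, the intersect step described above amounts to something like the following; enqueue_coords stands in for whichever pluggable queue implementation (SQS, file-based, etc.) is configured, and the function name and signature are made up for the example.

```python
def intersect_and_enqueue(expired_coord_ints, toi_set, enqueue_coords):
    # Only tiles that are both expired and in the tiles-of-interest get queued.
    to_enqueue = [c for c in expired_coord_ints if c in toi_set]
    enqueue_coords(to_enqueue)
    return len(to_enqueue)
```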

Essentially, it would be a nice fallback where a standalone minimal tileserver/tilequeue handles the updates and the TOI properly by itself. For this, I am just having the toi-prune code fall back to the postgresql configuration if redshift-url is not specified.

Makes sense to me. We have a vagrant setup which you might want to look at. Note that this is dated and not maintained, and I'd be surprised if it worked. But it was originally meant to demonstrate how to run everything in a single environment, so you can use that for inspiration.

But tileserver runs standalone under the initial setup, so as you suggest, should I add a new config option like tile-traffic-log-path:? Once it is configured, tileserver would log the traffic there, and something else, like a shell script or tilequeue consume-tile-traffic, would then parse this log and insert it into tile_traffic_v4. That way the entire TOI/updates setup is possible within the initial tilequeue/mapbox-vector-tile/vector-datasource/tileserver stack.

Yea, if you're interested in providing a command to read the log entries and populate a database table in postgresql, that sounds fine to me.
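
As a rough sketch of what such a command might do (the log format, regex, and column names other than the tile_traffic_v4 table are assumptions for illustration, not the eventual tilequeue implementation):

```python
import re
import psycopg2

# Matches tile request paths like /layer/14/4823/6160.json in an access log.
TILE_RE = re.compile(r'/(\d+)/(\d+)/(\d+)\.\w+')

def consume_tile_traffic(log_path, dsn):
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur, open(log_path) as log:
        for line in log:
            m = TILE_RE.search(line)
            if not m:
                continue
            z, x, y = (int(g) for g in m.groups())
            cur.execute(
                'INSERT INTO tile_traffic_v4 (z, x, y) VALUES (%s, %s, %s)',
                (z, x, y))
    conn.close()
```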

ambientlight commented 7 years ago

Thanks a lot for the explanation. I have created #204 as a follow-up to this discussion; it implements consume-tile-traffic and also modifies prune-tiles-of-interest to work with postgres and the directory store.