amandasaurus closed this 6 years ago
+1. I tried to follow the code (across the tilelive projects) and couldn't find the place where the value is actually used. AFAIK it's lost/ignored.
Experiencing the same issue. I'm using tilelive-bridge with `tilelive.copy` to generate mbtiles from PostGIS.
When running Mapbox Studio Classic, which uses tilelive under the hood AFAIK, all 8 cores run at 100% CPU during the entire process.
Calling `tilelive.copy` with a concurrency of 4, 8 or 16 doesn't yield any different results. I do notice that concurrency is set here, but it doesn't seem to affect the number of PostgreSQL connections. Tilelive-bridge has its own pooling mechanism defined here. It would be great to get feedback on how to set concurrency across the board.
Hi, it's been more than a year since I reported this issue and I haven't heard anything. Is this a valid issue? Is this the right project? I'm still seeing this problem. Is there anything I'm missing?
I'm curious about this as well.
Has anyone found out more about this? Right now I'm digging through all the modules and calls required for tile generation and haven't found anything so far. I'm trying to find out whether it's a Node problem or a general tilelive problem.
I'm hitting the same limit. With `top` I see one node process and 4 postgres processes. Has anyone found out what configuration is required to make use of all available resources?
@yhahn This issue has been open for quite a while. If altering concurrency is not supported, it may be better to remove it from the documentation and close this issue. If there is a way to configure it, it would be great if you could share it (I don't mind writing an updated README).
Any updates?
Hi all - apologies for the lack of response here, appreciate the bump. You're right, CPU concurrency will not change in the slightest with this option. Mentioned in #99, "... a tilelive-copy operation is not forked across cpus, and within a tilelive stream, concurrency is just a control on the number of pending I/O operations with the underlying source." - in this case, it's setting the concurrency value for d3-queue.
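To illustrate what "a control on the number of pending I/O operations" means in practice, here is a minimal sketch of a concurrency-limited task runner. It is a hand-written stand-in, not d3-queue's actual API, and `runWithConcurrency` and `fakeTileRead` are hypothetical names. The limit caps how many async operations are outstanding at once; it does not create threads or processes, so on its own it cannot spread CPU work across cores.

```javascript
// Hypothetical stand-in for a d3-queue-style limiter (not d3-queue itself):
// at most `limit` async tasks are pending at any moment. No threads or
// processes are created; CPU work still runs wherever each task puts it.
function runWithConcurrency(tasks, limit) {
  return new Promise((resolve, reject) => {
    const results = new Array(tasks.length);
    let next = 0;   // index of the next task to start
    let active = 0; // tasks currently pending
    let done = 0;   // tasks finished
    if (tasks.length === 0) return resolve(results);

    const launch = () => {
      while (active < limit && next < tasks.length) {
        const i = next++;
        active++;
        tasks[i]().then((value) => {
          results[i] = value;
          active--;
          done++;
          if (done === tasks.length) resolve(results);
          else launch(); // refill the freed slot
        }, reject);
      }
    };
    launch();
  });
}

// Usage: simulate 20 tile reads and record the peak number in flight.
let inFlight = 0;
let peak = 0;
const fakeTileRead = (n) => () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise((r) => setTimeout(() => { inFlight--; r(n); }, 5));
};

runWithConcurrency(Array.from({ length: 20 }, (_, i) => fakeTileRead(i)), 4)
  .then((out) => console.log(out.length, 'tiles, peak in flight:', peak));
```

With a limit of 4 the peak never exceeds 4, however many tasks are queued. Raising the limit only helps if whatever services those operations can actually run more of them at once.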
It's unlikely we'll add a CPU concurrency option right now, so I'll update the options portion of the documentation (which needs some love #70) and make sure this is clear. Going to close, but will update here when docs have been merged.
Cheers!
Thanks for the update @mapsam. I'm a little confused why the concurrency is so arbitrarily limited. I've read a modest bit of the code and can't quite figure out where the limit of 4 cores is coming from. That's not going to work for any kind of rendering at scale.
I'm a little confused why the concurrency is so arbitrarily limited
Can you share more about where/how you're seeing this limitation? As far as I'm aware, this module doesn't set anything regarding concurrency at the system level, such as `UV_THREADPOOL_SIZE`.
can't quite figure out where the limit of 4 cores is coming from
Similarly, where/how are you seeing the limit of 4 cores?
@mapsam I can't speak for @ianthetechie, but we experience the same "limit" when running `tilelive.copy` with something like this:
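```js
const tilelive = require('@mapbox/tilelive');
const mbtiles = require('@mapbox/mbtiles');
const bridge = require('@mapbox/tilelive-bridge');
const merge = require('tilelive-merge');

// Register the protocols each plugin provides with tilelive.
bridge.registerProtocols(tilelive);
mbtiles.registerProtocols(tilelive);
merge(tilelive);

// Copy tiles from sourceUri to destinationUri with the options below.
function copyTiles(sourceUri, destinationUri, { min, max, totalJobs, currentJob, bounds }) {
  return new Promise((resolve, reject) => {
    const opts = {
      type: 'pyramid',
      minzoom: min,
      maxzoom: max,
      retry: 50,
      close: false,
      slow: 60000,
      timeout: 6000000,
      job: {
        total: totalJobs,
        num: currentJob,
      },
      bounds,
    };

    tilelive.copy(sourceUri, destinationUri, opts, (err) => {
      if (err) {
        console.error(err);
        return reject(err); // stop here instead of also resolving
      }
      resolve('Complete');
    });
  });
}
```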
Where the source is a tm2source XML with a PostGIS backend and the destination is an SQLite MBTiles file, we only see 4 connections to PostGIS and 4 instances of Mapnik running on a system with 12 cores.
This "limit" may not be coming from tilelive, but we've yet to be able to find it and override it. Any idea why this happens?
I've experienced the same when using this Docker image (run as recommended, more or less): https://github.com/openmaptiles/generate-vectortiles. Performance is so poor relative to what my machine can do that I'm currently working on a Python script to do my renders instead. In my initial testing it performs significantly better, since it keeps the system at full utilization (aside from some issues with buffering labels, which only adversely affect raster output).
Thanks for the info; it's helpful to know which tilelive plugins you're using, since this module only connects the interfaces. At a quick glance, it looks like this could be fixed in the tilelive-bridge and tilelive-mapnik modules, which both use mapnik-pool (which sets threadpool size based on number of CPUs here). Each module sets the size differently:

- tilelive-mapnik uses `UV_THREADPOOL_SIZE` if it is set in the environment, otherwise defaults to `require('os').cpus().length`
- tilelive-bridge uses `cpus() * 2` and is not configurable, it seems

Would gladly accept a PR that first tries `process.env.UV_THREADPOOL_SIZE` and falls back to making a best guess. My suggestion would be to test changing that number in whichever way makes sense for you to see if that speeds things up. @ianthetechie Glad to hear you're working on something better; feel free to share your work or backport fixes to the tilelive modules!
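The PR suggested above (try `UV_THREADPOOL_SIZE` first, then fall back to a guess) could look roughly like this. It is a minimal sketch, not code from tilelive-bridge or mapnik-pool; `poolSize` is a hypothetical name, and using the CPU count as the fallback mirrors the tilelive-mapnik behaviour described above rather than a tested recommendation.

```javascript
const os = require('os');

// Hypothetical helper (not part of tilelive-bridge): prefer an explicit
// UV_THREADPOOL_SIZE from the environment, otherwise fall back to a guess
// based on the CPU count (at least 1, in case os.cpus() returns []).
function poolSize(env = process.env, fallback = Math.max(1, os.cpus().length)) {
  const requested = parseInt(env.UV_THREADPOOL_SIZE, 10);
  // Ignore missing, non-numeric, or non-positive values.
  return Number.isInteger(requested) && requested > 0 ? requested : fallback;
}

console.log('pool size:', poolSize());
```

Keeping the environment lookup and the fallback as parameters makes the helper easy to test without touching the real process environment.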
@mapsam thanks so much for going out of your way to look into this. I'll give this a shot this AM (Korean time) and report back!
@mapsam you are a gentleman and a scholar. tilelive-mapnik is indeed what said Docker image uses, and as you observed, it will respect `process.env.UV_THREADPOOL_SIZE`. I'm not quite sure where the limit of 4 came from, as I'm on an 8-CPU machine. At any rate, it's a rather simple Docker config flag. Cheers!
Great news @ianthetechie! Perhaps there are some unknown caveats to `os.cpus()` in Node.js that resulted in the different numbers. Appreciate you reporting back. A next step for me is documenting this ... somewhere 🤔 (not exactly sure where, to be honest).
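One plausible source of the 4, offered as an assumption rather than something confirmed in this thread: libuv, which provides Node's threadpool, defaults to 4 threads when `UV_THREADPOOL_SIZE` is unset, regardless of how many CPUs the machine has. If Mapnik rendering runs on that pool, only 4 renders (and the PostGIS connections they hold) would be active at once, which would be consistent with setting `UV_THREADPOOL_SIZE` fixing it. A small diagnostic sketch (`threadpoolReport` is a hypothetical name):

```javascript
const os = require('os');

// libuv's default threadpool size when UV_THREADPOOL_SIZE is unset.
const LIBUV_DEFAULT_THREADPOOL_SIZE = 4;

// Report the CPU count next to the threadpool size libuv would use.
// Simplified: invalid or non-positive values are treated as unset, and
// libuv's upper clamp on very large values is not modelled.
function threadpoolReport(env = process.env) {
  const requested = parseInt(env.UV_THREADPOOL_SIZE, 10);
  return {
    cpus: os.cpus().length,
    uvThreadpool: requested > 0 ? requested : LIBUV_DEFAULT_THREADPOOL_SIZE,
  };
}

console.log(threadpoolReport());
```

If `cpus` is larger than `uvThreadpool`, the threadpool rather than the CPU count is the likely ceiling on concurrent renders.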
Yeah... Sorry about my bad assumption earlier. I'm a bit new to hacking around with tilelive and, despite the massive list of plugins in the README, I never quite connected the dots there. GitHub isn't super conducive to detailed docs, but I've seen it done externally on readthedocs, or even in a docs folder with linked markdown files. I understand you probably don't have a huge time budget for documentation, so thanks again for going out of the way to follow up on this.
@ianthetechie Did you ever get this working for generate-vectortiles?
Yes @nnhubbard, the trick was to set the `UV_THREADPOOL_SIZE` environment variable (in docker-compose.yml if you're using a Docker image like this one).
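Outside Docker, the same effect can be had by setting the variable for the Node process that runs the copy. libuv reads it when the threadpool is first created, so putting it in the environment the process starts with is the safest route. A minimal sketch, where `envWithThreadpool` and `runCopy` are hypothetical names, the URIs are placeholders, and `tilelive-copy` is assumed to be on `PATH`:

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Copy the current environment and override UV_THREADPOOL_SIZE so the
// child Node process starts with a larger libuv threadpool.
function envWithThreadpool(size = os.cpus().length, base = process.env) {
  return { ...base, UV_THREADPOOL_SIZE: String(size) };
}

// Hypothetical wrapper: `src` and `dst` are placeholder tilelive URIs.
function runCopy(src, dst, size) {
  return spawn('tilelive-copy', [src, dst], {
    env: envWithThreadpool(size),
    stdio: 'inherit',
  });
}
```

Calling `runCopy(src, dst, 12)` would start the copy with a 12-thread pool; the right size for a given machine is something to measure rather than assume.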
I did that, but I didn't see any drastic change in the time it took. Did you see some major speed increases?
We saw the CPU utilization improve considerably, and we did see speed pick up a lot. However, that was almost a year ago and we've moved away from the vanilla OMT render process, as we found a lot of issues with `tilelive-copy` and we've diverged a bit from the schema as well. Factors I can think of offhand that might affect your render speed are number of CPU cores and disk speed. We found setting the concurrency in tilelive above nCPUs didn't help much.
Hopefully some of this helps with your setup :)
Thank you! Are you still using OMT, and have you found a faster solution than `tilelive-copy` for creating your mbtiles files?
We are rendering vector tiles using PostGIS directly with the ST_AsMVT function. You can see an example of how to do this in https://github.com/openmaptiles/postserve. Note that there are a myriad of issues with this particular project and it is not ready for production use, but we took a similar approach for our own system and this should give what you need SQL-wise.
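For reference, here is a minimal sketch of the kind of SQL that approach relies on, wrapped in a helper that returns a parameterized query object in the `{ text, values }` shape accepted by the `pg` client's `query()`. This is not postserve's code. The table `roads`, its `geom` and `name` columns, and storage in EPSG:3857 are hypothetical assumptions; `ST_TileEnvelope` requires PostGIS 3.0 or later.

```javascript
// Build a query that renders one z/x/y tile as MVT inside PostGIS.
// Assumes a hypothetical table whose `geom` column is already in EPSG:3857.
// `layer` is interpolated into the SQL, so it must be a trusted identifier,
// never user input; z/x/y are passed as bind parameters.
function mvtTileQuery(z, x, y, layer = 'roads') {
  const text = `
WITH bounds AS (
  SELECT ST_TileEnvelope($1, $2, $3) AS geom
),
mvtgeom AS (
  SELECT ST_AsMVTGeom(t.geom, bounds.geom::box2d) AS geom, t.name
  FROM ${layer} t, bounds
  WHERE t.geom && bounds.geom
)
SELECT ST_AsMVT(mvtgeom.*, '${layer}') AS mvt FROM mvtgeom;`;
  return { text, values: [z, x, y] };
}
```

With a `pg` pool, `(await pool.query(mvtTileQuery(z, x, y))).rows[0].mvt` would hold the encoded tile bytes. The `&&` filter lets a spatial index on `geom` prune rows before clipping.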
Ah, so you aren't rendering out full mbtiles files, but rather hosting vector tiles? Is there somewhere else I could ask you a few more questions if you have time?
Correct. Though mbtiles is just a sqlite database that follows a particular schema, so it's not hard to make that jump if you want to. Send me an email at ian@stadiamaps.com if you want to discuss further.
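To make that jump concrete, here is a sketch of the core MBTiles tables plus the detail that most often trips people up: MBTiles numbers rows in TMS order (row 0 at the bottom), while XYZ tile URLs count from the top. It only builds SQL text and parameters; running it against SQLite would need a driver of your choice. `xyzToTmsRow`, `insertTileParams` and `INSERT_TILE_SQL` are illustrative names.

```javascript
// Core MBTiles schema: a key/value metadata table and a tiles table
// keyed by zoom level, column and row.
const MBTILES_SCHEMA = `
CREATE TABLE metadata (name TEXT, value TEXT);
CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB);
CREATE UNIQUE INDEX tile_index ON tiles (zoom_level, tile_column, tile_row);
`;

// Flip an XYZ row (origin top-left) to a TMS row (origin bottom-left).
// The flip is its own inverse, so it also converts TMS back to XYZ.
function xyzToTmsRow(z, y) {
  return 2 ** z - 1 - y;
}

const INSERT_TILE_SQL =
  'INSERT INTO tiles (zoom_level, tile_column, tile_row, tile_data) VALUES (?, ?, ?, ?)';

// Bind parameters for inserting one XYZ-addressed tile.
function insertTileParams(z, x, y, tileData) {
  return [z, x, xyzToTmsRow(z, y), tileData];
}
```

For example, XYZ tile 3/4/2 is stored with `tile_row` 5, because 2³ − 1 − 2 = 5.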
It appears (to me) that the `--concurrency` argument to `tilelive-copy` doesn't have any effect. No matter what value I put in, `top` shows that 4 node processes are running, regardless of whether I'm using a machine with 4 CPU cores (my desktop) or a server with 8. Regardless of the `--concurrency` value, I get approximately the same throughput when doing a copy. Changing `--concurrency` to ten times its old value doesn't appear to increase (or decrease) the throughput. Is there something I'm missing to get increased concurrency?