r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

[1.6.1] multicore limitations? #190

Closed spono closed 6 years ago

spono commented 6 years ago

Ciao JR, just a question: is there any limitation to the number of cores that can be used? Processing a catalog with cores(ctg)=12, the session opens 12 workers but I get only 4 of them working. Unfortunately I don't have any other machine with 4+ cores to test if it's something related to the (virtual) machine I'm using or to lidR code. Any hint? I'm using W10.

thanks in advance

Jean-Romain commented 6 years ago

No hard-coded limitation. I need more information to investigate.

However, I do not recommend using too many cores to speed up your computations. 12 cores means you read 12 files at the same time. The hardware may not be able to keep up, and thus 12 cores can be worse than 2 cores.

So basically 12 cores may be pertinent only if the computation time is huge compared to the time needed to actually read the files, and the read time is rarely negligible. To give you an idea, you can easily speed up your computation by up to 4 times by using las instead of laz, plus point indexing with lax files.
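As a rough sketch of that las + lax preparation step (not from the thread; the paths are examples, and it assumes `rlas::writelax()` is available, as it was around the time of this discussion):

```r
# Hypothetical sketch: convert a folder of .laz tiles to .las and build
# .lax spatial indexes so reads are faster and seekable.
library(lidR)

files <- list.files("tiles", pattern = "\\.laz$", full.names = TRUE)
for (f in files) {
  las <- readLAS(f)                 # decompress the tile into memory
  out <- sub("\\.laz$", ".las", f)  # uncompressed twin on disk
  writeLAS(las, out)
  rlas::writelax(out)               # writes a .lax index next to the file
}
```

The trade-off is disk space: las files are several times larger than laz, which is why some users (see below) stick with laz + lax instead.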

bi0m3trics commented 6 years ago

I have a 16-core machine that I regularly use 10-12 cores (physical, not logical) on and I've never seen a limitation... In fact, here's what I'm running right now. [screenshot] On Windows, I always use `opt_cores(project) <- parallel::detectCores(logical = FALSE) - 4`

bi0m3trics commented 6 years ago

Side note @Jean-Romain: there appears to be a carriage return missing somewhere in the cluster/catalog code. You can see the output behind the task manager there... it's on my list to report.

bi0m3trics commented 6 years ago

Clarification @spono... in 1.6.1 that would be the `cores(catalog)` setting; I forgot I was running 2.0.0. Adding to JR's note, I have never seen a limitation in previous versions, and I have never seen an improvement on Windows from using logical processors. I've found that splitting tasks among logical processors only makes things worse, especially when reading files is a major part of the workflow. If I'm going to process large landscapes and reading files is a component of that, then I retile beforehand to maximize the use of my multicore configuration.
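For readers following along, a hedged sketch of the two setters mentioned (the catalog path is hypothetical):

```r
# Sketch: the core-count option was renamed between the versions
# discussed in this thread.
library(lidR)
ctg <- catalog("tiles/")  # hypothetical folder of tiles

# lidR 1.6.x style:
# cores(ctg) <- 4

# lidR 2.0.x style (catalog options gained an opt_ prefix):
opt_cores(ctg) <- 4
```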

spono commented 6 years ago

I didn't imagine that using las vs laz could affect the computation that much! Is this valid for las+lax vs laz, or vs indexed laz?

BTW, thank you both very much!

Jean-Romain commented 6 years ago

@bi0m3trics it is a Windows-specific issue. The last time somebody reported this issue it was a problem with an R or RStudio update. Are you up to date?

@spono

For example, I can compute a CHM with a point-to-raster based method (extremely fast) on 25 tiles at 30 pts/m² in:

bi0m3trics commented 6 years ago

Right or wrong, for most of us the decompression time is a necessary evil on Windows (storage space is cheap but slow; scratch space is expensive but fast), so I tend to use the laz + lax approach that JR outlined above and live with slower processing speeds. But I also retile as needed to make sure my workflow isn't spending all my CPU time reading files... that's all I meant.

@Jean-Romain I'll check. Thanks for the reminder that Windows sucks! ;)

Jean-Romain commented 6 years ago

@bi0m3trics You don't need to retile; you just need to modify the chunk size. There is no gain in reading a retiled dataset if you use an indexed point cloud. Or if there is some gain, I'd like an explanation.
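A sketch of that chunk-size approach, assuming the lidR >= 2.0 option names and a hypothetical folder of indexed tiles:

```r
# Sketch: process an indexed catalog in small virtual chunks instead of
# retiling on disk. Chunk size and buffer values are illustrative only.
library(lidR)

ctg <- catalog("tiles/")         # laz/las tiles with .lax indexes
opt_chunk_size(ctg)   <- 500     # process 500 m x 500 m chunks
opt_chunk_buffer(ctg) <- 30      # 30 m buffer so chunk edges are correct
opt_cores(ctg)        <- 4

# Point-to-raster CHM, as in JR's benchmark above
chm <- grid_canopy(ctg, res = 1, p2r())
```

With lax indexes, each chunk read seeks directly to the points it needs, so the chunk layout no longer has to match the tile layout on disk.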

@spono Also, of course, the hardware under the hood is important. My computer has an SSD. New HDDs are fast too. But if you use an old HDD plugged into USB 2.0, you are likely to see a dramatic drop in performance, and multicore won't help.

spono commented 6 years ago

@Jean-Romain amazing, exhaustive as always!

@bi0m3trics I use the laz+lax approach as well, more on a common-sense basis, but I never "took the time" to benchmark... I'm only now appreciating some of the more technical aspects rather than just getting the work done :) If I may, is there any specific reason behind the choice of parallel rather than future for detecting cores?

bi0m3trics commented 6 years ago

@spono Last I checked, future just calls parallel, and my use of parallel predates JR's use of future...

@Jean-Romain It's been a while since I benchmarked it, possibly before the implementation of lax, so I can't be certain, but I seem to recall that when reading my tiles as delivered by a vendor (which were big, ~2 sq km per tile) lidR spent quite a bit of time reading those files, so I adopted the practice of retiling to 0.5-1 sq km and noticed some improvements... It's part of my QAQC/preprocessing routine (denoising, normalizing, now building lax, etc.) and I'd be happy to drop it. I'll shut up and look at it later. Maybe it's not necessary anymore...

Jean-Romain commented 6 years ago

A long time ago, a bug in rlas meant that files were read twice before being loaded into R, and lax files were not supported. It was fixed a while ago (another era :floppy_disk:). If your benchmark is as old as that bug, sure, it was slow :wink: