Closed: @BLaunet closed this issue 6 years ago.
Dear @BLaunet
A very good idea indeed! Thank you!
But I would find it better if there were simply a default threshold (say `PARALLEL_INTEGRATION_THRESHOLD` = 100 or 1000, I don't know) on the number of integrated pixels, below which your algorithm is used instead of the parallelized one.
I understand that for very advanced users like you it is better to have more control, but it gets more complicated for the others. In a word, I would prefer to avoid adding more keywords as much as possible.
What do you think?
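The proposed default threshold could be sketched like this (the constant name comes from the suggestion above; the dispatch function and its arguments are hypothetical):

```python
# A minimal sketch of the proposed default threshold: below it, the
# serial algorithm is used instead of the parallelized one.  The
# dispatch function and its arguments are hypothetical.
PARALLEL_INTEGRATION_THRESHOLD = 500  # number of integrated pixels

def extract_spectrum(pixels, serial_extract, parallel_extract):
    """Pick the extraction strategy based on the number of pixels."""
    if len(pixels) < PARALLEL_INTEGRATION_THRESHOLD:
        return serial_extract(pixels)    # no parallel start-up cost
    return parallel_extract(pixels)      # start-up cost is amortized
```

The user-facing API stays unchanged; only the internal dispatch decides which path to take.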
I hear what you say, and I agree! Code should stay simple, and one can always tweak one's own repo if really needed.
It looks like there is an irreducible ~1.5 s overhead when parallelizing (at least on my 8-core local machine; it's more on the 48-core server), and that is the time it takes to extract ~500 spectra.
I'm going to change my commit in that sense, with a hard limit of 500.
Actually it looks like the relationship between the number of CPUs and the time needed to initiate the parallel process is quite simple (I'd say it takes about ncpus/4 seconds to start the server). Meanwhile, the non-parallel process is quite linear in the number of spectra, and independent of the number of CPUs.
--> The limit should be ncpus-dependent, but we need to know this number before initializing the server (otherwise the harm is done).
If you see no objection, I'll add an `import multiprocessing` to use `multiprocessing.cpu_count()`. I don't think it is a liability, as `multiprocessing` is already used in ORB and is thus supposed to be installed anyway.
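The ncpus-dependent limit described above could be estimated like this (a sketch, not the actual ORCS code: the default rate is only an illustrative guess derived from the ~1.5 s / ~500 spectra figures quoted earlier):

```python
import multiprocessing

def parallel_threshold(spectra_per_second=350):
    """Hypothetical estimate of the break-even number of spectra.

    Assumes the parallel server takes ~ncpus/4 seconds to start and
    that serial extraction is roughly linear in the number of spectra.
    The default rate is an illustrative guess, not a measured value.
    """
    ncpus = multiprocessing.cpu_count()
    startup_seconds = ncpus / 4.0   # empirical guess from this thread
    return int(startup_seconds * spectra_per_second)
```

Below this many spectra, the serial path should win; above it, the parallel start-up cost is amortized.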
To stress my point:
I also changed the logging level to debug, so now it's transparent for the user.
That's an interesting point!
You can get the number of available CPUs by putting this in the init of `orcs.core.HDFCube`:

```python
self.set_param('init_wcs_rotation', float(self._get_config_parameter('INIT_ANGLE')))  # already there

_, ncpus = self.init_pp_server(silent=True)
self.close_pp_server()
self.set_param('ncpus', int(ncpus))
```

Then you can use the `self.params` object wherever you need it: `self.params['ncpus']` or `self.params.ncpus` will give you what you need :smile:
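The dual item/attribute access mentioned above can be reproduced with a tiny dict subclass (a hypothetical minimal reimplementation, not the actual ORB `params` class):

```python
class Params(dict):
    """Tiny hypothetical reimplementation of a parameter container
    that allows both item access and attribute access."""

    def __getattr__(self, name):
        # Fall back to dict lookup so params.ncpus works like params['ncpus'].
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

params = Params()
params['ncpus'] = 8
```

Both `params['ncpus']` and `params.ncpus` then return the same value.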
Right, I saw this but thought two things:
`ncpus` is actually a keyword argument of the `Tools` class init, which is the master parent class here. We could then set it when calling `orb.core.HDFCube.__init__`. To me it would make even more sense to do it like this, because it would set the `self.ncpus` variable, which is even more straightforward. I don't know how it would interfere with a config file though...
About the way to get the ncpus value, `multiprocessing.cpu_count()` seems faster and feels more straightforward than opening/closing a server just for that, but that's your call ;)
Good guess! But...
The value of `ncpus` passed to the function `orb.utils.parallel.init_pp_server()` is the number of CPUs the user asks for (generally 0 = all). It is a wish. But the real number of CPUs granted to the user depends on the content of the files `/etc/orb-kernels-wl` (white list) and `/etc/orb-kernels-ncpus` (default number of CPUs per user), if they exist. These files give the opportunity to limit the maximum number of CPUs granted to a user who is not on the white list.
So the real number of CPUs that will be granted can only be returned by `orb.utils.parallel.init_pp_server()`, though I could create a function which only returns the number of CPUs without initializing the server... hmm
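Such a helper might look like this sketch. The file formats are assumptions (one user name per line in the white list, a single integer in the ncpus file); the real ORB formats may differ:

```python
import os
import multiprocessing

def count_cpus(user, wl_path='/etc/orb-kernels-wl',
               ncpus_path='/etc/orb-kernels-ncpus'):
    """Hypothetical sketch: CPUs granted to a user, without a server.

    Assumes the white list holds one user name per line and the ncpus
    file holds a single integer; the real ORB file formats may differ.
    """
    available = multiprocessing.cpu_count()
    if os.path.exists(wl_path):
        with open(wl_path) as f:
            whitelist = {line.strip() for line in f if line.strip()}
        if user in whitelist:
            return available              # whitelisted: no limit
    if os.path.exists(ncpus_path):
        with open(ncpus_path) as f:
            return min(available, int(f.read().strip()))
    return available                      # no config files: no limit
```

When neither configuration file exists, it simply falls back to the machine's CPU count.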
Haha, I didn't expect such complexity when starting this issue ;) OK then, I will set it up like you suggest! It will simply add ncpus/4 seconds to the instantiation of a `SpectralCube`.
I pushed the solution as we discussed it.
I think implementing something like `orb.utils.parallel.count_cpus()` could be a good idea, but that's another issue ;)
Thank you @BLaunet!!
Parallel extraction of spectra was until now automatic, but not always justified: when integrating a small number of pixels, the time spent setting up the parallelization is not worth it. Moreover, parallel integration cannot be further parallelized when done over many different regions, as the CPUs are already busy; one has to use a long and inefficient for loop.
A new `parallel` keyword in `_extract_spectrum_from_region`, if set to `False`, enables a non-parallel extraction of the spectrum, in the same manner as the parallel method. Because all the "extract" methods that can be applied to a `SpectralCube` pass keyword arguments down to `_extract_spectrum_from_region`, `parallel` can be used in any of them.

**Warning: Median Integration**
In the parallel algorithm, the data is split per column when sent to the different CPUs. When asking for a median integration, the median is then computed per column, and what is actually output is a mean of these median spectra.
The new non-parallel algorithm uses the full spatial information, and the median is performed across all the data at once, thus corresponding to what one expects from a median integration.
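The difference between the two results is easy to see on a toy array (a minimal numpy illustration, unrelated to the actual ORCS data layout):

```python
import numpy as np

# 3x3 toy "image": the third column holds outliers.
data = np.array([[1.0, 1.0, 100.0],
                 [2.0, 1.0, 100.0],
                 [3.0, 1.0, 100.0]])

per_column_medians = np.median(data, axis=0)   # [2., 1., 100.]
parallel_result = per_column_medians.mean()    # mean of medians (~34.3)
serial_result = np.median(data)                # true global median (2.0)
```

The mean of per-column medians is dragged up by the outlier column, while the global median is robust to it, which is exactly why the serial path matches the expected behaviour.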