falkamelung closed this issue 4 years ago
Hello @2gotgrossman @yunjunz ,
I have a question related to enabling Dask on local systems. I enabled dask parallelization in my template and ran into some problems in the interferogram inversion. First, I needed to add cores and memory to line 1131 of ifgram_inversion.py:
cluster = LSFCluster(walltime=inps.walltime, python=python_executable_location, cores=1, memory='5 GB')
but after that dask tried to create and use LSF jobs (which is expected when using LSFCluster). I was wondering if there is a local option for dask which hasn't been implemented yet?
Hi @ehavazli, I don't have an answer to your question. @2gotgrossman is better suited to answer it, but he won't be back at work anytime soon. All I know is that a PBS scheduler can be easily added; I'm not sure about the other options.
I am using it a lot and it works fine for me as it is. The defaults are set in ./MintPy/mintpy/defaults/dask.yaml. I don't understand why you run into problems.
I am using the option below for larger runs:

//login3/projects/scratch/insarlab/famelung/KilaueaSenAT124[1086] grep wall /nethome/famelung/insarlab/OPERATIONS/TEMPLATES/KilaueaSenAT124.template
mintpy.networkInversion.walltime = 03:00
It would be nice to get it to work under PBS. Should be straightforward. I suspect on most systems you don't have to worry about the walltime and can just use a long one. On pegasus, you sometimes get into the queue earlier with short walltimes.
@falkamelung
I am running a test case on my laptop with the following options in my template:
## Parallel processing with Dask for HPC
mintpy.networkInversion.parallel = yes #[yes / no], auto for no, parallel processing using dask
mintpy.networkInversion.numWorker = 4 #[int > 0], auto for 40, number of works for dask cluster to use
mintpy.networkInversion.walltime = 03:00 #[HH:MM], auto for 00:40, walltime for dask workers
When I run smallbaselineApp.py I receive the error below:
split 13372 lines into 4 patches for processing
with each patch up to 4000 lines
reference pixel in y/x: (10651, 3739) from dataset: unwrapPhase
Traceback (most recent call last):
File "/Users/havazli/MintPy/mintpy/smallbaselineApp.py", line 1067, in <module>
main()
File "/Users/havazli/MintPy/mintpy/smallbaselineApp.py", line 1057, in main
app.run(steps=inps.runSteps, plot=inps.plot)
File "/Users/havazli/MintPy/mintpy/smallbaselineApp.py", line 1001, in run
self.run_network_inversion(sname)
File "/Users/havazli/MintPy/mintpy/smallbaselineApp.py", line 508, in run_network_inversion
mintpy.ifgram_inversion.main(scp_args.split())
File "/Users/havazli/MintPy/mintpy/ifgram_inversion.py", line 1258, in main
ifgram_inversion(inps.ifgramStackFile, inps)
File "/Users/havazli/MintPy/mintpy/ifgram_inversion.py", line 1131, in ifgram_inversion
cluster = LSFCluster(walltime=inps.walltime, python=python_executable_location)
File "/Users/havazli/miniconda3/envs/ts/lib/python3.7/site-packages/dask_jobqueue/lsf.py", line 89, in __init__
super(LSFCluster, self).__init__(config_name=config_name, **kwargs)
File "/Users/havazli/miniconda3/envs/ts/lib/python3.7/site-packages/dask_jobqueue/core.py", line 231, in __init__
"You must specify how many cores to use per job like ``cores=8``"
ValueError: You must specify how many cores to use per job like ``cores=8``
Did you run into this problem before, or am I doing something wrong? I would like to activate dask on my local machine (macOS) right now and don't have a PBS system.
I never tried on a Mac. The test data that works for me with minsar on pegasus is the file below. I never used the numWorker option, as a good value is hardwired somewhere in the defaults. If it does not work for you, I could run it using your account, mine, or somebody else's.
https://github.com/geodesymiami/rsmas_insar/blob/master/samples/unittestGalapagosSenDT128.template
@ehavazli This is old now, but I have been working on Dask for a bit and have been updating MintPy to be more flexible in how it performs its parallel computations. Are you still encountering this problem? I can look into how to make MintPy run in parallel locally, as that would be a useful default option for the many people who don't have access to an HPC cluster.
Hi @Ovec8hkin, NumPy and SciPy with an OpenBLAS binding can usually take advantage of multiple cores on a local laptop, and that binding is installed by default via conda or MacPorts. In my case, ifgram_inversion.py uses ~400% of CPU. Therefore, I don't see a significant gain from local parallelization and believe this should be a low priority for now.
This is something we should look at. A regular MintPy run on Stampede uses only 1 core. But maybe, with different conda install options, we can make it use multiple cores? This would accelerate other portions of the workflow where dask has not yet been implemented (topographic residual estimation).
Hi @falkamelung, to follow up on the main topic of this issue, is it still current?
Part of this is discussed in a new issue.
Hi David,
After the dask task, I have added a function to move the 40 worker*.e and worker*.o files into separate directories, as below. However, after pysar completes, there are still some worker*.o files in the pysar directory. It appears that no *.e files exist when
ut.move_dask_stdout_stderr_files()
is run. Is there a command to wait until all dask workers are completed? I tried future.result() or similar, but that did not solve the problem.