mdbartos / pysheds

:earth_americas: Simple and fast watershed delineation in python.
GNU General Public License v3.0

Is it possible to parallelize the watershed delineation process for multiple coordinates? #127

Open saikirankuntla opened 4 years ago

saikirankuntla commented 4 years ago

I am interested in delineating watersheds for millions of outlet (pour point) coordinates using pysheds. When I run my script, which delineates a watershed from a flow direction map, the whole process takes roughly 10 seconds for one set of coordinates. At this speed, a serial loop over, say, 10 million coordinates would need about 10^8 seconds in total, which is far too long to be practical.

Kindly let me know if it is possible to parallelize or otherwise speed up this loop with pysheds in Python. Can a pysheds script use multiprocessing, or otherwise make use of all available threads, cores, and RAM?

Any help in this regard is much appreciated.

yonaskd commented 3 years ago

@saikirankuntla I am having a similar issue. I want to delineate many watersheds from different pour points, and the task is taking days to complete. Were you able to use multiprocessing? Since we have a Unix cluster, I am trying to parallelize pysheds for the cluster. Thanks.

bmalbusca commented 3 years ago

@saikirankuntla any results?

MauKruisheer commented 2 years ago

Same here: I would like to calculate the flow length in multiple catchments instead of just one. I revived a thread about this on StackExchange, as I have the feeling a lot of people would be interested in this too: https://gis.stackexchange.com/questions/380448/how-to-create-a-raster-delineating-all-drainage-basins-using-pysheds-python

mdbartos commented 2 years ago

Greetings,

Pysheds v0.3 (released earlier this week) uses numba to greatly increase the speed and also enable parallelism in some operations. I'd highly recommend upgrading. (Some operations are 10-100x faster in the new version).

Because catchment delineation is recursive, it is difficult to truly parallelize. I'd recommend splitting up your job into multiple instances or processes (e.g. distribute jobs over multiple cloud instances).
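A minimal sketch of the "split the job into multiple processes" suggestion, using only the Python standard library. The helper names (`split_into_chunks`, `process_chunk`, `delineate_many`) and the `delineate(x, y)` callable are hypothetical and not part of pysheds; in practice `delineate` would wrap whatever per-outlet pysheds call you already use.

```python
import concurrent.futures as cf
from functools import partial


def split_into_chunks(points, n_chunks):
    """Split a list of pour points into at most n_chunks roughly equal parts."""
    n_chunks = max(1, min(n_chunks, len(points)))
    size, extra = divmod(len(points), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional point each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def process_chunk(delineate, chunk):
    """One worker's job: run delineate(x, y) for every pour point in the chunk."""
    return [delineate(x, y) for x, y in chunk]


def delineate_many(delineate, points, workers=4,
                   executor_cls=cf.ProcessPoolExecutor):
    """Distribute pour points over workers; results keep the input order.

    Assumption: `delineate` is a user-supplied function (hypothetical here).
    With a process pool it must be picklable, i.e. defined at module top level.
    """
    chunks = split_into_chunks(points, workers)
    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input chunks.
        per_chunk = pool.map(partial(process_chunk, delineate), chunks)
        return [result for chunk_result in per_chunk for result in chunk_result]
```

When using the default process pool from a script, call `delineate_many` under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn rather than fork worker processes. Sending one chunk per worker, rather than one task per point, leaves room for each worker to load the flow direction grid once and reuse it for all of its points.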

yonaskd commented 2 years ago

Thank you, Matt! This will be very helpful. I have already updated the package to v0.3 and am looking forward to seeing how it performs on HPC. I will share my experience afterward. Thanks again.

Yonas

mdbartos commented 2 years ago

@yonaskd: One quick thing. I just noticed that some of the recursive functions are causing problems on some computers (i.e. killing the kernel due to memory consumption, etc.). I'm writing equivalent iterative versions for all functions in the numba branch and will push the fix to version v0.3.1 within a day or so.
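To illustrate the recursive-versus-iterative point, here is a small pure-Python sketch of catchment delineation on a D8 flow direction grid. It is not the pysheds implementation; the function names are hypothetical, and the grid is assumed to be a list of lists holding ESRI-style direction codes (64 = N, 128 = NE, 1 = E, 2 = SE, 4 = S, 8 = SW, 16 = W, 32 = NW).

```python
# ESRI-style D8 codes -> (row offset, col offset) of the downstream neighbor.
D8_OFFSETS = {
    64: (-1, 0), 128: (-1, 1), 1: (0, 1), 2: (1, 1),
    4: (1, 0), 8: (1, -1), 16: (0, -1), 32: (-1, -1),
}


def upstream_neighbors(fdir, cell):
    """Yield in-bounds neighbors whose flow direction points into `cell`."""
    r, c = cell
    for dr, dc in D8_OFFSETS.values():
        nr, nc = r - dr, c - dc  # the neighbor that would reach `cell` via (dr, dc)
        if 0 <= nr < len(fdir) and 0 <= nc < len(fdir[0]):
            if D8_OFFSETS.get(fdir[nr][nc]) == (dr, dc):
                yield (nr, nc)


def catchment_recursive(fdir, cell, seen=None):
    """Recursive version: call depth grows with the longest flow path."""
    seen = set() if seen is None else seen
    seen.add(cell)
    for nb in upstream_neighbors(fdir, cell):
        if nb not in seen:
            catchment_recursive(fdir, nb, seen)
    return seen


def catchment_iterative(fdir, outlet):
    """Iterative version: an explicit stack replaces the call stack."""
    seen = {outlet}
    stack = [outlet]
    while stack:
        cell = stack.pop()
        for nb in upstream_neighbors(fdir, cell):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen
```

Both functions visit the same cells, but the recursive one needs one nested call per cell along the longest upstream path, so long flow paths can exhaust Python's recursion limit. The iterative one keeps its pending cells in an ordinary list instead.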

yonaskd commented 2 years ago

Thank you Matt, I will wait for the new version then. Best. Yonas

mdbartos commented 2 years ago

@yonaskd : Iterative versions of recursive hydrologic functions have been added in v0.3.1, released today.

yonaskd commented 2 years ago

Thank you, Matt!

I will update to 0.3.1 then. One question: did the new version replace the grid's flow_distance function with distance_to_outlet? My code no longer recognizes flow_distance, and the result from distance_to_outlet does not look right. I am checking other things to make sure I haven't missed anything. Also, where can we find a README or changelog describing the changes in the new version of the package? Thanks again for your time and for your very valuable package.

Best,

Yonas



mdbartos commented 2 years ago

Hi Yonas,

Thanks so much. Let me know if you want me to take a look at the dataset you're seeing issues with. For me, the distance_to_outlet function is returning results consistent with flow_distance for my test data.

A CHANGES file is on my radar (#171). Apart from the numba acceleration, the major breaking changes are that all hydrologic functions now return Raster objects, and datasets as named parameters of grid were removed. An (incomplete) list of changes can be seen in #162 (v0.3) and #170 (v0.3.1).
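As a rough sketch of what the breaking changes look like in user code, the function below follows the v0.3-style pattern in which each hydrologic function takes and returns Raster objects. The wrapper name `distance_to_outlet_v03`, its arguments, and the accumulation threshold are assumptions for illustration; check the calls against the README of the version you have installed.

```python
def distance_to_outlet_v03(dem_path, x, y, acc_threshold=1000):
    """Delineate one catchment and compute distance to its outlet (v0.3 style).

    Assumptions: `dem_path` points to a DEM raster, (x, y) is a pour point in
    the raster's coordinate system, and `acc_threshold` is an arbitrary
    accumulation cutoff used only to snap the pour point onto a channel.
    """
    # Imported here so that this module loads even where pysheds is missing.
    from pysheds.grid import Grid

    grid = Grid.from_raster(dem_path)
    dem = grid.read_raster(dem_path)

    # Each step returns a Raster that is passed explicitly to the next one.
    # Pre-0.3 code typically referred to datasets stored on the grid by name
    # (e.g. data='dir', out_name='catch'); that style was removed.
    pit_filled = grid.fill_pits(dem)
    flooded = grid.fill_depressions(pit_filled)
    inflated = grid.resolve_flats(flooded)
    fdir = grid.flowdir(inflated)
    acc = grid.accumulation(fdir)

    # Snap the pour point to the nearest high-accumulation cell.
    x_snap, y_snap = grid.snap_to_mask(acc > acc_threshold, (x, y))

    catch = grid.catchment(x=x_snap, y=y_snap, fdir=fdir, xytype='coordinate')
    grid.clip_to(catch)
    dist = grid.distance_to_outlet(x=x_snap, y=y_snap, fdir=fdir,
                                   xytype='coordinate')
    return catch, dist
```

Code written for earlier versions that calls `flow_distance` or passes dataset names as strings would need to be adapted along these lines.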

yonaskd commented 2 years ago

Hi Matt,

Great, thanks! The function now works (maybe even a bit faster) after I changed the flow direction and catchment calls based on your recent example for v0.3. I am currently using v0.3.1, and everything looks great. The problem must have been related to the newer version of pysheds returning a Raster rather than storing the result as a variable of the grid. Yes, a "changes" file describing the changes in each new version would help considerably. Thank you again for your super help.

Best,

Yonas



— Reply to this email directly, view it on GitHubhttps://github.com/mdbartos/pysheds/issues/127#issuecomment-1012613728, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ARIO72ILR2NQ63MXTJFRNVLUV5OHDANCNFSM4PRZGHXA. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you were mentioned.Message ID: @.***>