openwfm / WRF-SFIRE

A coupled weather-fire forecasting model built on top of Weather Research and Forecasting (WRF). This is the original https://github.com/openwfm/wrf-fire transitioned to a fork of WRF and selected as ifire=1. Graphic log at https://repo.or.cz/git-browser/by-commit.html?r=WRF-SFIRE.git
https://wiki.openwfm.org

variable subgrid_ratio_x in WPS #80

Open leydilaur opened 1 year ago

leydilaur commented 1 year ago

Good morning

We are running WRF-SFIRE on an HPC system and are having problems with the variable subgrid_ratio_x in WPS. When we run geogrid without this variable it runs perfectly, but when we include it, various errors appear.

With 1 node (128 cores) we get an out_of_memory error, and when we increase the number of nodes we still get error messages.

Please find attached the namelist and the *.log in case you can help us with our problem. https://drive.google.com/drive/u/1/folders/1TPJift3o7Syhj7GNVO2cGmgj8a3seCtj

Thank you very much for your help.

adamk0 commented 1 year ago

Hello, I would suggest starting by reducing subgrid_ratio_x and subgrid_ratio_y to 1 and running WPS in serial mode. If that works, then gradually increasing the subgrid ratio will show the limit of what the machine can support in terms of memory.

Adam Kochanski

Associate Professor Department of Meteorology and Climate Science Wildfire Interdisciplinary Research Center San Jose State University
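To get a feel for why memory grows so quickly with the subgrid ratio, a rough back-of-the-envelope estimate can help. The sketch below is not taken from geogrid itself: it assumes the refined mesh has roughly (e_we × ratio_x) × (e_sn × ratio_y) cells, that each value is stored in 4 bytes, and it uses a hypothetical 200 × 200 domain. The real footprint will be larger, since geogrid also holds source tiles and several fields at once, so treat the numbers as a lower bound only.

```python
def fire_mesh_cells(e_we, e_sn, ratio_x, ratio_y):
    """Approximate cell count of the refined (subgrid) mesh of one domain.

    Assumption: the refined mesh is the atmospheric grid multiplied by the
    subgrid ratio in each direction; a ratio of 0 means no refined mesh.
    """
    if ratio_x == 0 or ratio_y == 0:
        return 0
    return (e_we * ratio_x) * (e_sn * ratio_y)


def field_size_mib(cells, bytes_per_value=4):
    """Size of one 2-D field in MiB, assuming 4-byte values by default."""
    return cells * bytes_per_value / 2**20


if __name__ == "__main__":
    # Hypothetical 200 x 200 domain; replace with e_we / e_sn from namelist.wps.
    for ratio in (1, 5, 10, 20):
        cells = fire_mesh_cells(200, 200, ratio, ratio)
        print(f"ratio {ratio:2d}: {cells:>12,d} cells, "
              f"{field_size_mib(cells):8.1f} MiB per field")
```

Because the ratio applies in both directions, the cell count scales with its square: going from 1 to 20 multiplies the size of each refined field by 400.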

(Sent by email in reply to leydilaur's message of Friday, September 15, 2023, 4:47 AM.)


leydilaur commented 12 months ago

Good morning,

Thanks for your reply.

I ran geogrid.exe in serial mode and it completed correctly with subgrid_ratio_x = 1,1,1,20 and subgrid_ratio_y = 1,1,1,20, but it took 01:15:24. The whole WPS took 01:42:33, with geogrid.exe and ungrib.exe in serial mode and metgrid.exe in parallel.

I have two questions about this: Is it normal that it takes so long? And why does geogrid.exe run correctly in serial but not in parallel?

Regards, Leydi Laura

adamk0 commented 12 months ago

Hello, if the fire data for domains 1, 2, and 3 have to be processed, then yes, this behavior is normal. Does anyone do that? No. These flags should typically be set to zero for domains where the fire model won’t be active. So it would be subgrid_ratio_x = 0,0,0,20.

I don’t think the fire data processing in WPS was ever tested in parallel. We honestly never had any need for that.

Thanks, Adam Kochanski

Associate Professor SJSU Wildfire Interdisciplinary Research Center
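The per-domain setting described above can be sketched as a small helper that writes 0 for every domain without fire and the chosen ratio for the fire domain. The function name and its arguments are hypothetical, introduced only for illustration. Applying the same values to subgrid_ratio_y is an assumption based on the earlier advice to change both variables together.

```python
def subgrid_ratio_line(name, n_domains, fire_domains, ratio):
    """Build one namelist line with `ratio` for fire domains and 0 elsewhere.

    Domains are numbered from 1, as in namelist.wps.
    """
    values = [ratio if d in fire_domains else 0
              for d in range(1, n_domains + 1)]
    return f"{name} = {','.join(str(v) for v in values)},"


if __name__ == "__main__":
    # Four nested domains, fire model active only in domain 4.
    print(subgrid_ratio_line("subgrid_ratio_x", 4, {4}, 20))
    print(subgrid_ratio_line("subgrid_ratio_y", 4, {4}, 20))
```

For the four-domain setup discussed in this thread, the first call produces the line `subgrid_ratio_x = 0,0,0,20,`.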

(Sent by email in reply to leydilaur's message of Tuesday, September 19, 2023, 2:15 AM.)


leydilaur commented 12 months ago

Good morning.

Thank you very much for your help

Yes, the fire data is processed in all 4 domains. In domains 1, 2 and 3 it runs fine and fast. It only starts to take longer when it has to process this data for domain 4, which is where we want to use the fire model. In serial mode, though, the whole run completes fine.

Regards, Leydi Laura