marinebon / connectivity

larval dispersal connectivity across US national marine sanctuaries
http://marinebon.github.io/connectivity

Still getting "RuntimeError: ??? Maximum variable size allowed by the program is exceeded." #3

Open stephgad opened 7 years ago

stephgad commented 7 years ago

Hi Ben,

It looks like the Big Boy problem is still at large... Both the test we started yesterday (1000 km buffer / 9 km cell) and the new 700 km buffer / 9 km cell rasters gave the same "maximum variable size allowed by the program is exceeded" error when running the "Run Larval Dispersal Simulation" tool. The max value in southwest_700buf-9km_patchid.tif is only 17,435, so I'm not sure why it's still hating on it. Going on a Google quest.

[Screenshot: max_var_error]

bbest commented 7 years ago

Hi @stephgad,

Hmmm... does it only seem to work when ncol * nrow < 65,535 for the input rasters (water mask, patch ID, percent cover), even though the actual max(ID) of the patch ID raster is < 65,535?

We will probably need to move forward with this limitation (i.e., reduce the buffer distance or increase the cell size so that ncol * nrow < 65,535) until we can enlist @jjrob's help with this.
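A minimal Python sketch of the two checks being compared here, assuming you already know each input raster's row and column counts and the patch raster's maximum ID. The helper names are hypothetical, not part of MGET. It encodes Ben's hypothesis that rows x columns must stay below 65,535; the example dimensions are the ones reported for southwest_700buf-9km_patchid.tif in this thread.

```python
# Hypothesized limit under discussion in this thread (the unsigned 16-bit maximum).
UINT16_MAX = 65_535


def grid_below_limit(nrow: int, ncol: int, limit: int = UINT16_MAX) -> bool:
    """Ben's hypothesis: the tool only works when nrow * ncol < limit.

    This is an assumption being tested, not a documented MGET rule.
    """
    return nrow * ncol < limit


def max_id_below_limit(max_patch_id: int, limit: int = UINT16_MAX) -> bool:
    """The documented constraint: patch IDs must not exceed the limit."""
    return max_patch_id <= limit


if __name__ == "__main__":
    # southwest_700buf-9km_patchid.tif: 208 rows x 209 columns, max ID 17,435
    nrow, ncol, max_id = 208, 209, 17_435
    print(nrow * ncol, grid_below_limit(nrow, ncol), max_id_below_limit(max_id))
```

Both checks pass for that raster (43,472 cells, max ID 17,435), which fits Stephanie's report that it still fails; the limit is probably not the grid size alone.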

The error is associated with this line, DisperseLarvae2012.m#L356:

dispersalMatrix = zeros(length(sourceIDs), length(destIDs), numSummaries, 'single');
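A rough way to see why this allocation can fail: MATLAB's 'single' type uses 4 bytes per element, so this zeros call needs length(sourceIDs) x length(destIDs) x numSummaries x 4 bytes in one array. The Python sketch below does that arithmetic. For illustration only, it assumes every one of the 17,435 patch IDs is both a source and a destination; the real lengths depend on how DisperseLarvae2012.m builds sourceIDs and destIDs.

```python
BYTES_PER_SINGLE = 4  # MATLAB 'single' is a 32-bit float


def dispersal_matrix_bytes(n_sources: int, n_dests: int, num_summaries: int) -> int:
    """Bytes for zeros(n_sources, n_dests, num_summaries, 'single')."""
    return n_sources * n_dests * num_summaries * BYTES_PER_SINGLE


if __name__ == "__main__":
    # Assumption: all 17,435 patches act as both sources and destinations.
    n_patches = 17_435
    for summaries in (1, 2, 10):
        gib = dispersal_matrix_bytes(n_patches, n_patches, summaries) / 2**30
        print(f"{summaries:>2} summary period(s): {gib:.2f} GiB")
```

Under that assumption a single summary period already needs about 1.13 GiB, and the matrix grows with the square of the patch count. So the number of patches, rather than the raster's row/column count, may be what trips the 32-bit limit.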

Note the comment earlier, at DisperseLarvae2012.m#L75:

% reefIDs - 2D matrix of integers that maps the locations of coral reefs.
% Each cell contains a reef ID between 1 and 65535, indicating that the
% specified reef occurs in that cell, or 0, indicating that no reef occurs
% in that cell.
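The convention in that comment (0 = no reef, otherwise an ID from 1 to 65,535) can be checked before running the tool. A minimal pure-Python sketch, assuming the patch ID raster has already been read by your GIS library and converted to nested lists of Python ints (e.g., with .tolist() if you used NumPy); validate_reef_ids is a hypothetical helper, not an MGET function.

```python
from typing import List, Sequence

UINT16_MAX = 65_535  # largest reef ID allowed by the DisperseLarvae2012.m comment


def validate_reef_ids(reef_ids: Sequence[Sequence[int]]) -> List[int]:
    """Check a 2D reef/patch ID grid against the documented convention.

    0 means "no reef"; any other cell must hold an integer ID in 1..65,535.
    Returns the sorted distinct reef IDs, or raises ValueError on the first
    cell that breaks the convention.
    """
    found = set()
    for r, row in enumerate(reef_ids):
        for c, value in enumerate(row):
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"cell ({r}, {c}) is not an integer: {value!r}")
            if not 0 <= value <= UINT16_MAX:
                raise ValueError(f"cell ({r}, {c}) is outside 0..{UINT16_MAX}: {value}")
            if value:
                found.add(value)
    return sorted(found)


if __name__ == "__main__":
    grid = [
        [0, 1, 1],
        [0, 2, 0],
        [3, 0, 0],
    ]
    ids = validate_reef_ids(grid)
    print(ids, len(ids))
```

The count of distinct IDs is worth printing alongside the max, since per-reef arrays likely scale with how many reefs there are rather than with the largest ID value.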

PS: Searching previous emails from Jason about DisperseLarvae2012, I found this problem:

python.exe: RuntimeError: ??? Undefined function or variable "ULfull".

and solution:

You can work around the problem by specifying a start date of 9/20/2011 rather than 9/20/2011 1:57:50 PM. Basically, just delete the time component of the date. The starting time will be 00:00:00 (midnight) when the time component is not specified.
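If the start date is generated by a script rather than typed into the tool dialog, the workaround amounts to dropping the time component before passing the value along. A minimal sketch, assuming the timestamp arrives as a US-style string like the one in Jason's email; date_only is a hypothetical helper name.

```python
from datetime import datetime


def date_only(timestamp: str, fmt: str = "%m/%d/%Y %I:%M:%S %p") -> str:
    """Return the M/D/YYYY part of a timestamp so the tool assumes midnight."""
    parsed = datetime.strptime(timestamp, fmt)
    # Build the string by hand to avoid platform-specific zero-padding flags.
    return f"{parsed.month}/{parsed.day}/{parsed.year}"


if __name__ == "__main__":
    print(date_only("9/20/2011 1:57:50 PM"))  # 9/20/2011
```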

stephgad commented 7 years ago

Hi Ben,

The southwest_700buf-9km_patchid.tif raster has 209 columns and 208 rows (209 * 208 = 43,472 < 65,535), but it still fails. I remembered the time component issue from our GP, so our test runs were already using "date only".

Going to give doubling the cell size a try, unless you think that's too small a change?

Edit: OK, so some progress... [Screenshot: memory_progress]

stephgad commented 7 years ago

OK, I tried 18, 27, 36, and 45 km cell sizes for the sw/se (1000 km buffer) rasters, and the 45 km cell rasters FINALLY ran all the way through the four MGET tools!!!!!
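Coarsening the cells shrinks the grid quickly: for a fixed extent, the cell count falls with the square of the cell-size ratio. The sketch below illustrates this with the 208 x 209 grid reported for the 700 km buffer / 9 km rasters, since the 1000 km buffer dimensions aren't given in this thread. coarsened_dims is a hypothetical helper, and real resampling may round the edges differently.

```python
import math
from typing import Tuple


def coarsened_dims(nrow: int, ncol: int, base_km: float, new_km: float) -> Tuple[int, int]:
    """Approximate rows/columns after resampling the same extent to new_km cells.

    Rounds up so partial cells at the edge of the extent still get a cell.
    """
    return math.ceil(nrow * base_km / new_km), math.ceil(ncol * base_km / new_km)


if __name__ == "__main__":
    base_rows, base_cols, base_km = 208, 209, 9.0  # southwest 700 km buffer, 9 km cells
    for new_km in (9.0, 18.0, 27.0, 36.0, 45.0):
        rows, cols = coarsened_dims(base_rows, base_cols, base_km, new_km)
        print(f"{new_km:>4.0f} km: {rows} x {cols} = {rows * cols:,} cells")
```

Going from 9 km to 45 km cuts the cell count by roughly 25x (43,472 to 1,764). If the number of patches drops in proportion, any patch-by-patch matrix would shrink faster still, roughly with the square of that factor.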

bbest commented 7 years ago

Huzzah! OK, let's get a few more parameter sets run for the 1000 km buffer / 45 km cell rasters.

stephgad commented 7 years ago

The machine constance.esm.ucsb.edu has 16 GB of RAM and runs Windows 7 Enterprise SP1 (64-bit OS).

jjrob commented 7 years ago

Hi Stephanie,

I’m sorry I’m not able to engage on this issue now. I have a webinar to present preliminary results of a two year project next week.

In general, there are two usual challenges with the connectivity simulator:

  1. You have to get the input files prepared just right or it will not work. (They all have to have precisely the same extent, cell size, rows and columns, coordinate system, etc.) The tool tries to catch these problems but does not always get everything. When it doesn’t, there’s usually an inscrutable Matlab error, such as “matrix dimensions do not agree” or similar incomprehensible stuff.

  2. You have to find a workable balance between wanting a large study area, wanting a small cell size, wanting to run long simulations, and wanting a large number of reefs. Not finding a good balance results in simulations that require a lot of memory and/or a very long run time. With the current implementation of the tool, memory is usually the bottleneck you hit first. This is because the tool is currently limited to running as a 32-bit program (owing to its integration with ArcGIS Desktop, which is 32-bit). Effectively, it can only access about 1.5 to 2.0 GB of memory. Some of the matrix calculations performed by the tool involve arrays of size number of rows x number of columns x number of summarization periods x 4 bytes. So if you have a 1000 x 1000 cell study area, that’s 4 million bytes per summarization step. If you summarize the simulation every hour, so you can have a super smooth animation, that’s roughly 100 MB (24 x 4 MB) per day of simulation. That will quickly blow up. So you can reduce the summarization to once per day to get it to run. But then if you have a very small cell size, you must also have a very small time step, e.g. 5 minutes, for the advection/diffusion algorithm to be numerically stable. The number of time steps gets very large (say 300 per day), and you multiply that by the number of reefs (say 1000 reefs, because you wanted to be super detailed), so that’s 300,000 time steps that have to be executed for each day of the simulation. If each one takes only 1 second, that’s still 3-4 days of run time per simulated day...
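Because the tool doesn't always catch mismatched inputs (point 1 above), it can pay to compare the rasters yourself before running it. A minimal sketch, assuming you have already pulled each raster's rows, columns, cell size, lower-left corner, and coordinate system out of your GIS. GridInfo and find_mismatches are hypothetical names, the example coordinates are made up, and the tolerance for floating-point values is an arbitrary choice.

```python
import math
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class GridInfo:
    """Hypothetical summary of one raster's grid definition."""
    nrow: int
    ncol: int
    cell_size: float
    x_min: float
    y_min: float
    crs: str  # e.g. a WKT string or an EPSG code as text


def _same(a, b, tol: float = 1e-6) -> bool:
    """Compare numbers with a small absolute tolerance, everything else exactly."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
    return a == b


def find_mismatches(grids: Dict[str, GridInfo]) -> List[str]:
    """Report every field where a raster differs from the first raster given."""
    (ref_name, ref), *others = grids.items()
    problems = []
    for name, grid in others:
        for f in fields(GridInfo):
            a, b = getattr(ref, f.name), getattr(grid, f.name)
            if not _same(a, b):
                problems.append(f"{name}.{f.name} = {b!r}, but {ref_name}.{f.name} = {a!r}")
    return problems


if __name__ == "__main__":
    crs = "projection-wkt-placeholder"  # use your rasters' actual coordinate system
    inputs = {
        "water_mask": GridInfo(208, 209, 9000.0, -1.0e6, 2.0e6, crs),
        "patch_id": GridInfo(208, 209, 9000.0, -1.0e6, 2.0e6, crs),
        "percent_cover": GridInfo(208, 210, 9000.0, -1.0e6, 2.0e6, crs),
    }
    for problem in find_mismatches(inputs):
        print(problem)
```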

So, when in doubt: reduce your spatial extent, increase your cell size (i.e., coarsen the resolution), reduce the length of the simulation (number of days / PLD), and reduce the number of reefs. I know it is a challenge to find a balance.
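Jason's back-of-the-envelope numbers can be turned into a quick pre-flight estimate for a proposed run. A minimal sketch of that arithmetic, assuming 4-byte values, summary arrays of rows x columns x summary periods, and one full simulation per reef. The function names, the 30-day PLD in the example, and the 1.5 GB default budget (the low end of the 32-bit range Jason quotes) are illustrative choices, not MGET parameters.

```python
import math


def summary_array_bytes(nrow: int, ncol: int, num_summaries: int, bytes_per_value: int = 4) -> int:
    """Size of a rows x columns x summary-periods array of 4-byte values."""
    return nrow * ncol * num_summaries * bytes_per_value


def steps_per_day(time_step_minutes: float) -> int:
    """Model time steps needed to cover one simulated day."""
    return math.ceil(24 * 60 / time_step_minutes)


def runtime_days(num_reefs: int, sim_days: float, time_step_minutes: float,
                 seconds_per_step: float) -> float:
    """Wall-clock days if each reef is simulated separately for sim_days."""
    total_steps = num_reefs * sim_days * steps_per_day(time_step_minutes)
    return total_steps * seconds_per_step / 86_400


def fits_32bit(nbytes: int, budget_bytes: int = 1_500_000_000) -> bool:
    """Compare against the low end of the ~1.5-2.0 GB a 32-bit process can use."""
    return nbytes <= budget_bytes


if __name__ == "__main__":
    # Jason's example: 1000 x 1000 cells, hourly summaries, 1 simulated day.
    nbytes = summary_array_bytes(1000, 1000, num_summaries=24)
    print(f"hourly summaries, 1 day: {nbytes / 1e6:.0f} MB, fits: {fits_32bit(nbytes)}")
    # Hypothetical 30-day PLD with the same hourly summaries.
    nbytes_30 = summary_array_bytes(1000, 1000, num_summaries=24 * 30)
    print(f"hourly summaries, 30 days: {nbytes_30 / 1e9:.2f} GB, fits: {fits_32bit(nbytes_30)}")
    # 5-minute steps and 1000 reefs at 1 second per step.
    print(f"run time per simulated day: {runtime_days(1000, 1, 5, 1.0):.1f} days")
```

The run-time estimate scales linearly with both the reef count and the simulation length, so a 30-day PLD multiplies the per-day figure by 30.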

There may be some relief for this in the next six months or so. Eric Treml, the lead developer of the Matlab code and the scientist who heads up this research, undertook a complete rewrite of the main bottleneck in the Matlab code. He now has code that runs MUCH faster than the current implementation, including a version that can use GPUs for certain operations. We are in the planning phase of putting this into MGET. As part of it, I would remove the 32-bit limitation, allowing the program to run as 64-bit and thus access all 16 GB of RAM on your machine.

This would still not eliminate the need to balance those four things (extent, cell size, simulation length, number of reefs). It would provide some relief. But by going to 64-bit, it would also allow you to attempt to start crazy simulations that are not balanced well at all. A benefit of the current 32-bit tool is that it will often fail right away if you’re inadvertently attempting something crazy…

Hope this helps,

Jason

From: Stephanie Gad [mailto:notifications@github.com] Sent: Tuesday, July 25, 2017 1:57 PM To: marinebon/connectivity connectivity@noreply.github.com Cc: Jason Roberts jason.roberts@duke.edu; Mention mention@noreply.github.com Subject: Re: [marinebon/connectivity] Still getting "RuntimeError: ??? Maximum variable size allowed by the program is exceeded." (#3)



stephgad commented 7 years ago

Hi Jason,

(apologies for the edit, accidentally hit comment too soon)

Thank you for taking the time to get back to us despite your busy schedule; we really appreciate it! Finding a workable balance between keeping our study area large and having a small cell size has definitely been a bit of a challenge!

Thank you for your recommendations; they're very helpful. I'll discuss them with Ben at our next meeting.

Cheers, Stephanie