openforis / sepal

Geographical Data Processing in the Cloud
https://sepal.io/
MIT License

Cannot launch MSPA on big images (> 10 Gb) #187

Open 12rambau opened 2 years ago

12rambau commented 2 years ago

Description

We are trying to run the GWB_MSPA command on a big image. Whatever machine we use (I spent 50% of my quota experimenting on this one), the command always fails after 1-2 h of computation with the following error:

XIO: fatal IO error 0 (Success) on X server ":99" after 279 requests (279 known processed) with 0 events remaining.

From what I found on the web, this error seems to be associated with disk corruption or a lack of available disk space, which should not be the case on my SEPAL account. @pvogt tested on JRC hardware and the computation ran just fine, which suggests that the issue comes from SEPAL itself.
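To rule out the disk-space hypothesis from the user side, the free space of the filesystem holding the output folder can be checked before and during the run. This is a minimal sketch, assuming a Linux machine with a POSIX `df`; the helper names `free_kb` and `has_free_space` are made up for illustration and are not part of SEPAL or GWB.

```shell
#!/usr/bin/env bash
# free_kb <dir>: print the free space, in KiB, of the filesystem holding <dir>.
# -P forces the portable one-line-per-filesystem layout, -k reports 1024-byte blocks.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_space <dir> <needed_kb>: succeed if <dir> has at least <needed_kb> KiB free.
# (Hypothetical helper; the threshold is up to the caller.)
has_free_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, `has_free_space /home/prambaud/output 100000000 || echo "less than ~100 GB free"` would flag a nearly full volume; the 100000000 KiB threshold is an arbitrary value chosen for the example.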

As users, I don't think we can go any deeper: we don't have access to the server's full logs, which prevents us from digging into what is causing this error.

Information about the computation:

Input image (I’m happy to share it with whoever wants to try):

The command:

nohup GWB_MSPA -i=/home/prambaud/input -o=/home/prambaud/output &

This command performs the MSPA analysis in parallel. The peak RAM requirement for MSPA is at least ~20 times the input image size, which means you will need something like 180 GB of free RAM when that peak hits the machine. Machines > m64 can handle this.
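The arithmetic behind that figure can be sketched as follows. The factor of 20 is the rule of thumb quoted above, not a documented constant, and the helper names (`estimate_peak_gb`, `available_ram_gb`, `fits_in_ram`) are hypothetical. Sizes are whole numbers of GB to keep the shell arithmetic simple, and `available_ram_gb` assumes a Linux `/proc/meminfo` with a `MemAvailable` line.

```shell
#!/usr/bin/env bash
# Rule-of-thumb multiplier taken from the ~20x figure quoted in this thread
# (an estimate, not a documented constant).
MSPA_RAM_FACTOR=20

# estimate_peak_gb <input_size_gb>: print the estimated peak RAM in GB.
estimate_peak_gb() {
    echo $(( $1 * MSPA_RAM_FACTOR ))
}

# available_ram_gb: print MemAvailable from /proc/meminfo, rounded down to GiB.
# Linux only; MemAvailable is reported in KiB.
available_ram_gb() {
    awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# fits_in_ram <input_size_gb> <available_gb>: succeed if the estimated peak fits.
fits_in_ram() {
    [ "$(estimate_peak_gb "$1")" -le "$2" ]
}
```

With a 9 GB input, `estimate_peak_gb 9` gives 180, which matches the figure above. A pre-flight check could then be `fits_in_ram 9 "$(available_ram_gb)" || echo "not enough free RAM"`.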

cdanielw commented 2 years ago

I seem to recall that you tried to execute this using the RAM drive. What were the results of that? If this is EFS related, I might have to look into mounting EC2 instance storage where available and/or an EBS volume to keep transient data during processing. That's something we have to do at some point in any case.

12rambau commented 2 years ago

I need to try it again, but last time I tried I didn't manage to get all the way to the end of the computation; the program was again hitting read/write errors.
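For the next attempt, one way to keep a record of those read/write errors is to capture the command's output and exit status in a log file. This is only a sketch: `run_logged` is a hypothetical wrapper, not something SEPAL or GWB provides, and it only records what the program itself prints.

```shell
#!/usr/bin/env bash
# run_logged <logfile> <command...>: run a command, append its stdout and stderr
# to <logfile> between timestamped start/end lines, and return its exit status.
run_logged() {
    local log=$1
    shift
    local status
    {
        echo "[$(date '+%F %T')] start: $*"
        "$@"
        status=$?
        echo "[$(date '+%F %T')] exit status: $status"
    } >> "$log" 2>&1
    return "$status"
}
```

It could be used as `run_logged /home/prambaud/mspa.log GWB_MSPA -i=/home/prambaud/input -o=/home/prambaud/output`, where the log path is an assumption made for the example.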