powellb / seapy

State Estimation and Analysis in Python
MIT License

The exit codes of the workers are {SIGSEGV (-11)} #67

Closed s193264 closed 2 years ago

s193264 commented 2 years ago

Hi, I'm currently using "seapy" to create boundary, meteorological, and initial data for the ROMS model. I was following README.md, and at the step "4. Once you have HYCOM data, interpolate it to your grid", the following error occurs and I am stuck.

seapy.roms.interp.to_clim("hycom_file.nc", "my_clim.nc", dest_grid=mygrid, nx=1/6, ny=1/6, vmap={"surf_el":"zeta", "water_temp":"temp", "water_u":"u", "water_v":"v", "salinity":"salt"})

Output

"joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGSEGV (-11)} "

I think "seapy/seapy/roms/interp.py" is involved.

Can you tell me what to do? Thank you.

powellb commented 2 years ago

Hmmm. I ran interp with two different setups without issue, both with joblib 1.1.0:

joblib 1.1.0 on python 3.10.5 on macOS
joblib 1.1.0 on python 3.10.4 on CentOS

The error you mention is coming from joblib. Can you try a very simple joblib test:

from joblib import Parallel, delayed
from random import random

def mythread(id):
    return [id, random()]

out = Parallel(n_jobs=2)(delayed(mythread)(i) for i in range(6))
print(out)

This will simply use two threads to put together a list of random numbers. If that works, then joblib is likely fine.

As you mention above, the OS may have killed your thread due to excessive memory usage. The interpolation (depending on the parent and child grids) can use a lot of memory, particularly if you use too many threads. The interpolation divides the work among threads by time record (not by subdomain of the grid). An excessive number of threads therefore means that the entire grid (at individual times) is replicated once per thread.
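A back-of-the-envelope sketch of that scaling, not seapy's actual memory accounting: it assumes each worker holds one full 3-D float64 field on the source grid and one on the destination grid. The function name, the per-worker model, and the grid shapes are assumptions made for illustration only.

```python
def estimate_worker_memory_gib(src_shape, dst_shape, threads,
                               fields_per_worker=1, bytes_per_value=8):
    """Rough estimate (GiB) of memory held by all workers at once.

    Assumes every worker keeps `fields_per_worker` full copies of one
    source-grid field and one destination-grid field in float64.
    """
    def cells(shape):
        total = 1
        for dim in shape:
            total *= dim
        return total

    per_worker = (cells(src_shape) + cells(dst_shape)) \
        * fields_per_worker * bytes_per_value
    return threads * per_worker / 1024 ** 3


# Illustrative shapes only (levels, lat points, lon points).
source = (40, 2000, 4500)   # a large, global-sized source grid
dest = (15, 100, 100)       # a small regional destination grid
for n in (1, 2, 8):
    print(n, "threads:", round(estimate_worker_memory_gib(source, dest, n), 2), "GiB")
```

The estimate grows linearly with the number of threads, which is why reducing the thread count or cutting the source data down to the region of interest both help.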

You mention that you are downscaling from HYCOM. Did you cut the HYCOM data to be a region that bounds your grid (rather than using the entire global HYCOM)? That could be one way that memory is getting too large. That is, if the error about too much memory being used is correct.

s193264 commented 2 years ago

@powellb

Thanks for your reply and advice. I'm using WSL and I'm running on Ubuntu 18.04 LTS. I tried to run the very simple joblib test and got the following results:

>>> from joblib import Parallel, delayed
>>> from random import random
>>> def mythread(id):
...     return [id, random()]
...
>>> out = Parallel(n_jobs=2)(delayed(mythread)(i) for i in range(6))
>>> print(out)
[[0, 0.9136347983813904], [1, 0.9136347983813904], [2, 0.9136347983813904], [3, 0.9136347983813904], [4, 0.9136347983813904], [5, 0.9136347983813904]]

As far as I can see, there seems to be no problem with joblib.

Next, regarding the cut of the area, how can I check if it is cut? I think I have set the appropriate area, but I'm not sure, so I'll list my output below. I hope you can check it. Is it possible that the grid file I prepared is strange?

>>> seapy.model.hycom.load_history("hycom_file.nc",start_time=datetime(2005,8,20),end_time=datetime(2005,9,1),grid=mygrid,epoch=default_epoch, url=_url,load_data=False)
ncks -v surf_el,water_u,water_v,water_temp,salinity -d time,231,243 -d lat,1844,1895 -d lon,1094,1195 http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_file.nc

What should I do? I think you are busy, sorry... thank you.

powellb commented 2 years ago

So it would appear that joblib is working properly. What version do you have?

print(joblib.__version__)

Just to be safe, you can try to update if it isn't v1.1.0; however, that isn't the issue as we have used joblib for years without change or issue.

So, the command that you run now to download your HYCOM data is presented to you as:

ncks -v surf_el,water_u,water_v,water_temp,salinity -d time,231,243 -d lat,1844,1895 -d lon,1094,1195 http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_file.nc

This suggests that you are not downloading an unreasonable amount of data: only 13 time steps, for a 102x52 grid. I don't know what the size of your ROMS grid is. In general, the interpolator should be able to handle that without issue. Once you've downloaded the hycom_file.nc, then you should do something like:

seapy.roms.interp.to_clim('hycom_file.nc', 'roms_clim.nc', dest_grid='my_roms_grid.nc',
                          nx=0.25, ny=0.25, vmap={'salinity': 'salt',
                                                  'surf_el': 'zeta',
                                                  'water_u': 'u',
                                                  'water_v': 'v',
                                                  'water_temp': 'temp'})

This will convert the HYCOM file (with the variable names mapped as per the vmap dictionary), using a 0.25 degree decorrelation scale for the HYCOM data.

I hope that gets it working for you. I've never encountered this issue, and I am not sure what is causing it, other than the message saying that there isn't enough memory (which would suggest an issue on the computer side if the amount of data is reasonable).
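As a quick check of how much data the ncks command above requests, the index ranges in its -d options can be counted directly. This is a minimal sketch that assumes the integer arguments to -d are inclusive index ranges (as ncks treats integers without a decimal point); the helper name is made up for illustration.

```python
import re


def hyperslab_counts(ncks_command):
    """Return {dimension: number of indices} for each '-d dim,min,max' option.

    Assumes min and max are integer indices and that the range is inclusive.
    """
    counts = {}
    for name, low, high in re.findall(r"-d\s+(\w+),(\d+),(\d+)", ncks_command):
        counts[name] = int(high) - int(low) + 1
    return counts


cmd = ("ncks -v surf_el,water_u,water_v,water_temp,salinity "
       "-d time,231,243 -d lat,1844,1895 -d lon,1094,1195 "
       "http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_file.nc")
print(hyperslab_counts(cmd))
# → {'time': 13, 'lat': 52, 'lon': 102}
```

A subset of this size is small, so the download itself is unlikely to be what exhausts memory.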

s193264 commented 2 years ago

Dear @powellb When I checked the version of Joblib, it was 1.1.0. So I don't think my error is due to Joblib.

I would like to run ROMS over the region from latitude 28° to 31° and longitude -92° to -85° with a grid spacing of 9 km. The calculation period is from August 20th to September 1st, 2005. So, I ran the following:

import numpy as np
import seapy
import datetime
from datetime import datetime
import netCDF4
from seapy.lib import default_epoch, chunker
from seapy.model.grid import asgrid
from seapy.roms import ncgen, num2date

mygrid = seapy.model.asgrid("HYCOM_GLBu0.08_grid.nc")
_url = "http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005"
_maxrecs = 1

Next, I used "seapy.model.hycom.load_history" to download the HYCOM data with load_data = True.

seapy.model.hycom.load_history("hycom_file.nc",start_time=datetime(2005,8,20),end_time=datetime(2005,9,1),grid=mygrid,epoch=default_epoch, url=_url,load_data=True)

Output:

08/20/2005-08/20/2005: surf_el water_u water_v water_temp salinity

08/21/2005-08/21/2005: surf_el water_u water_v water_temp salinity
………

09/01/2005-09/01/2005: surf_el water_u water_v water_temp salinity

Then I ran it again with load_data = False, and I got this output:

ncks -v surf_el,water_u,water_v,water_temp,salinity -d time,231,243 -d lat,1844,1895 -d lon,1094,1195 http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_file.nc

Then I did the following:

seapy.roms.interp.to_clim("hycom_file.nc", "my_clim.nc", dest_grid=mygrid, nx=1/6, ny=1/6, vmap={"surf_el":"zeta", "water_temp":"temp", "water_u":"u", "water_v":"v", "salinity":"salt"})

However, the following error occurred.

velocity       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0% : 0:00:00
Traceback (most recent call last):
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 822, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/queue.py", line 161, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "$HOME/seapy/seapy/roms/interp.py", line 883, in to_clim
    pmap = __interp_grids(src_grid, destg, ncsrc, ncout, records=records, threads=threads, nx=nx, ny=ny, vmap=vmap, weight=weight, pmap=pmap)
  File "$HOME/seapy/seapy/roms/interp.py", line 390, in __interp_grids
    child_grid.mask_rho) for i in recs)
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 833, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "$HOME/seapy/seapy/roms/interp.py", line 390, in <genexpr>
    child_grid.mask_rho) for i in recs)
KeyError: 'water_u'

So, when I changed "water_u" to "u", "water_v" to "v", and "water_temp" to "temp", a segmentation fault was detected.

>seapy.roms.interp.to_clim("hycom_file.nc", "my_clim.nc", dest_grid=mygrid, threads=2,nx=1/6, ny=1/6,  vmap={"surf_el":"zeta", "temp":"temp", "u":"u", "v":"v", "salinity":"salt"})

temp         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0% : 0:00:00
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "$HOME/seapy/seapy/roms/interp.py", line 883, in to_clim
    pmap = __interp_grids(src_grid, destg, ncsrc, ncout, records=records, threads=threads, nx=nx, ny=ny, vmap=vmap, weight=weight, pmap=pmap)
  File "$HOME/seapy/seapy/roms/interp.py", line 351, in __interp_grids
    for i in recs), copy=False)
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "$HOME/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGSEGV(-11)}

Is there anything strange about what I've done? The PC has about 32GB of memory, so I don't think it's running out. If you know what to do, please help. Sorry for the long message. Thank you.

($HOME = /home/[User name]/)

powellb commented 2 years ago

First, a side note: you should not use the load_data=True option. That has not been maintained or checked in several years. The intent of hycom.load_history() is to generate a standard ncks command that you can run to download your data separately. Downloading the data from OPeNDAP using standard and updated tools is the recommended way. Furthermore, the fact that you are getting the wrong variable names tells me that load_data=True is not working.

I ran the ncks command you pasted above, and it downloaded properly with a region in the Arctic.

The major issue that I see is that you are using a grid called "HYCOM_GLBu0.08_grid.nc". Isn't this the global HYCOM grid? That would be why it is failing. You are trying to interpolate the Arctic onto the global HYCOM grid. It can't extrapolate.

The intent is to interpolate a HYCOM region (downloaded from the ncks) onto a smaller ROMS grid.

So, imagine your ROMS grid is called "roms_region_grid.nc", then the command would be:

seapy.roms.interp.to_clim("hycom_file.nc", "my_clim.nc", dest_grid="roms_region_grid.nc", nx=1/6, ny=1/6, vmap={"surf_el":"zeta", "water_temp":"temp", "water_u":"u", "water_v":"v", "salinity":"salt"})
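Since the interpolation cannot extrapolate, it can help to confirm beforehand that the downloaded HYCOM region encloses the ROMS grid. The following is a minimal sketch, not part of seapy: the function name is hypothetical, the coordinate values are illustrative, and it assumes both sets of longitudes use the same convention (for example, both in the range -180 to 180).

```python
def region_covers(src_lat, src_lon, dst_lat, dst_lon):
    """Return True if the source bounding box encloses every destination point.

    Each argument is a flat sequence of coordinates in degrees. Both grids
    are assumed to use the same longitude convention.
    """
    return (min(src_lat) <= min(dst_lat) and max(dst_lat) <= max(src_lat)
            and min(src_lon) <= min(dst_lon) and max(dst_lon) <= max(src_lon))


# Illustrative values only: a destination region in the Gulf of Mexico.
roms_lat, roms_lon = [28.0, 31.0], [-92.0, -85.0]

# A source subset far to the north does not cover it ...
print(region_covers([67.0, 72.0], [-93.0, -84.0], roms_lat, roms_lon))
# ... while a subset that bounds the destination region does.
print(region_covers([27.0, 32.0], [-93.0, -84.0], roms_lat, roms_lon))
```

With real files, the coordinate sequences would come from the lat/lon variables of the HYCOM file and the lat_rho/lon_rho variables of the ROMS grid.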

s193264 commented 2 years ago

Dear @powellb

Thank you for taking the time to reply, and for all of your advice.

As you say, HYCOM_GLBu0.08_grid.nc was indeed the global HYCOM grid. Therefore, I created a ROMS grid from the calculation results of the WRF model using MATLAB. Then, as advised, I set load_data = False and executed the resulting ncks command:

ncks -v surf_el,water_u,water_v,water_temp,salinity -d time,231,243 -d lat,1357,1393 -d lon,1119,1168 http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_Katrina_file.nc

As a result, "hycom_Katrina_file.nc" was created. Next, I ran the following code:

seapy.roms.interp.to_clim("hycom_Katrina_file.nc", "my_Katrina_clim.nc", dest_grid="Katrina_roms_test_grid.nc", nx=1/6, ny=1/6, vmap={"surf_el":"zeta", "water_temp":"temp", "water_u":"u", "water_v":"v", "salinity":"salt"})

Then an error occurred.

/home/Users/seapy/seapy/model/grid.py:463: UserWarning: could not compute grid depths.
  warn("could not compute grid depths.")
temp ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.0% : 0:00:00
Traceback (most recent call last):
  File "/home/Users/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 822, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "/home/Users/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/queue.py", line 161, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/Users/seapy/seapy/roms/interp.py", line 883, in to_clim
    pmap = __interp_grids(src_grid, destg, ncsrc, ncout, records=records, threads=threads, nx=nx, ny=ny, vmap=vmap, weight=weight, pmap=pmap)
  File "/home/Users/seapy/seapy/roms/interp.py", line 351, in __interp_grids
    for i in recs), copy=False)
  File "/home/Users/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/Users/.pyenv/versions/anaconda3-5.0.1/lib/python3.6/site-packages/joblib/parallel.py", line 833, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "/home/Users/seapy/seapy/roms/interp.py", line 351, in <genexpr>
    for i in recs), copy=False)
AttributeError: 'grid' object has no attribute 'depth_rho'

Is this a common error? Or is it caused by something I did?

If possible, could you share the ROMS grid used as an example, along with the output files hycom_file.nc and my_clim.nc?

Thank you very much for your patient responses. I appreciate your consideration.

s193264 commented 2 years ago

Dear @powellb

Hello, I found that my error was caused by the missing depth_rho. However, I don't know how to set it. I am working according to the COAWST model manual (https://github.com/jcwarner-usgs/COAWST). It has a variable called rho.depth, but it is stored in rho and I'm not sure how to use it.

I would like to know how to create "depth_rho". I would also appreciate it if you could let me know whether any other variables are needed. Thank you.

powellb commented 2 years ago

As you notice, your grid has not been created as a proper ROMS grid. You will have to read about how to create a ROMS grid. There is no "depth_rho" that is stored in the grid. ROMS grids are s-level grids, which means that the grid depths change in space and in time. The "depth_rho" is calculated internally in ROMS for each time-step. For the seapy interpolation, it calculates the basic "depth_rho" from your grid file, but it isn't the exact depths of your grid for any given time (as sea-level affects the depths by centimeter scales).
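To illustrate how level depths follow from the grid's vertical parameters rather than being stored, here is a minimal sketch for a single water column. It is not seapy's or ROMS's code: it follows the Vtransform=2 and Vstretching=4 formulas as described in the ROMS documentation on vertical S-coordinates, and the function names and parameter values are chosen for illustration. A grid built with other settings would need the corresponding formulas.

```python
import math


def stretching_v4(s, theta_s, theta_b):
    """Stretching function C(s) for Vstretching=4, with s in [-1, 0]."""
    if theta_s > 0:
        c = (1.0 - math.cosh(theta_s * s)) / (math.cosh(theta_s) - 1.0)
    else:
        c = -s * s
    if theta_b > 0:
        c = (math.exp(theta_b * c) - 1.0) / (1.0 - math.exp(-theta_b))
    return c


def depth_rho_column(h, hc, theta_s, theta_b, n, zeta=0.0):
    """Depths (m, negative downward) of the n rho levels for one column.

    h is the bathymetry (positive), hc the critical depth, and zeta the
    free-surface elevation. Uses the Vtransform=2 relation
    z = zeta + (zeta + h) * (hc * s + h * C(s)) / (hc + h).
    """
    depths = []
    for k in range(1, n + 1):
        s = (k - n - 0.5) / n          # rho-level s-coordinate, bottom to top
        c = stretching_v4(s, theta_s, theta_b)
        s0 = (hc * s + h * c) / (hc + h)
        depths.append(zeta + (zeta + h) * s0)
    return depths


# Illustrative parameters: 100 m deep column with 15 levels.
resting = depth_rho_column(h=100.0, hc=20.0, theta_s=5.0, theta_b=0.4, n=15)
raised = depth_rho_column(h=100.0, hc=20.0, theta_s=5.0, theta_b=0.4, n=15, zeta=0.5)
print(resting[0], resting[-1])   # deepest and shallowest rho level
print(raised[-1] - resting[-1])  # shift of the top level when zeta changes
```

Because zeta enters the formula, the depths shift slightly whenever the sea level changes, which is why an interpolation tool can only compute resting (zeta = 0) depths from the grid file.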

I am going to close this issue because it seems that the interpolation is working, but it expects that you provide it with a proper ROMS grid to interpolate onto.

s193264 commented 2 years ago

Dear @powellb

Thank you very much for your kind support. Following your advice, I would like to learn about s-levels and the ROMS grid. However, I think there are various ways to create a ROMS grid. How did you do it? Also, if possible, it would be very helpful if you could share a ROMS grid as an example. If you can't send it here, I'd appreciate it if you could send it by email.

I would like to express my gratitude to you.

powellb commented 2 years ago

You are best reading through the ROMS Documentation to understand the way that grids are defined. There are various utilities and tools available for grid construction that you can find by searching online (there is a rudimentary one in seapy that is designed for people who want to specify everything themselves).

The best place to ask questions is in the ROMS Forum. People there can point you to various documents and grid generation tools.

s193264 commented 2 years ago

Dear @powellb

Thank you for your advice. As you suggested, I read about the vertical S-coordinate in the ROMS Documentation.

Therefore, I recreated the ROMS grid using Grid Builder. The contents of the grid are as follows.

>>> print (mygrid)
roms_grid_Katrina.nc
15x100x100: C-Grid with S-level
Available: I, J, _isroms, _nc, angle, cgrid, cs_r, depth_rho, depth_u, depth_v, dm, dn, eta_rho, eta_u, eta_v, f, filename, h, hc, ijinterp, key, lat_rho, lat_u, lat_v, llinterp, lm, ln, lon_rho, lon_u, lon_v, mask_rho, mask_u, mask_v, n, name, pm, pn, s_rho, shape, shape_u, shape_v, spatial_dims, tcline, theta_b, theta_s, thick_rho, thick_u, thick_v, vstretch vtransform, wtype_grid, xi_rho, xi_u, xi_v

Then I ran the following ncks command.

ncks -v surf_el,water_u,water_v,water_temp,salinity -d time,231,243 -d lat,1320,1416 -d lon,1081,1181 http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2005 hycom_file.nc

After that, I ran the following command to create the clim file.

>>> seapy.roms.interp.to_clim("hycom_file.nc", "my_clim.nc", dest_grid=mygrid, nx=1/6, ny=1/6, vmap={"surf_el": "zeta", "water_temp": "temp", "water_u": "u", "water_v": "v", "salinity": "salt"})
      velocity ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━ 100.0%: 0:00:07
{'pmaprho': array([[613., 512., 614., ..., 511., 715., 615.],
       [614., 613., 513., ..., 612., 412., 514.],
       [615., 614., 514., ..., 715., 413., 515.],
       ...,
       [6052., 6051., 5951., ..., 5851., 5953., 5849.],
       [5952., 6052., 5953., ..., 5950., 6050., 5750.],
       [5953., 5952., 6052., ..., 5751., 5853., 5950.]]), 'pmapu': array([[613., 512., 614., ..., 511., 511., 715., 615.],
       [614., 613., 513., ..., 612., 412., 514.],
       [615., 614., 514., ..., 715., 413., 515.],
       ...,
       [6052., 6051., 5951., ..., 5851., 5953., 5849.],
       [5952., 6052., 5953., ..., 5950., 6050., 5750.],
       [5953., 5952., 6052., ..., 5751., 5853., 5950.]]), 'pmapv': array([[613., 512., 614., ..., 511., 511., 715., 615.],
       [614., 613., 513., ..., 612., 412., 514.],
       [615., 614., 514., ..., 715., 413., 515.],
       ...,
       [6052., 6051., 5951., ..., 5851., 5953., 5849.],
       [5952., 6052., 5953., ..., 5950., 6050., 5750.],
       [5953., 5952., 6052., ..., 5751., 5853., 5950.]])}

Does the output look correct up to this point?

After that, when I execute

seapy.roms.boundary.from_roms("my_clim.nc", "my_bry.nc")

I get the following error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/seapy/seapy/roms/boundary.py", line 56, in from_roms
    grid = seapy.model.asgrid(roms_file)
  File "/home/users/seapy/seapy/model/grid.py", line 52, in asgrid
    return seapy.model.grid(filename=grid)
  File "/home/users/seapy/seapy/model/grid.py", line 104, in __init__
    self._verify_shape()
  File "/home/users/seapy/seapy/model/grid.py", line 200, in _verify_shape
    "grid does not have attribute lat_rho or lon_rho")
AttributeError: grid does not have attribute lat_rho or lon_rho

However, it seems that my grid has lat_rho and lon_rho. How should I proceed? I'm sorry to bother you when you are busy. Thank you.

powellb commented 2 years ago

The lat and lon of the grid are crucial to defining the grid and where it is on the globe. You are going to have to find a tutorial on ROMS grids, how they are designed, how to build a grid file, etc. They are based on the Arakawa C-grid, so there are three different horizontal grids.
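As a small illustration of those three horizontal grids, the sketch below shows how the rho, u, and v point arrays differ in shape for a given number of rho points. The helper names are hypothetical, and deriving u and v coordinates by averaging neighbouring rho points is a common convention assumed here for illustration, not something taken from this thread.

```python
def cgrid_shapes(eta_rho, xi_rho):
    """Shapes (eta, xi) of the rho, u, and v grids on an Arakawa C-grid."""
    return {
        "rho": (eta_rho, xi_rho),
        "u": (eta_rho, xi_rho - 1),   # u points sit between rho points in xi
        "v": (eta_rho - 1, xi_rho),   # v points sit between rho points in eta
    }


def rho_to_u(field):
    """Average neighbouring rho values along xi (within each row)."""
    return [[0.5 * (row[i] + row[i + 1]) for i in range(len(row) - 1)]
            for row in field]


def rho_to_v(field):
    """Average neighbouring rho values along eta (between adjacent rows)."""
    return [[0.5 * (a + b) for a, b in zip(lower, upper)]
            for lower, upper in zip(field, field[1:])]


print(cgrid_shapes(100, 100))
# → {'rho': (100, 100), 'u': (100, 99), 'v': (99, 100)}

lon_rho = [[0.0, 2.0, 4.0],
           [1.0, 3.0, 5.0]]
print(rho_to_u(lon_rho))  # → [[1.0, 3.0], [2.0, 4.0]]
print(rho_to_v(lon_rho))  # → [[0.5, 2.5, 4.5]]
```

Each of the three grids carries its own lat, lon, and mask arrays, which is why a grid file lists lat_rho, lat_u, and lat_v separately.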

s193264 commented 2 years ago

Dear @powellb

Thank you for your advice. I read about ROMS grids, how they are designed, and the Arakawa C-grid on the ROMS wiki (for example: https://www.myroms.org/wiki/Grid_Generation). Also, I think there are various ways to create a grid file. I am currently using the GridBuilder method (reference: https://www.youtube.com/watch?v=ijYDdA5ECeY&t=1509s&ab_channel=YusriYusup).

The ROMS grid has "rho" points, "u" points, "v" points, and "psi" points, and these variables are placed in the grid. Certainly, my roms grid seems to lack psi points. However, it seems that "mygrid" in the seapy README does not have those variables either.

I compared my ROMS grid with the variables stored in the "mygrid" in the README, and nothing seems to be missing. Therefore, I don't understand what the above error (AttributeError: grid does not have attribute lat_rho or lon_rho) means.

I asked a few questions on the ROMS forum and tried to resolve the problem, but that didn't work either and I'm really stuck. I'm sorry to ask again, but please help me. Thank you.

powellb commented 2 years ago

As I understand it, you are running:

seapy.roms.boundary.from_roms("my_clim.nc", "my_bry.nc")

And it is saying that it is missing lat_rho and lon_rho. But, you didn't tell it the grid. The "my_clim.nc" file is just a climatology file. It may be lacking the grid parameters. You have to do:

seapy.roms.boundary.from_roms("my_clim.nc", "my_bry.nc", grid="gridfile.nc")

You are going to have to look at the documentation of each of the calls to understand what they require in order to run. The issue has been closed.
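One way to see in advance whether a file can serve as its own grid is to check its variable names for the ones named in the error message. This is a minimal sketch with a hypothetical helper and a hypothetical variable list; it is not how seapy performs the check. With the netCDF4 package, the names could be read with netCDF4.Dataset(path).variables.keys().

```python
# Variable names taken from the error message in this thread.
REQUIRED_GRID_VARS = ("lat_rho", "lon_rho")


def missing_grid_vars(variable_names, required=REQUIRED_GRID_VARS):
    """Return the required grid variables absent from variable_names."""
    present = set(variable_names)
    return [name for name in required if name not in present]


# Hypothetical contents of a climatology file without grid coordinates.
clim_vars = ["clim_time", "zeta", "temp", "salt", "u", "v"]
print(missing_grid_vars(clim_vars))
# → ['lat_rho', 'lon_rho']
```

If the list is not empty, the file lacks the grid coordinates and the grid file has to be passed explicitly, as in the call above.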