malcolmw / pykonal

Travel-time calculator based on the fast-marching method solution to the Eikonal equation.
https://malcolmw.github.io/pykonal-docs/
GNU General Public License v3.0
154 stars 55 forks source link

NaN gradients in traveltime.trace_ray #31

Closed filefolder closed 1 year ago

filefolder commented 1 year ago

Hi Malcolm,

I am struggling with an onset case of "Encountered NaN gradient" when using pyvorotomo, but I think this is happening in pykonal somewhere. Have you experienced this or know what the issue may be? I was initially thinking that some of my event-station distances were too close, and indeed sometimes very close arrivals this generated None raypaths, but removing these hasn't solved the issue. I don't have issues with other regions/velocity models so I'm guessing that's the culprit, but I don't see anything obviously wrong with it or understand how that would matter anyway.

Here is the error... seems to track back to line 639 of fields.pyx ( gg = np.moveaxis(np.stack([g0, g1, g2]), 0, -1) ) but again, I am at a loss to understand why. I am using a relatively new numpy 1.24.1 ...?

2023044T21:32:28::INFO::0000:: Updating P-wave model
2023044T21:32:28::INFO::0000:: Realization #1 (/500) with hvr = 8.0.
2023044T21:32:28::INFO::0000:: Tracing rays.
2023044T21:33:57::ERROR::0004:: _trace_rays() raised <class 'ValueError'>: Encountered NaN gradient.
2023044T21:33:57::ERROR::0004:: iterate() raised <class 'ValueError'>: Encountered NaN gradient.
2023044T21:33:57::ERROR::0004:: main() raised <class 'ValueError'>: Encountered NaN gradient.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
2023044T21:33:57::ERROR::0001:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0001:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0001:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0002:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0002:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0002:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0003:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0003:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0003:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0005:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
--- Logging error ---
Traceback (most recent call last):
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 87, in _decorated_func
    return (func(*args, **kwargs))
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_iterator.py", line 1206, in iterate
    self._trace_rays(phase)
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 92, in _decorated_func
    raise (exc)
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 87, in _decorated_func
    return (func(*args, **kwargs))
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_iterator.py", line 790, in _trace_rays
    raypath = traveltime.trace_ray(coords) # <<<<<< something is happening HERE
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pykonal/fields.pyx", line 398, in pykonal.fields.ScalarField3D.trace_ray
  File "pykonal/fields.pyx", line 431, in pykonal.fields.ScalarField3D.trace_ray
  File "pykonal/fields.pyx", line 349, in pykonal.fields.ScalarField3D.gradient.__get__
  File "pykonal/fields.pyx", line 639, in pykonal.fields.ScalarField3D._gradient_of_spherical
  File "<__array_function__ internals>", line 200, in stack
  File "/home/seisop/.local/lib/python3.11/site-packages/numpy/core/shape_base.py", line 471, in stack
    return _nx.concatenate(expanded_arrays, axis=axis, out=out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in concatenate
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 398, in signal_handler
    raise (SystemError("Interrupting signal received... aborting"))
SystemError: Interrupting signal received... aborting

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/logging/__init__.py", line 690, in format
    s = self.formatMessage(record)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/logging/__init__.py", line 658, in formatMessage
    def formatMessage(self, record):

  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 398, in signal_handler
    raise (SystemError("Interrupting signal received... aborting"))
SystemError: Interrupting signal received... aborting
Call stack:
  File "/home/seisop/.local/bin/pyvorotomo", line 109, in <module>
    main(argc)
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 87, in _decorated_func
    return (func(*args, **kwargs))
  File "/home/seisop/.local/bin/pyvorotomo", line 85, in main
    inversion_iterator.iterate()
  File "/home/seisop/.local/lib/python3.11/site-packages/pyvorotomo/_utilities.py", line 89, in _decorated_func
    logger.error(
Message: "iterate() raised <class 'SystemError'>: Interrupting signal received... aborting"
Arguments: ()
2023044T21:33:57::ERROR::0005:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0005:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0006:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0006:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0006:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0007:: _trace_rays() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0007:: iterate() raised <class 'SystemError'>: Interrupting signal received... aborting
2023044T21:33:57::ERROR::0007:: main() raised <class 'SystemError'>: Interrupting signal received... aborting
[sekiya:4070252] 6 more processes have sent help message help-mpi-api.txt / mpi-abort
[sekiya:4070252] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Any insights appreciated... even if there's a way to failsafe this functions against NaNs or divide by zeros that I am missing.

malcolmw commented 1 year ago

Hi Robert,

A couple of questions:

(1) Which version of PyVoroTomo are you using? Hongjian and I have slightly different versions, and his should probably be the authoritative version. (2) What is your starting model? And can you reproduce the error in a minimal example?

I am suspecting that this is an edge-case phenomenon, in which a ray is propagating along the boundary of the model where there isn't enough information to define a proper 3-D gradient vector. This can happen, for example, when you have a ray propagating between a surface source and surface receiver in a homogeneous velocity model.

filefolder commented 1 year ago

I'm using the PyVoro currently on github, master branch, I think the last commit was March 2021. If there is a newer version I would be keen to try it as I use it quite often.

Just got this error this morning on a different model, it's a new one to me, and because I have recently updated PyKonal it may be the result of some new behaviour or change in traveltime_inventory? (edit: yeah looks like that was this https://github.com/malcolmw/pykonal/commit/946cd139eaf1d954018f44b48715b193b4048b6c)

2023045T01:04:27::INFO::0000:: Relocating events.
2023045T01:04:27::ERROR::0001:: _relocate_events_de() raised <class 'TypeError'>: Argument 'traveltime_inventory' has incorrect type (expected str, got dict)
2023045T01:04:27::ERROR::0001:: relocate_events() raised <class 'TypeError'>: Argument 'traveltime_inventory' has incorrect type (expected str, got dict)
2023045T01:04:27::ERROR::0001:: iterate() raised <class 'TypeError'>: Argument 'traveltime_inventory' has incorrect type (expected str, got dict)
2023045T01:04:27::ERROR::0001:: main() raised <class 'TypeError'>: Argument 'traveltime_inventory' has incorrect type (expected str, got dict)

I will keep trying to isolate the issue, maybe widen the model or remove shallow events. I might also downgrade pykonal on the chance it's a version issue.. which hadn't occurred to me until just now.

malcolmw commented 1 year ago

@filefolder, the new error is a quick fix. I did update the TraveltimeInventory class initializer: It no longer needs the station_dict argument. You just need to provide the path argument.

malcolmw commented 1 year ago

Regarding PyVoroTomo, I believe you are using the most recent version, unless Hongjian has made any changes that aren't reflected in his public repository.

You might try wrapping Line 789 in a try except block to print the offending event, station pair information so you can recreate a minimal example. Then I should be able to help you debug.

filefolder commented 1 year ago

Thanks for the replies!

I seem to have managed to get this working on an older machine, so the model and data are likely OK. I think what happened was that the velocity models were built on slightly different versions of PyKonal as well as on different versions of Python (e.g. 3.8 vs 3.11), and that was causing inconsistencies... somewhere? I have no good idea why, but it (hopefully) explains a lot of other strange glitches and random crashes as well that happened too infrequently to pin down. Since restricting everything to the same versions and py3.8 things seem to be going smoothly but will keep an eye on it.

FWIW to anyone else, I've made a branch of PVT here https://github.com/filefolder/PyVoroTomo that I think (haven't confirmed) fixes most of the outstanding issues and should run with 0.2.3.b4. I'll send a PR through eventually.

malcolmw commented 1 year ago

OK, thanks for the feedback. I will keep an eye out for issues with Python 3.11. Looking forward to your PR.