Running NLLoc with input time grids located in several directories.

codeelw commented 5 months ago

Hi Anthony,

I am using a 1 km spaced grid which covers all of New Zealand. Due to the large size of this grid (22 TB), I have split the time grids across three directories and created symbolic links to those directories inside the directory where I run NonLinLoc. What is the best way to locate events with the grids split across several directories? Is there a way to scan across the directories recursively?

Currently my directories are set up as follows:

TIME1 -> /Volumes/GeoPhysics_42/users-data/williaco/1_km_grids_ABAZ_to_095A/TIME

TIME2 -> /Volumes/GeoPhysics_43/users-data/williaco/1_km_grids_095B_to_LRAN/TIME

TIME3 -> /Volumes/GeoPhysics_36/users-data/williaco/1km_grids/TIME

with all of these symbolic links being stored in /Volumes/GeoPhysics_35/users-data/williaco/NonLinLoc_1km_grid_run_directory/TIME.

I currently have the LOCFILES line of my control file set up as follows:

LOCFILES IN/1920935.nll NLLOC_OBS TIME/TIME1/NZ_3D OUT/located

However, this only searches one directory. Is there a way to adjust this line in the control file so that TIME1, TIME2 and TIME3 are all searched at the same time? I have tried a number of wildcard combinations in place of the TIME1 part, with no success. Alternatively, is there a way to run this part of the control file recursively?

Many thanks,

Codee

alomax commented 5 months ago

Hi Codee,

NLL grid I/O is coded to see one buffer file and one corresponding header file. I do not see any way to change this to support scanning a single grid in different file pairs without some very complex, low-level (and difficult to maintain?) C code changes and additions to NLL.

So I think the solution has to be implemented at the OS or higher level, so that NLL "sees" a single file, even if it is distributed on different physical or logical devices.

It looks like one quite simple solution would be an array of RAID 0 disks. You may be able to set up one logical disk from the point of view of the OS which is stored on several physical disks under RAID 0. See: https://www.intel.com/content/www/us/en/support/articles/000005867/technologies.html#raid0 This will also increase speed over using one disk.

I hope this helps, otherwise we can think about other options...

Best regards,

Anthony

trap000d commented 5 months ago

Thanks Anthony, that makes sense. Unfortunately (Codee did not mention it), all of these volumes are network shares, so I guess RAID won't work. I've just tried a combination of 'mount --bind' + 'mount -t overlay', and that seems to work for NFS. Hopefully 3 extra layers of mount abstraction will not cause a massive performance degradation.
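If the grids are split by station into separate files, as the directory names (ABAZ_to_095A, 095B_to_LRAN) suggest, a lighter alternative might be a single directory holding one symbolic link per grid file, so that one LOCFILES root covers all of them. Below is a minimal Python sketch under that assumption. The function name `merge_grid_dirs` and the merged directory are hypothetical, and this is untested with NLLoc itself.

```python
from pathlib import Path


def merge_grid_dirs(source_dirs, merged_dir):
    """Create one symlink in merged_dir for every file found in source_dirs.

    Returns the number of new links created. Assumes each grid file
    (buffer or header) lives in exactly one of the source directories.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    created = 0
    for src in source_dirs:
        for path in sorted(Path(src).iterdir()):
            if not path.is_file():
                continue  # skip sub-directories and broken links
            target = path.resolve()
            link = merged / path.name
            if link.is_symlink() and link.resolve() == target:
                continue  # link already in place from an earlier run
            if link.is_symlink() or link.exists():
                # Same file name in two places: refuse to pick one silently.
                raise FileExistsError(f"{link} already exists")
            link.symlink_to(target)
            created += 1
    return created


if __name__ == "__main__":
    # Hypothetical layout, run from the NonLinLoc run directory.
    n = merge_grid_dirs(["TIME/TIME1", "TIME/TIME2", "TIME/TIME3"], "TIME/MERGED")
    print(f"created {n} links")
```

With the three TIME directories merged this way, the LOCFILES grid root would become something like TIME/MERGED/NZ_3D (again, a hypothetical path). Each link resolves to a whole buffer or header file, so NLL would still see one file pair per grid.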

alomax commented 5 months ago

OK, this is interesting, please tell me if it works in practice! I seem to find on an iMac that NLLoc is fastest when the travel-time grids are all in memory, even with a local, SSD disk. But maybe in a more sophisticated network/server environment disk access (and caching) is very fast.

For all of New Zealand, a 1 km grid sounds very fine: the velocity model must be much smoother in many areas, especially at depth. But I suppose there is 1 km (or less) detail at shallow depths. One solution to this situation is a new feature called Cascading Grids in NLL, where the travel-times are calculated in full resolution for precision, but the resulting travel-time grids are stored with increasing cell size with depth. See this paper, where this procedure is introduced and described for all of Italy (~1200x1200x800km):

(Latorre, D., Di Stefano, R., Castello, B., Michele, M., & Chiaraluce, L. (2023). An updated view of the Italian seismicity from probabilistic location in 3D velocity models: The 1981–2018 Italian catalog of absolute earthquake locations (CLASS). Tectonophysics, 846, 229664. https://doi.org/10.1016/j.tecto.2022.229664)
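To give a rough feel for the storage saving, here is a small Python sketch that counts grid nodes for a regular grid and for a grid whose cell size grows with depth. It is an estimate under stated assumptions, not the NLL implementation: one 4-byte value per node, cell size changing in all three dimensions at each layer boundary, and layer thicknesses and cell sizes that are purely hypothetical (they are not taken from the paper). The function names are illustrative too.

```python
from math import ceil

BYTES_PER_NODE = 4  # assumption: one 4-byte float per grid node


def regular_nodes(x_km, y_km, z_km, cell_km=1.0):
    """Number of nodes in a regular grid with cubic cells of size cell_km."""
    return ceil(x_km / cell_km) * ceil(y_km / cell_km) * ceil(z_km / cell_km)


def cascading_nodes(x_km, y_km, layers):
    """Number of nodes when each depth layer has its own cell size.

    layers is a list of (thickness_km, cell_km) pairs, from shallow to deep.
    """
    return sum(
        regular_nodes(x_km, y_km, thickness_km, cell_km)
        for thickness_km, cell_km in layers
    )


if __name__ == "__main__":
    # Region size from the Italy example above; the layers are hypothetical.
    x_km, y_km, z_km = 1200, 1200, 800
    layers = [(50, 1), (150, 2), (600, 4)]
    full = regular_nodes(x_km, y_km, z_km)
    casc = cascading_nodes(x_km, y_km, layers)
    print(f"regular:   {full * BYTES_PER_NODE / 1e9:.2f} GB per grid")
    print(f"cascading: {casc * BYTES_PER_NODE / 1e9:.2f} GB per grid")
    print(f"ratio:     {full / casc:.1f}x")
```

The total on disk scales with the number of station and phase grids, so any per-grid reduction carries over to the full set.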

ut-beg-texnet / NonLinLoc
