seafloor-geodesy / gnatss

Community Seafloor Global Navigation Satellite Systems - Acoustic (GNSS-A) Transponder Surveying Software
https://gnatss.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
9 stars 13 forks

i/o not recognizing travel time files #214

Closed johnbdesanto closed 5 months ago

johnbdesanto commented 5 months ago

I've just tried processing some data we are currently collecting in New Zealand and got an error: gnatss failed to load any travel time files. I have double-checked that the file path in the config.yaml file is correct.

We recently updated the OS on this machine, so I suspect that some Python package or another may have become obsolete. Where would be a good place to start troubleshooting?

For reference, here is the output from gnatss as well as an ls command showing that the pxp_tt file exists:

(gipsyx) [jdesanto@geod-proc01-prd-a056 GNATSS_center]$ gnatss run --extract-dist-center --extract-process-dataset --qc --distance-limit 150 --residual-limit 1000
Loading configuration ...
Configuration loaded.
Gathering sound_speed at /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/ctd/WAI1_2019_avg_sv_Ch_Mi_fit_sparse
Gathering travel_times at /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/**/WG_*/pxp_tt
Gathering gps_solution at /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/**/WG_*/POS_FREED_TRANS_TWTT
Load sound speed profile data...
Computing harmonic mean...
lat=-41.511926289 lon=176.456963719 height=-1937.0423 internal_delay=0.2 sv_mean=1491.367 pxp_id='WAI1-1' azimuth=-86.91 elevation=40.4
lat=-41.487291847 lon=176.47103478 height=-1957.3254 internal_delay=0.32 sv_mean=1491.368 pxp_id='WAI1-2' azimuth=40.75 elevation=40.43
lat=-41.48869747 lon=176.439294845 height=-1966.1829 internal_delay=0.44 sv_mean=1491.369 pxp_id='WAI1-3' azimuth=146.06 elevation=40.36
Finished computing harmonic mean
Load deletions data...
Load quality controls data...
Load travel times...
Traceback (most recent call last):

  File "/home/jdesanto/miniconda3/envs/gipsyx/bin/gnatss", line 8, in <module>
    sys.exit(app())

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/gnatss/cli.py", line 83, in run
    _, _, resdf, dist_center_df, process_ds, outliers_df = main(

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/gnatss/main.py", line 823, in main
    all_observations = load_data(all_files_dict, config)

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/gnatss/main.py", line 533, in load_data
    all_travel_times = load_travel_times(

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/gnatss/loaders.py", line 138, in load_travel_times
    all_travel_times = pd.concat(travel_times).reset_index(drop=True)

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(

  File "/home/jdesanto/miniconda3/envs/gipsyx/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
    raise ValueError("No objects to concatenate")

ValueError: No objects to concatenate

(gipsyx) [jdesanto@geod-proc01-prd-a056 GNATSS_center]$ ls /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/**/WG_*/pxp_tt
/wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/021_copy/WG_20240121_parsed/pxp_tt
/wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/021/WG_20240121_parsed/pxp_tt
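For context, the ValueError at the bottom of the traceback is pandas' generic failure when asked to concatenate an empty list, which is what happens when the travel-time glob matches no files. A minimal sketch with a hypothetical path (not the user's actual data):

```python
# If a glob pattern matches no files, the list handed to pd.concat is
# empty and pandas raises ValueError("No objects to concatenate") --
# the same error seen in the traceback above. The path is hypothetical.
from glob import glob

import pandas as pd

files = glob("/no/such/dir/**/pxp_tt", recursive=True)  # matches nothing
frames = [pd.read_csv(f, sep=r"\s+", header=None) for f in files]

try:
    pd.concat(frames).reset_index(drop=True)
except ValueError as err:
    print(err)  # -> No objects to concatenate
```

So the question is not whether the files are readable, but why the loader's file discovery produced an empty list despite the ls output showing matches.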
johnbdesanto commented 5 months ago

Just tried again on an install of gnatss on a fresh conda environment and got the same result.

johnbdesanto commented 5 months ago

I've run some more sanity checks, so here is some more context.

Critically, the test example data set processes without issue with GNATSS, which is reassuring since I haven't changed anything in that data set. However, the pxp_tt files appear to have the same format in both cases, aside from the dates.

This is read:

(gipsyx) [jdesanto@geod-proc01-prd-a056 GNATSS_center]$ head /wd4/GPSA_PROCESSING/Cascadia/JdF_22/NDP1/**/W*/pxp_tt
==> /wd4/GPSA_PROCESSING/Cascadia/JdF_22/NDP1/211/WG_20220730/pxp_tt <==
30-JUL-22 20:09:28.00   3676654   3569176   3642335         0
30-JUL-22 20:09:43.00   3672727   3574996   3640966         0
30-JUL-22 20:09:58.00   3669269   3578469   3640502         0
30-JUL-22 20:10:13.00   3662624   3583111   3640559         0
30-JUL-22 20:10:28.00   3655393   3589334   3641537         0
30-JUL-22 20:10:43.00   3649251   3594605   3641950         0
30-JUL-22 20:10:58.00   3639049   3601179   3644003         0
30-JUL-22 20:11:13.00   3632993   3605889   3646246         0
30-JUL-22 20:11:28.00   3626237   3609254   3648440         0
30-JUL-22 20:11:43.00   3618211   3613378   3651220         0

But this is not read:


(gipsyx) [jdesanto@geod-proc01-prd-a056 GNATSS_center]$ head /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/**/WG_*/pxp_tt
==> /wd4/GPSA_PROCESSING/New_Zealand/2024_parsed/WAI1/021_copy/WG_20240121_parsed/pxp_tt <==
21-JAN-24 00:03:49.00   3616996   3759950   3890078         0
21-JAN-24 00:05:49.00   3566577   3803138   3909634         0
21-JAN-24 00:07:49.00   3610398   3746232   3911031         0
21-JAN-24 00:09:49.00   3587139   3804094   3883205         0
21-JAN-24 00:11:49.00   3584534   3762956   3926876         0
21-JAN-24 00:13:49.00   3619104   3769596   3878190         0
21-JAN-24 00:15:49.00   3542612   3806085   3936849         0
21-JAN-24 00:18:04.00   3605309   3748544   3915035         0
21-JAN-24 00:20:04.00   3608031   3780062   3880607         0
21-JAN-24 00:22:04.00   3598170   3752593   3919704         0

One thing to note is the time sampling of the second (failing) data set: it is sampled every 2 minutes, since the data are being telemetered to shore in real time. As a result, there are only ~690 epochs instead of the ~17000 epochs in the test example. Could this make a difference?

A further test I did was to create a copy of the data so that input files would be pulled from more than one directory, since that was a previous bug we had found.

I also put the POS_FREED_TRANS_TWTT files in a separate folder, which did nothing.

lsetiawan commented 5 months ago

Hi @johnbdesanto, thanks for opening this issue. I think I've found what's going on here, and I will need your thoughts on how to move forward.

From what I see, these pxp_tt files are in a folder that contains parsed in its name. Currently such folders are ignored, since the sample data I had included both a parsed folder and a non-parsed folder for the pxp_tt files.

The specific code that ignores this directory is as follows:

https://github.com/seafloor-geodesy/gnatss/blob/60d6ff41130b4a7e1ba9273e4b703ed21dcf0926/src/gnatss/loaders.py#L125-L137
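To see the described behavior in isolation, here is a simplified, illustrative sketch of that kind of filter (not the actual GNATSS code; the function name and paths are hypothetical):

```python
# Illustrative only: drop any file whose parent directory name contains
# "parsed", mirroring the behavior described above. The real logic lives
# in gnatss/loaders.py (linked above); names and paths here are made up.
from pathlib import Path


def filter_parsed(paths: list[str]) -> list[str]:
    """Keep only files whose parent directory name lacks 'parsed'."""
    return [p for p in paths if "parsed" not in Path(p).parent.name]


candidates = [
    "/data/WAI1/021/WG_20240121_parsed/pxp_tt",  # dropped: 'parsed' in dir
    "/data/NDP1/211/WG_20220730/pxp_tt",         # kept
]
print(filter_parsed(candidates))  # -> ['/data/NDP1/211/WG_20220730/pxp_tt']
```

With a filter like this, every file in the New Zealand data set is discarded, because all of its pxp_tt files live under WG_*_parsed directories.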

johnbdesanto commented 5 months ago

Hi Don,

The "parsed" data are packets of data telemetered to shore in real time. Since this is done through an Iridium satellite link, we don't have the bandwidth to transfer the entire data set. Instead, we send over one ping every ~2 minutes, enough that we can process a preliminary solution for QC purposes. It is worth noting that, since we are missing the majority of the data set, the preprocessing is somewhat different than with the full data set, but that isn't something we need to worry about at this time.

The important thing is that we are able to run real-time QC to verify that we are collecting good data while running surveys. In the past, we kept the parsed and full-rate data in the same directory structure, as in the example you built the code from. We could get away with this because we hard-coded which input file to use in the final inversion.

I recall we had a conversation some months ago in which you asked if we could start running the parsed QC processing and the full-rate processing in different directory trees, so that you could implement the wildcards in the config.yaml file without the code conflating the two data sets. That is what I am now doing, and in future workshop instructions I plan on recommending that other users do the same.

All of this is a long-winded way of saying that I do not believe an additional flag to ignore data in "parsed" directories is necessary. Handling these data is important to our operations, even if only temporarily, and I believe the code has reached a point where it can do so, which is why I was trying it with the data we are currently collecting in New Zealand.

lsetiawan commented 5 months ago

Thanks for that extensive explanation. I've opened PR #215 to fix this bug. Once it is merged, you should be able to specify the path to the pxp_tt 'parsed' directory like /path/to/parsed/**/pxp_tt
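To illustrate how a recursive pattern like that resolves, here is a sketch that builds a temporary directory tree mirroring the user's listing (directory names are hypothetical) and globs it:

```python
# Build a throwaway tree shaped like the user's layout, then show that a
# recursive "**" pattern picks up pxp_tt files under 'parsed' directories.
import glob
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for day in ("021", "021_copy"):
    d = root / day / "WG_20240121_parsed"
    d.mkdir(parents=True)
    (d / "pxp_tt").touch()

matches = sorted(glob.glob(str(root / "**" / "pxp_tt"), recursive=True))
print(len(matches))  # -> 2
```

Note that `recursive=True` is required for `**` to match across nested directories; without it, `**` behaves like a single `*`.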

johnbdesanto commented 5 months ago

Just processed the previously failing data using the latest 0.1.1 release of GNATSS. It ran successfully, so we can consider the bug fixed and close this issue.