simpeg / aurora

software for processing natural source electromagnetic data
MIT License

Handling of multiple acquisition runs as a single processing run #80

Closed: kkappler closed this 2 years ago

kkappler commented 3 years ago

Use Cases:

  1. Several long-period runs, possibly broken up by a power outage, for example
  2. Regular high-frequency, short-duration acquisitions, e.g. ZEN

Case 2 can be handled by breaking process_mth5_decimation_level into:

stft_agg = []
for run in runlist:
    stft_obj = make_stft_decimation_level(run)
    stft_agg.append(stft_obj)
tf_obj = process_stft_decimation_level(stft_agg)
kkappler commented 2 years ago

A ProcessingRunTS class could be used to manage, or at least reference, the data. This would act to merge runs / process mixed runs.

Run merging requirements:
- must handle an arbitrary number of runs
- must handle decimation

NaN filling is a general solution, with two potential complications (see the sketch after this list):
A. big gaps could overload RAM
B. filtering edge effects
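As a rough illustration of how complication A might be avoided, here is a minimal pandas sketch that interpolates only across short gaps; `fill_short_gaps` and `max_gap` are hypothetical names, not aurora API:

```python
import pandas as pd

def fill_short_gaps(series: pd.Series, max_gap: int) -> pd.Series:
    """Interpolate across NaN gaps of at most max_gap samples;
    leave longer gaps (e.g. power outages) untouched."""
    isnan = series.isna()
    # Label each contiguous run of NaN / non-NaN values...
    group = (isnan != isnan.shift()).cumsum()
    # ...and measure the length of the NaN run each sample sits in.
    gap_len = isnan.groupby(group).transform("sum")
    fillable = isnan & (gap_len <= max_gap)
    filled = series.interpolate(method="linear")
    # Keep original values everywhere except inside short, fillable gaps.
    return series.where(~fillable, filled)
```

Filtering edge effects (complication B) would still need handling at the boundaries of any interval left unfilled.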

kkappler commented 2 years ago

Ideally we want a model that is completely general (for disjoint time series). The TSCollection is associated with a list of time intervals ℐ₀ = {(a, b)_i} such that all data to be processed lie in the union of the (a, b)_i. The individual elements of ℐ₀ are normally acquisition runs, or intervals properly contained in acquisition runs, but we want to be careful not to exclude the case of joining acquisition runs via some gap-fill technique. For example, a few long acquisition runs with only a short gap in between may need to be processed at very long periods (longer than either acquisition run can yield alone).

A companion set of intervals, where synthetic data (interp, iawrw, etc.) can be overlain, accompanies the original set. The companion set specifies intervals that should be infilled so that runs can be treated as continuous. This is particularly useful in the case of a few missing samples, but could have wider application. A sketch of this interval bookkeeping follows below.
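A minimal sketch of the bookkeeping, assuming runs and infill intervals are plain (start, end) pairs; `coverage`, `I0`, and `infill` are hypothetical names for illustration, not existing aurora API:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. epoch seconds

def coverage(I0: List[Interval], infill: List[Interval]) -> List[Interval]:
    """Merge acquisition-run intervals with companion infill intervals
    into maximal continuous intervals available for processing."""
    merged = []
    for start, end in sorted(I0 + infill):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, with runs [(0, 100), (105, 200)] and companion interval [(100, 105)], coverage returns [(0, 200)]: the two runs can be treated as one continuous stretch.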

kkappler commented 2 years ago

The place in the code where this will be implemented is the function process_mth5_run in aurora/pipelines/process_mth5.py.

The current function structure is:

def process_mth5_run(
    run_cfg,
    run_id,
    units="MT",
    show_plot=False,
    z_file_path=None,
    return_collection=True,
    **kwargs,
):

To support multiple runs we could allow run_id (currently a string) to optionally be a list of strings, each specifying a run. Implementing this change does not look too complicated, and the function structure would stay very similar. Instead of extracting a single run, computing its STFT, and processing it, we would extract each run in the list and compute the STFT of each individually. The STFTs would then be merged into one xarray of spectral measurements, and that array would be passed to the TF estimation method, as in the sketch below.
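A minimal sketch of that change; `extract_run`, `compute_stft`, and `estimate_tf` are hypothetical placeholders for the existing per-run steps, while `xr.concat` is real xarray API:

```python
from typing import List, Union

import xarray as xr

def process_mth5_run(
    run_cfg,
    run_id: Union[str, List[str]],
    units="MT",
    **kwargs,
):
    # Normalize so a single run id behaves like a one-element list.
    run_ids = [run_id] if isinstance(run_id, str) else list(run_id)

    stfts = []
    for rid in run_ids:
        run_ts = extract_run(run_cfg, rid)  # hypothetical per-run extraction
        stfts.append(compute_stft(run_ts))  # hypothetical per-run STFT
    # Merge the per-run spectrograms along time into a single xarray of
    # spectral measurements, then hand it to TF estimation.
    stft_merged = xr.concat(stfts, dim="time")
    return estimate_tf(stft_merged, units=units, **kwargs)
```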

This solution should work in general for single-station processing.

For multi-station processing there is one more layer to consider: run labels will not in general be the same for different stations. We would need an iterable of runs for the station of interest, and also an iterable of runs for the remote reference station. Determining which runs will be processed together is currently not supported; a sketch of one way to pair them by time overlap follows.
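For illustration, a minimal sketch of pairing runs by time overlap, assuming each run is summarized by a (start, end) pair; nothing here is existing aurora API:

```python
def simultaneous_intervals(local_runs, remote_runs):
    """Return intervals where a local run overlaps a remote reference
    run; only these stretches can be processed jointly.

    local_runs, remote_runs: lists of (start, end) pairs.
    """
    overlaps = []
    for a_start, a_end in local_runs:
        for b_start, b_end in remote_runs:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # non-empty simultaneous interval
                overlaps.append((start, end))
    return overlaps
```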

When there are many stations (MMT) we would need to handle many subcases; we might need another version of process_mth5_run for MMT.

kkappler commented 2 years ago

This is done in the frequency domain. Issue #152 is still open about doing this in the time domain.