simpeg / aurora

software for processing natural source electromagnetic data
MIT License

Handling of multiple acquisition runs as a single processing run #80

Closed: kkappler closed this 2 years ago

kkappler commented 3 years ago

Use Cases:

  1. Several long-period runs, possibly broken up by a power outage, for example
  2. Regular high-frequency, short-duration acquisitions, e.g. ZEN

Case 2 can be handled by breaking process_mth5_decimation_level into:

stft_agg = []
for run in runlist:
    stft_obj = make_stft_decimation_level(run)
    stft_agg.append(stft_obj)
tf_obj = process_stft_decimation_level(stft_agg)
kkappler commented 2 years ago

A ProcessingRunTS class could be used to manage, or at least reference, the data. This would act to merge runs / process mixed runs.

Run merging requirements:
- must handle an arbitrary number of runs
- must handle decimation

NaN filling is a general solution, with two potential complications (see the sketch after this list):
A. big gaps could overload RAM
B. filtering edge effects
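As a rough illustration of how complication A might be avoided, here is a minimal pandas sketch that interpolates only across short gaps; `fill_short_gaps` and `max_gap` are hypothetical names, not aurora API:

```python
import pandas as pd

def fill_short_gaps(series: pd.Series, max_gap: int) -> pd.Series:
    """Interpolate across NaN gaps of at most max_gap samples;
    leave longer gaps (e.g. power outages) untouched."""
    isnan = series.isna()
    # Label each contiguous run of NaN / non-NaN values...
    group = (isnan != isnan.shift()).cumsum()
    # ...and measure the length of the NaN run each sample sits in.
    gap_len = isnan.groupby(group).transform("sum")
    fillable = isnan & (gap_len <= max_gap)
    filled = series.interpolate(method="linear")
    # Keep original values everywhere except inside short, fillable gaps.
    return series.where(~fillable, filled)
```

Filtering edge effects (complication B) would still need handling at the boundaries of any interval left unfilled.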

kkappler commented 2 years ago

Ideally we want a model that is completely general (for disjoint time series). The TSCollection is associated with a list of time intervals ℐ₀ = {(a, b)_i} such that all data to be processed lie in the union of the (a, b)_i. The individual elements of ℐ₀ are normally acquisition runs, or intervals properly contained in acquisition runs, but we want to be careful not to exclude the case of joining acquisition runs via some gap-fill technique. For example, a few long acquisition runs with only a short gap in between may need to be processed at very long periods (longer than either acquisition run can yield alone).

A companion set of intervals, where synthetic data (interp, iawrw, etc.) can be overlain, accompanies the original set. The companion set specifies intervals that should be infilled so that runs can be treated as continuous. This is particularly useful in the case of a few missing samples, but could have wider application. A sketch of this interval bookkeeping follows below.
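A minimal sketch of the bookkeeping, assuming runs and infill intervals are plain (start, end) pairs; `coverage`, `I0`, and `infill` are hypothetical names for illustration, not existing aurora API:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. epoch seconds

def coverage(I0: List[Interval], infill: List[Interval]) -> List[Interval]:
    """Merge acquisition-run intervals with companion infill intervals
    into maximal continuous intervals available for processing."""
    merged = []
    for start, end in sorted(I0 + infill):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, with runs [(0, 100), (105, 200)] and companion interval [(100, 105)], coverage returns [(0, 200)]: the two runs can be treated as one continuous stretch.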

kkappler commented 2 years ago

The place in the code where this will be implemented is the function process_mth5_run in aurora/pipelines/process_mth5.py.

The current function structure is:

def process_mth5_run(
    run_cfg,
    run_id,
    units="MT",
    show_plot=False,
    z_file_path=None,
    return_collection=True,
    **kwargs,
):

To support multiple runs we could allow run_id (currently a string) to optionally be a list of strings, each specifying a run. Implementing this change does not look too complicated, and the function structure would stay very similar. Instead of extracting a single run, computing its STFT, and processing it, we would extract each run in the list and compute the STFT of each individually. The STFTs would then be merged into one xarray of spectral measurements, and that array would be passed to the TF estimation method, as in the sketch below.
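A minimal sketch of that change; `extract_run`, `compute_stft`, and `estimate_tf` are hypothetical placeholders for the existing per-run steps, while `xr.concat` is real xarray API:

```python
from typing import List, Union

import xarray as xr

def process_mth5_run(
    run_cfg,
    run_id: Union[str, List[str]],
    units="MT",
    **kwargs,
):
    # Normalize so a single run id behaves like a one-element list.
    run_ids = [run_id] if isinstance(run_id, str) else list(run_id)

    stfts = []
    for rid in run_ids:
        run_ts = extract_run(run_cfg, rid)  # hypothetical per-run extraction
        stfts.append(compute_stft(run_ts))  # hypothetical per-run STFT
    # Merge the per-run spectrograms along time into a single xarray of
    # spectral measurements, then hand it to TF estimation.
    stft_merged = xr.concat(stfts, dim="time")
    return estimate_tf(stft_merged, units=units, **kwargs)
```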

This solution should work in general for single-station processing.

For multi-station processing there is one more layer to consider: run labels will not in general be the same for different stations. We would need an iterable of runs for the station of interest, and also an iterable of runs for the remote reference station. Determining which runs will be processed together is currently not supported; a sketch of one way to pair them by time overlap follows.
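For illustration, a minimal sketch of pairing runs by time overlap, assuming each run is summarized by a (start, end) pair; nothing here is existing aurora API:

```python
def simultaneous_intervals(local_runs, remote_runs):
    """Return intervals where a local run overlaps a remote reference
    run; only these stretches can be processed jointly.

    local_runs, remote_runs: lists of (start, end) pairs.
    """
    overlaps = []
    for a_start, a_end in local_runs:
        for b_start, b_end in remote_runs:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # non-empty simultaneous interval
                overlaps.append((start, end))
    return overlaps
```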

When there are many stations (MMT) we would need to handle many subcases; we might need another version of process_mth5_run for MMT.

kkappler commented 2 years ago

This is done in the frequency domain. Issue #152 is still open about doing this in the time domain.