kujaku11 / mth5

Exchangeable and archivable format for magnetotelluric time series to better serve the community through FAIR principles.
https://mth5.readthedocs.io/en/latest/index.html
MIT License

make_mth5 from atomic list of streams #51

Open kkappler opened 3 years ago

kkappler commented 3 years ago

Here is some rough pseudocode for a Requested Function:

This is a conceptual sketch of a method that we would want to expand on in future.

There are some classes that could be used to do a lot of this.

Namely: aurora/aurora/sandbox/io_helpers/fdsn_dataset_config.py

def add_to_existing_mth5(row, mth5_obj):
    # row: one row of the request dataframe (a pandas Series)
    fdsn_dataset_config = FDSNDatasetConfig.from_df_row(row)
    inventory = fdsn_dataset_config.get_inventory()
    stream = fdsn_dataset_config.get_data_via_fdsn_client()

    #<VOODOO>
    add_metadata_to_mth5(mth5_obj, inventory)
    add_datastream_to_mth5(mth5_obj, stream)
    #</VOODOO>
    return

def make_mth5_from_dataframe(df, h5_path=None):
    """
    df: pandas DataFrame
    has the following columns:  ["NETWORK", "STATION", "CHANNEL", "START TIME", "END TIME"]

    h5_path: pathlib.Path or string or None
        This is the path to the mth5 file that will get built by the function

    Behaviour:
    The function can iterate over each row of the dataframe, and access the metadata
     and data associated with that row. The data and metadata will be added to the
     mth5 object.

    This means that metadata from a new stream can be added to the mth5 "experiment"

    Test that the mth5 can be saved
    That the mth5 can be opened and all the data can be read back (maybe plotted as a check that everything is fine)

    After a first cut works, an obvious thing to do is merge the stream queries

    Parameters
    ----------
    df
    h5_path

    Returns
    -------

    """
    mth5_obj = initialize_mth5(h5_path)  # returns an mth5_obj and handles the already-exists case
    for _, row in df.iterrows():  # iterrows() yields (index, row) pairs
        add_to_existing_mth5(row, mth5_obj)
    mth5_obj.close() #etc.
    return h5_path

kujaku11 commented 3 years ago

We will need to go through all the add_station, add_run, add_channel ... methods so that, when a new channel is being added, they check whether its metadata are the same as what is already in the file.

If the metadata are not the same, what should we do? Give the user the ability to update, overwrite, or make new.

What should be the default (overwrite or update)?
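To make the three options concrete, here is a minimal sketch that treats the choice as an explicit policy argument. It uses plain dictionaries rather than mth5 metadata objects, and `resolve_metadata`, its policy names, and the example keys are all hypothetical, not part of the mth5 API.

```python
def resolve_metadata(existing, incoming, policy="update"):
    """Hypothetical helper: decide which metadata to keep for an existing channel.

    existing, incoming: plain dicts standing in for metadata objects.
    Returns (metadata, make_new). make_new is True when the caller should
    create a separate entry instead of modifying the existing one.
    """
    if existing == incoming:
        return dict(existing), False      # nothing to reconcile
    if policy == "update":
        merged = dict(existing)
        merged.update(incoming)           # incoming wins on shared keys, other keys are kept
        return merged, False
    if policy == "overwrite":
        return dict(incoming), False      # existing metadata is discarded
    if policy == "new":
        return dict(incoming), True       # existing entry is left untouched
    raise ValueError(f"unknown policy: {policy!r}")


# Illustrative values only (assumed, not taken from a real dataset).
existing = {"component": "ex", "sample_rate": 1.0, "units": "mV/km"}
incoming = {"component": "ex", "sample_rate": 8.0}

updated, _ = resolve_metadata(existing, incoming, policy="update")
overwritten, _ = resolve_metadata(existing, incoming, policy="overwrite")
```

In this sketch "update" keeps the `units` key while changing `sample_rate`, whereas "overwrite" drops `units` entirely, which is the practical difference to weigh when picking a default.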

kujaku11 commented 3 years ago

@timronan @kkappler We should also move this to mth5/clients/make_mth5

kkappler commented 3 years ago

I put together a table listing data that I have worked with over the past couple of months, which may be helpful as a template.

test_mth5_examples.csv

Note that when reading the table with pd.read_csv(), supplying the argument

parse_dates=['start time (UTC)', 'end time (UTC)']

will make those columns be read in as datetime objects, i.e.

df = pd.read_csv(csv_filename, parse_dates=['start time (UTC)', 'end time (UTC)'])
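As a quick check of what parse_dates changes, the sketch below reads a small in-memory table twice, once with and once without the argument. Only the two time column names come from this thread; the other column names and all row values are made-up placeholders, not rows from test_mth5_examples.csv.

```python
import io

import pandas as pd

# Hypothetical two-row table; network/station/channel values are placeholders.
csv_text = (
    "network,station,channel,start time (UTC),end time (UTC)\n"
    "XX,STA01,MX1,2020-01-01T00:00:00,2020-01-02T00:00:00\n"
    "XX,STA01,MX2,2020-01-01T00:00:00,2020-01-02T00:00:00\n"
)
time_columns = ["start time (UTC)", "end time (UTC)"]

as_text = pd.read_csv(io.StringIO(csv_text))
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=time_columns)

# Without parse_dates the time columns are left as strings.
text_is_datetime = pd.api.types.is_datetime64_any_dtype(as_text["start time (UTC)"])
dates_are_datetime = all(
    pd.api.types.is_datetime64_any_dtype(as_dates[col]) for col in time_columns
)

# Datetime columns support arithmetic directly, e.g. the length of each request.
durations = as_dates["end time (UTC)"] - as_dates["start time (UTC)"]
```

Having real datetime columns means the start and end times can be compared or subtracted before any request is sent, for example to catch rows where the end time precedes the start time.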

Most of the datasets are NCEDC, but there is an IRIS example as well. It would be nice if we could curate these (and others) as test cases, so that when specific issues pop up we can isolate them by referring to a test_case table.