kujaku11 / mth5

Exchangeable and archivable format for magnetotelluric time series to better serve the community through FAIR principles.
https://mth5.readthedocs.io/en/latest/index.html
MIT License

Extraction of a subset to another mth5 archive (export_subset) #219

Open kkappler opened 3 months ago

kkappler commented 3 months ago

Context: my h5 files can get large, and I would like to export a subset of the data in an MTH5 to a smaller file that I can share with another user or transfer over the network to another machine. For example, I have a 20 GB file, but only want to share a few short runs that total less than 1 GB.

It would be nice if I could do something like:

from mth5.mth5 import MTH5

m = MTH5()
m.open_mth5(file_name)
m.export_subset(other_file_name, dataset_df)  # proposed method, does not exist yet
m.close_mth5()

I would then have a separate mth5 file at other_file_name.

Ideally, all the metadata would transfer cleanly to the new file, and the input dataframe would look like an mth5 channel summary. In that case, I could just grab m.channel_summary.to_dataframe(), apply some queries to the df to reduce it to dataset_df, and call a one-liner.
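
To illustrate the dataframe-reduction step, here is a minimal sketch using plain pandas. The small dataframe is a hypothetical stand-in for the output of m.channel_summary.to_dataframe(); the column names used (station, run, component, sample_rate, n_samples) are assumed to match the channel summary, and select_subset is an illustrative helper, not part of mth5.

```python
import pandas as pd

# Hypothetical stand-in for m.channel_summary.to_dataframe().
# Column names are assumptions about the channel summary layout.
summary_df = pd.DataFrame(
    {
        "station": ["mt01", "mt01", "mt01", "mt02"],
        "run": ["001", "001", "002", "001"],
        "component": ["ex", "hx", "ex", "ex"],
        "sample_rate": [1.0, 1.0, 4096.0, 1.0],
        "n_samples": [86400, 86400, 40960000, 86400],
    }
)


def select_subset(df, station, max_samples):
    """Keep rows for one station whose channels have at most max_samples samples."""
    mask = (df["station"] == station) & (df["n_samples"] <= max_samples)
    return df[mask].reset_index(drop=True)


# Reduce the full summary to the short runs of a single station.
dataset_df = select_subset(summary_df, "mt01", 100_000)
print(dataset_df[["station", "run", "component"]])
```

The resulting dataset_df keeps the same columns as the full summary, so it could be passed unchanged to the requested export_subset call.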