simpeg / aurora

software for processing natural source electromagnetic data

deprecate write_emtf_z_file from transfer_function_collection #139

Closed kkappler closed 1 year ago

kkappler commented 2 years ago

Use the method in mt_metadata.tf instead.

- The code in mt_metadata was taken from transfer_function_collection, so the outputs should agree.
- The code shouldn't be in two places.
- The ability to write emtf z-files has application even if the TF wasn't generated by the aurora pipeline.

Therefore the place to remove the code is in transfer_function_collection and not in mt_metadata.

There are only 3 references to this method:

    aurora/sandbox/io_helpers/garys_matlab_zfiles/matlab_z_file_reader.py:127:tfc.write_emtf_z_file(z_file_path, orientation_strs=orientation_strs)
    aurora/transfer_function/transfer_function_collection.py:244:    def write_emtf_z_file(self, z_file_path, run_obj=None, orientation_strs=None):
    aurora/pipelines/process_mth5.py:352:        tf_collection.write_emtf_z_file(z_file_path, run_obj=local_run_obj)
  1. In the one-off matlab z-file reader
  2. The function definition
  3. process_mth5 calls this method, and that is where we need to cast to a TF before the call to write (see the sketch below).
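
As a hedged sketch of that third change (the export_tf function and its station_metadata_dict kwarg are named later in this thread; the exact signature may differ):

    # Sketch of the intended replacement in process_mth5.py (assumed names):
    # instead of writing directly from the aurora collection ...
    #     tf_collection.write_emtf_z_file(z_file_path, run_obj=local_run_obj)
    # ... cast to an mt_metadata TF object and let it do the writing:
    tf_cls = export_tf(tf_collection, station_metadata_dict=station_metadata_dict)
    tf_cls.write_tf_file(z_file_path, file_type="zmm")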
kkappler commented 2 years ago

In doing this there are several other tasks to address, now that the TF class is behaving itself.

It made sense to address issue #134 at this point, so that is done as well.

kkappler commented 2 years ago

@kujaku11 : I made good progress on this but I will need a quick tour of mt_metadata/transfer_functions/io, in particular the file mt_metadata/mt_metadata/transfer_functions/io/zfiles/zmm.py.

We may need to blend in some code from the write_emtf_z_file method in aurora/transfer_function/transfer_function_collection.py.

Currently .zss and .zrr are not supported in mt_metadata. Executing

    tf_cls.write_tf_file("testzss.zss", file_type="zss")

or

    tf_cls.write_tf_file("testzss.zss", file_type="zmm")

yields:

    KeyError: "Writer for the file type zss does not currently exist. Current writers ['edi', 'zmm', 'j', 'emtfxml', 'avg']"
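
For context, a minimal sketch of the kind of plugin lookup that raises this error (illustrative only; the real table lives in mt_metadata's readwrite.py), together with the obvious fix direction of registering the zss/zrr variants against the zmm writer:

    # Illustrative writer registry; not mt_metadata's actual code.
    writers = {
        "edi": "write_edi",
        "zmm": "write_zmm",
        "j": "write_j",
        "emtfxml": "write_emtfxml",
        "avg": "write_avg",
    }

    def get_writer(file_type):
        try:
            return writers[file_type]
        except KeyError:
            raise KeyError(
                f"Writer for the file type {file_type} does not currently exist. "
                f"Current writers {list(writers.keys())}"
            )

    # Fix direction: .zss and .zrr are zmm-format variants, so they can be
    # registered as aliases of the zmm writer.
    writers["zss"] = writers["zrr"] = writers["zmm"]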

Also, I noticed that when I tested, I got a complaint that the mth5 file was closed. I'm not sure why access to the mth5 should be needed for a populated TF to write to file; it is not needed for .xml output.

To test, I dropped a breakpoint at the end of the process_mth5 method in process_mth5.py, right before tf_cls is returned.
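
Something like this (illustrative placement):

    # At the end of process_mth5() in aurora/pipelines/process_mth5.py:
    breakpoint()  # inspect tf_cls interactively, e.g.
                  # tf_cls.station_metadata.channels_recorded
    return tf_cls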

I'll create an issue about the plugin keys in mt_metadata.

kkappler commented 1 year ago

Revisiting this again. Current status: all files {.zss, .zrr, .zmm} are capable of being written; however, the files do not contain data. Attached is an example zrr file (saved with a .txt extension for uploadability): synthetic_test1.txt

I created a Python executable from aurora/tutorials/synthetic_data_processing.ipynb and added a line to save the TF as .zrr. I put it in aurora/tests/io/synthetic_data_processing.py.
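
The added line was presumably of this form (hedged; the variable name and output path are assumptions):

    # tf_cls is the TF object returned by the processing pipeline in the script.
    tf_cls.write_tf_file("synthetic_test1.zrr", file_type="zrr")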

Following the logic, at line 214 of mt_metadata/transfer_functions/io/readwrite.py, the object file_writer is the following function: <function mt_metadata.transfer_functions.io.zfiles.zmm.write_zmm(tf_object, fn=None)>

Stepping into that function lands us (as expected) in write_zmm in mt_metadata/transfer_functions/io/zfiles/zmm.py, at line 1011.

A ZMM object is created from the TF dataset, and some conventions regarding channel ordering and nomenclature are handled in lines 1019-1031. Then, in lines 1032-1055, some wrangling of run information is done. It is not clear to me what is happening there; maybe add a comment about the purpose of that code block?

Finally, on line 1059 we have zmm_obj.write(fn).

Here are some things that should be sorted out:

  1. ZMM has an attr self.decimation_dict = {}. A KeyError exception is handled when decimation_dict is referenced. In order to fix this, one would need to know the intended structure of decimation_dict. The attr appears to be populated in the method _read_period_block, around line 604:

    self.decimation_dict[f"{period:.10g}"] = {
        "level": level,
        "bands": bands,
        "npts": npts,
        "df": sr,
    }

    So it is keyed by a string representation of the period, and holds info about that period's decimation level, band, number of points, and sample rate. When I inspect self.decimation_dict in the debugger, I get an empty {}, so it is not being populated correctly.
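
    A populated entry would presumably look something like this (field names from _read_period_block above; the values are made up):

        # Hypothetical entry, keyed by the period formatted as a string:
        decimation_dict = {}
        period = 4.65455
        decimation_dict[f"{period:.10g}"] = {
            "level": 1,       # decimation level index
            "bands": (1, 5),  # band edges at this level
            "npts": 8000,     # number of points
            "df": 1.0,        # sample rate
        }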

  2. The header produced is:

    TRANSFER FUNCTIONS IN MEASUREMENT COORDINATES ***** WITH FULL ERROR COVARIANCE ****
    test1
    coordinate 0.000 0.000 declination 0.00
    number of channels 5 number of frequencies 25
    orientations and tilts of each channel

But the expected header is:

    IMPEDANCE IN MEASUREMENT COORDINATES ** WITH FULL ERROR COVARINCE** Robust Remote Reference
    station :test2
    coordinate 0.0 0.0 declination 0.0
    number of channels 5 number of frequencies 25
    orientations and tilts of each channel
    1 0.00 0.00 tes Hx
    2 90.00 0.00 tes Hy
    3 0.00 0.00 tes Ex
    4 90.00 0.00 tes Ey
    5 0.00 0.00 tes Hz

  3. Writing TF values. Finally, when we reach the block of code that writes the TF values (around lines 557-586), a number of double for-loops over self.output_channels and self.input_channels are executed; however, both of these attributes yield empty lists, so they are not being set correctly (see the sketch below).

The input and output channels are taken from self.channels_recorded, which is also empty. This should not be the case.

The problem may not lie in zmm.py, since tf_object.station_metadata.channels_recorded is also an empty list.
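
Schematically, the writing loop has this shape (simplified; the real code iterates self.output_channels and self.input_channels, so empty channel lists explain the data-free z-files):

    # Schematic of the TF-value writing block in ZMM.write (simplified):
    output_channels = []  # empty in the failing case
    input_channels = []   # likewise empty
    for out_ch in output_channels:    # never entered -> no data rows written
        for in_ch in input_channels:
            pass  # would format and write the TF element for (out_ch, in_ch)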

The ZMM init code (around lines 1015-1017) is:

    zmm_obj = ZMM()
    zmm_obj.dataset = tf_object.dataset
    zmm_obj.station_metadata = tf_object.station_metadata

And at that point we are supposed to be able to tell whether there is a tipper or not, i.e. channels_recorded should be defined.

So to debug, we can try getting tf_object.station_metadata.channels_recorded set to its proper values and see if that helps. One cannot hack around this by setting tf_object.station_metadata.channels_recorded = ["ex", "ey", "hx", "hy", "hz"], however, because tf_object.station_metadata is an object of type mt_metadata.transfer_functions.tf.station.Station, which does not support explicit assignment. This is fine, and probably for the best, because it forces me to make sure that the TF object has the proper assignments first.

So, ...

After all that, it looks like this issue is blocked by the following: mt_metadata.transfer_functions.tf.station.Station does not have the correct channels_recorded information. This means that the parent data structure, a mt_metadata.transfer_functions.core.TF, was not properly assigned its values (it is deprived!), and so we should look there before continuing with the details of the ZMM class.

The assignment of the mt_metadata.transfer_functions.core.TF object is done in aurora, specifically in the function export_tf in aurora/pipelines/process_mth5.py.

Chasing this down a little further, export_tf() takes a kwarg called station_metadata_dict. In the specific case I am debugging (synthetic test data), station_metadata_dict looks like this:

    {'station': OrderedDict([('acquired_by.name', None),
                             ('channels_recorded', ['ex', 'ey', 'hx', 'hy', 'hz']),
                             ('data_type', 'BBMT'),
                             ('geographic_name', None),
                             ('hdf5_reference', ''),
                             ('id', 'test1'),
                             ('location.declination.model', 'WMM'),
                             ('location.declination.value', 0.0),
                             ('location.elevation', 0.0),
                             ('location.latitude', 0.0),
                             ('location.longitude', 0.0),
                             ('mth5_type', 'Station'),
                             ('orientation.method', None),
                             ('orientation.reference_frame', 'geographic'),
                             ('provenance.creation_time', '1980-01-01T00:00:00+00:00'),
                             ('provenance.software.author', None),
                             ('provenance.software.name', None),
                             ('provenance.software.version', None),
                             ('provenance.submitter.email', None),
                             ('provenance.submitter.organization', None),
                             ('release_license', 'CC0-1.0'),
                             ('run_list', ['001']),
                             ('time_period.end', '1980-01-01T00:00:00+00:00'),
                             ('time_period.start', '1980-01-01T00:00:00+00:00')])}
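
A quick check against that dump:

    # The channel names are present in the incoming dict:
    station_metadata_dict["station"]["channels_recorded"]
    # -> ['ex', 'ey', 'hx', 'hy', 'hz']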

Note that channels_recorded is a non-empty list with the expected values. So the issue is that the channel names, although available, are not getting assigned, which points squarely at line 181: tf_cls.station_metadata.from_dict(station_metadata_dict)

That takes us into the assignment loop around line 645 of mt_metadata/base/metadata.py, in the class Base(). Into the core! N.B. this loop successfully assigns all the string variables, such as ('id', 'test1'), so perhaps the issue is that the value in the key-value pair ('channels_recorded', ['ex', 'ey', 'hx', 'hy', 'hz']) is a list? No, that is not it ...

Debugging further from here would benefit from some help from @kujaku11. Essentially, I land in mt_metadata/transfer_functions/tf/station.py at this method:

    @channels_recorded.setter
    def channels_recorded(self, value):
        if not isinstance(value, (list, tuple)):
            raise ValueError("Input channels must be a list")

        self._recorded_channels = value

I cannot seem to assign any value at all to self._recorded_channels, and the getter for channels_recorded does not seem to care what gets set here anyhow; it loops over the runs and rebuilds the _recorded_channels variable on each call.
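
Given that the getter rebuilds the channel list from the station's runs, one plausible workaround is to make sure the run metadata carries the channel lists before export (a hedged sketch; whether Run exposes channels_recorded_electric/channels_recorded_magnetic this way, and how runs attach to the Station, are assumptions on my part):

    # Hedged sketch: populate the run-level channel lists so that the
    # channels_recorded getter, which iterates over runs, returns the
    # expected channels. Attribute names are assumed, not verified.
    from mt_metadata.transfer_functions.tf import Run

    run = Run(id="001")
    run.channels_recorded_electric = ["ex", "ey"]
    run.channels_recorded_magnetic = ["hx", "hy", "hz"]
    tf_object.station_metadata.runs.append(run)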

Two questions then:

  1. How can I set self._recorded_channels? When I follow the assignment in the debugger, I ultimately end up at line 405 of the base class executing super().__setattr__(name, value), which seems to have no effect.
  2. Is there not some inconsistency in the getter/setter relationship here anyhow that should be sorted out?
kkappler commented 1 year ago

After a debugging session on 26 May, the zmm writer seems to be working on the fcs branch of mt_metadata.

kkappler commented 1 year ago

Now working on the fc branch of mth5. The test is in aurora/tests/io/test_issue_139.py. Before closing:

kkappler commented 1 year ago

Information about decimation levels is lost in aurora's _merge_decimation_levels method in transfer_function_collection.py.

The process_mth5 flow operates over each decimation level iteratively, packing a variant of Egbert's TTFZ transfer function class into a dictionary keyed by decimation level:

    tf_dict[0] --> TTFZ with decimation level 0 info
    tf_dict[1] --> TTFZ with decimation level 1 info
    ...
    tf_dict[N] --> TTFZ with decimation level N info

The tf_dict is passed to `TransferFunctionCollection`, a convenience class in aurora with methods for merging the dict and exporting.

I tried to pack the merged xarray with decimation level info, but the final data product is sparsely populated because most frequencies exist at only one decimation level. I don't think we want this sparse, bloated xarray floating around. For posterity, the code that I tried to use to make the multi-indexed data structure in transfer_function_collection's _merge_decimation_levels method is pasted below:


    import pandas as pd

    # One dataframe per decimation level
    tmp_df = [self.tf_dict[i].tf.to_dataframe("transfer_function") for i in range(n_dec)]
    # Prepend a "decimation_level" index level to each dataframe
    tmp_df = [pd.concat([tmp_df[i]], keys=[f"{i}"], names=["decimation_level"]) for i in range(n_dec)]
    # To recover the xarray from tmp_df[i], do this:
    # tmp_df[0].to_xarray().transfer_function.drop("decimation_level")
    tmp_df = pd.concat(tmp_df)
    # But if we cast this back to an xarray it is very sparse, having dimension (4, 3, 2, 25) in this case.

The solution I went with was instead to pass `decimation_dict` to the TF object.  This makes test_issue_139.py output a correctly formatted z-file with the right metadata in each block from the TF.
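
In outline, the adopted approach looks something like this (a hedged sketch: that TTFZ exposes its periods, the entry layout, and the attachment point on the TF object are all assumptions; the entry fields follow _read_period_block above):

    # Build a decimation_dict from the per-level TTFZ objects and hand it to
    # the TF object for the z-file writer to consume (assumed plumbing).
    decimation_dict = {}
    for dec_level, ttfz in tf_dict.items():
        for period in ttfz.periods:
            decimation_dict[f"{period:.10g}"] = {
                "level": dec_level,
                "bands": None,  # band edges for this period (placeholder)
                "npts": None,   # number of points (placeholder)
                "df": None,     # sample rate (placeholder)
            }
    tf_cls.decimation_dict = decimation_dict  # assumed attachment point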
kkappler commented 1 year ago

The fourier_coefficients branch is no longer calling write_emtf_z_file; instead it uses mt_metadata's TF class for writing.

The leading-zero scientific notation was added as a PR to mt_metadata (branch fortran_format).
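
For reference, "leading-zero scientific notation" here means numbers formatted the Fortran way, with the mantissa in [0.1, 1), e.g. 0.1062E-02 rather than Python's default 1.062E-03. A toy formatter (illustrative only, not the implementation in the fortran_format branch; rounding edge cases are not handled):

    import math

    def fortran_sci(x, digits=4):
        """Format x Fortran-style, e.g. 0.1062E-02 (mantissa in [0.1, 1))."""
        if x == 0.0:
            return f" 0.{'0' * digits}E+00"
        exp = math.floor(math.log10(abs(x))) + 1
        mant = abs(x) / 10.0 ** exp  # in [0.1, 1)
        sign = "-" if x < 0 else " "
        return f"{sign}0.{round(mant * 10**digits):0{digits}d}E{exp:+03d}"

    # fortran_sci(1.062e-3) -> ' 0.1062E-02'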