CAS04 Dataset reconciliation

kkappler commented 3 years ago

The CAS04 dataset processes with aurora but differs from the archived results in SPUD by an factor of ~1e-12. Specifically, the aurora TFs are much smaller than those on SPUD.

A plausible source of error here is if the Electric field data were interpretted as mV/km , but were actually V/m, then the E field spectra would be too large by 1e-6. Because apparent resistivity is proportional to 1/|E|^2 this would make the TF differ by 1e-12.

The data were generated by this python file https://github.com/kujaku11/mth5/blob/master/examples/make_mth5_from_iris_dmc_local.py

And the results pulled from the web were accessed here: http://ds.iris.edu/spud/emtf/18633652

ToDo: Review the metadata units for all stages of the filters. This is a NIMS long period system.

kkappler commented 3 years ago

created a script called issue_31.py in aurora/tests/earthscope/cas04/ which dumps the metadata as it provided in the mth5 that is made by: mth5/examples/make_mth5_from_iris_dmc.py

START SUMMARY FOR CHANNEL ex
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ex

<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL ey
Total number of filters = 6

FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ey

<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hx
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hx time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hx

<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hy
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hy time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hy

<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hz
Total number of filters = 3

FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hz time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hz

This is a little bit confusing. The scale factors may be correct in their product but they are not correct in their individual values.

Requests and recommendations:

When the XML are created, order the filter stages along the data flow.
Transducer --> Analog Filters --> A2D Converter --> digitial filters.

This would change, for example ex to

FILTER# 0 [was 2]          mV/km --> V/m                    Gain:1e-06
FILTER# 1 [was 3]            V/m --> V                      Gain:92.0  
FILTER# 2 [was 0]              V --> V                      Gain:1.0
FILTER# 3 [was 1]              V --> V                      Gain:1.0
FILTER# 4 [was 4]              V --> count                  Gain:484733700000000.0
FILTER# 5 [was 5]          count --> count                  Gain:1.0

from:

FILTER# 0 electric field 5 pole butterworth low-pass     mV/km --> mV/km,    Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass    mV/km --> mV/km,    Gain:1.0
FILTER# 2 mv per km to v per m                           mV/km --> V/m,      Gain:1e-06
FILTER# 3 v per m to v                                     V/m --> V,        Gain:92.0
FILTER# 4 v to counts (electric)                             V --> count,    Gain:484733700000000.0
FILTER# 5 electric time offset                           count --> count,    Gain:1.0

and hx is ok how it is:

FILTER# 0 magnetic field 3 pole butterworth low-pass         nT --> V,     Gain:1.0
FILTER# 1 v to counts (magnetic)                              V --> count, Gain:100.0
FILTER# 2 hx time offset                                  count --> count, Gain:1.0

Use SI units in the metadata, let the units be converted in the processing
Counts to volts for electric are not realistic, possibly this system is already reporting in mV/km? and the additional 1e-6 is why I have a scale factor error?
Counts to volts for electric and magnetic should nominally be the same

kkappler commented 3 years ago

This requires a review with the team, and relates to issue #36

The way that the scale factors are setup in SEED is that the data in counts are divided by the product of the stages. Compare the CAS04 metadata with that of SAO

Ctrl+F LQ2

In that example the total sensitivity is -9.96867550678789E8 ~= 1e9

This station uses the dipole length (236m) as an analog gain in stage 1, together with the PZ response of the EFSC front end analog filter. Stage 2 is the 10x gain on the EFSC. Stage 3 is volts to counts, with a gain of 427135. i.e. ~4e5 counts per volt.

Working back from data in counts we have the following logic:

Say you read 400000 counts. This corresponds to 1 "Gained Volt" after the amplifier, which is due to 0.1 "True Volt" before the amplifier which is around 0.0004 V/m, which is ~420mV/km. A plausible field to observe at SAO during a major storm.

In the case of CAS04 the counts to Volts is ~1.72 x 2^48, whereas at SAO it is around 2^19. It doesn't seem plausible that there are enough bits on the digitizer to allow for a 2^48 counts per volt.

It would be helpful to have the config files from the SPUD data processing run. Are these available?

kkappler commented 3 years ago

@akelbert I am going to put a summary of what we discussed this morning and a request for follow up info in later this morning -- testing comm

akelbert commented 3 years ago

@kkappler I'm attaching the configuration files that we used for the RR CAS04 processing (NASA-MTA20 survey). Also attaching the four CAS04 SP files with an added *.txt extension - otherwise GitHub won't attach.

CF_for_sharing.zip CAS04a.sp.txt CAS04b.sp.txt CAS04c.sp.txt CAS04d.sp.txt

kkappler commented 3 years ago

Thanks @akelbert - these are a great start and I can use these to check the units. Are there the other processing parameters by any chance? Specifically:

pwset.cfg #prewhitening settings
band_setup.cfg # frequency bands
decset.cfg # decimation specification
options.cfg or options_RR.cfg # has setting about RR station, coherence sorting parameters etc.
tranmt.cfg (or similar) # config file pointing to options, input and output filessystem

I think these are all of them, but I don't see where the apodization window is defined ... maybe I am missing a file? In any case, it would be ideal if we could group together all the settings used in the processing that generated the SPUD data.

akelbert commented 3 years ago

Hi @kkappler, these settings are all in the zip archive above. The *.cfg files cannot be attached to issues in GitHub, extension not recognized. The zip has them all.

akelbert commented 3 years ago

@kkappler on "counts only in IRIS", see the FDSN SEED manual, http://www.fdsn.org/pdf/SEEDManual_V2.4.pdf Blockette 1000 (miniSEED - Data Only) on p. 123. As far as I'm aware, that's the unique data archiving format allowed at IRIS.

kkappler commented 3 years ago

@akelbert The zip archive is empty. Maybe because the files inside need to be "tricked" into being .txt as well?

akelbert commented 3 years ago

@kkappler trying again; if that still doesn't work I'll rename them all to txt files.

CF_for_sharing.zip

kkappler commented 3 years ago

I am going to add a few comments to describe the differences I see between the mt_metadata and the files provided by Anna. Here is a description of the metadata that I am interacting with sourced from the current station_xml.

I am using data that is created by the example in mth5 here:

once that file is created, filling in filepath with your local path in the example code below allows interaction with the filters.

from pathlib import Path
from mth5 import mth5

filepath = Path("path/to/file.h5")
mth5_obj = mth5.MTH5()
mth5_obj.open_mth5(filepath, mode="r")
cas04 = mth5_obj.stations_group.get_station("CAS04")
run_a=cas04.get_run('a')
ex = run_a.get_channel("ex")
ex.channel_response_filter.filters_list

makes the following output

[{
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 5 pole butterworth low-pass",
         "normalization_factor": 313383.601119193,
         "poles": [
             "(-3.883009+11.951875j)",
             "(-3.883009-11.951875j)",
             "(-10.166194+7.386513j)",
             "(-10.166194-7.386513j)",
             "(-12.566371+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "butterworth filter",
         "gain": 1.0,
         "name": "electric field 1 pole butterworth high-pass",
         "normalization_factor": 1.00000351809047,
         "poles": [
             "(-0.000167+0j)"
         ],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "mV/km",
         "zeros": [
             "0j"
         ]
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "unit conversion",
         "gain": 1e-06,
         "name": "mv per km to v per m",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "mV/km",
         "units_out": "V/m",
         "zeros": []
     }
 },
 {
     "pole_zero_filter": {
         "calibration_date": "1980-01-01",
         "comments": "electric dipole",
         "gain": 94.0,
         "name": "v per m to v",
         "normalization_factor": 1.00000351809047,
         "poles": [],
         "type": "zpk",
         "units_in": "V/m",
         "units_out": "V",
         "zeros": []
     }
 },
 {
     "coefficient_filter": {
         "calibration_date": "1980-01-01",
         "comments": "analog to digital conversion",
         "gain": 484733700000000.0,
         "name": "v to counts (electric)",
         "type": "coefficient",
         "units_in": "V",
         "units_out": "count"
     }
 },
 {
     "time_delay_filter": {
         "calibration_date": "1980-01-01",
         "comments": "time correction",
         "delay": -0.285,
         "gain": 1.0,
         "name": "electric time offset",
         "type": "time delay",
         "units_in": "count",
         "units_out": "count"
     }
 }]

akelbert commented 3 years ago

@kkappler @kujaku11 so I'm still confused on our resolution on the gain factor question. I combed through my USGS email over the years, and at least since 2015 there was no mention of the factor 4.847337E14. Prior to the latest archiving where this factor was introduced as the Volts to counts conversion, the gain for this conversion was always set to 4.09601E8 (makes perfect sense to me). This change followed a lengthy discussion (I believe) a suggestion from Tim Ronan - better check with him! After looking through all the documents, I am inclined to go back to the 4.09601E8 for this re-archiving. Also, prior to the year 2020, we've always archived the electrics in V/m, e.g., http://service.iris.edu/fdsnws/station/1/query?net=EM&sta=IDA11&cha=*&level=response. Now, we've made the change to mV/km, as in http://service.iris.edu/fdsnws/station/1/query?net=ZU&sta=KSR28&cha=*&level=response. This, too, followed a lengthy discussion with everyone, including Gary, why said that the only reason this wasn't done in the past was that IRIS didn't support non-SI units at the time this archiving was initially setup.

The only info I could find in the archiving docs follows below (also, attached):

Gain for stage 1 for magnetic channels (Butterworth 3 poles low-pass) is calculated as 1/hfac1e9, where hfac is taken from .hed file, at zero frequency (conversion from Tesla to digital counts). Typical value of hfac is 0.01, then gain will be 1e11, as shown in the table below. Sensitivity is a product of gains over entire channel at zero frequency, that is also 1/hfac109. Gain for stage 1 for electric channels is equal to the dipole length (m), coming from .hed files (100m in the example below) and corrected for normalization frequency 0.01 (i.e. multiplied by A0/A1, where A1 calculated at frequency=0.01Hz). Gain for stage 2 for magnetic channels (low-pass) is calculated as 1/efac103, where efac is taken from .hed file, and multiplied by A0/A1, where A1 calculated at frequency=0.01Hz (conversion from Volts to digital counts). For example, typical value for MT1 NIMS efac is 2.441406e-6, thus typical value for gain will be 4.0960e+08, multiplied by A0/A1 (~1). Sensitivity is again calculated as a product of gains over entire channel at 0.01Hz frequency.

OSU_Data_Archiving_2006.pdf

At this point, for my re-archiving I'd like to switch back to that original gain factor which makes sense to me - the new one doesn't. Would that fix the discrepancy that you're seeing? I also suggest that we stick with the practical units, but OK with switching back to SI units if that's for some reason easier. OK with moving the dipole length multiplication upfront to make it Stage 1. At any rate, let us set up a telecon between the three of us, at least, but better - with Gary, and finalize these details so that I can complete the archiving and move on. I really have to at this time.

kkappler commented 2 years ago

Updated XML no longer have very large count to volt conversion. Still need to check that dipole length is correct (it should be 92).

Testing against the archived result on SPUD is a bit complicated by the fact that the SPUD results are .zmm (not .zrr).

Anna will run a .zrr and share

The .zrr along with the processing configuration from EMTF,
The updated station_xml with 92m dipole

akelbert commented 2 years ago

@kkappler here are three decent remote reference options for CAS04 and the corrected/updated StationXML for this site. Please check against Aurora!

CF_for_sharing.zip . CAS04_stationxml.xml.zip CAS04_zrr.zip

kkappler commented 2 years ago

@akelbert is it possible to generate .zrr for non-merged runs... specifically these examples all use b,c,d together. Can you please create one with only run b (or only run c or only run d or all of the above?) It isn't important right now which of them, just as long as I know which one it is. This will allow me to run the tests without merging runs which is still being built out.

kkappler commented 2 years ago

@akelbert : Also, can I get the station.xml for the remote reference station as well? That way I can pull the reference data based on the listed runs available in the reference station ... I suppose that if you just ran single station processing on b,c, or d we could circumvent this for now

timronan commented 2 years ago

@akelbert CAS04_stationxml.xml.zip does not pass validation.

211,Error,8P,CAS04,,,2020-06-02T18:41:43,2020-07-13T21:46:12,Chan: LQE Loc: 2020-07-01T19:36:55 2020-07-13T21:46:12 epoch overlaps with LQE 2020-07-01T19:36:55 2020-07-13T21:46:12

8P.CAS04.LQE 2020-06-02T18:41:43 2020-07-13T21:46:12 is duplicated in CAS04_stationxml.xml.zip . The attached zip file has the updated stationxml file attached.

Also, this stationxml has a different network code than the CAS04 station in the IRIS archive. The archived version of CAS04 is ZU.CAS04 while the stationxml is labeled as 8P.CAS04. This leads to problems when trying extracting time series from the the IRIS archive unless this Network naming conflict is manually resolved.

The attached metadata does not resolve this Network name conflict because I think that ZU may be renamed to 8P but it is important to note this Network naming conflict so we don't accidentally trip over this inconsistency in the future.

CAS04_stationxml_updated.xml.zip

Karl says: This file is corrupt but a verison of it is committed in aurora/tests/cas04/cas04_from_tim_20211203.xml

kkappler commented 2 years ago

@kujaku11 : Revisiting this dataset in the context of merging runs. The currently archived version is missing filters for electric channels. In particular:

channel ex has identical filters for runs a,b,c but no filters at all for run d.
channel ey has identical filters for runs a,b but no filters at all for runs c,d.
channels hx, hy have all identical filters for all runs a,b,c,d and pass a cursory inspection

If you want to build the mth5 locally to inspect it, this can be done using the new operate_aurora.ipynb in docs/notebooks on main branch. Can you PTAL and see if an updated metadata file with consistent channels can be passed to Tim or Laura for archiving?

In a pinch I will try just overwriting the electric field filters from run "a" onto "c" and "d" in operate_aurora.ipynb and see if I can workaround that way.

kkappler commented 2 years ago

The specific issue with the metadata (cited above on June 6, 2022) is still under investigation by @laura-iris , and she will be looking at the make_mth5 logic this month. In the meantime, a variation to make_mth5 has been applied on the fix_issue_105 branch of mth5. Details are at https://github.com/kujaku11/mth5/issues/105 & https://github.com/kujaku11/mth5/pull/106.

With this fix in place I am now able to generate mth5 files with all the stations ["CAS04", "CAV07", "NVR11", "REV06"] and am starting to get remote reference processed results. I will report these results on this issue as they are generated.

kkappler commented 2 years ago

Pasted below are comparisons of the z-files returned by aurora and emtf. They agree reasonably well at low frequencies. At high frequencies, we know that coherence sorting was used by emtf, and this feature is not yet available in aurora.

Note that NVR11 was processed using a different band setup file due to issue #196

CAS04_RRCAV07 CAS04_RRNVR11 CAS04_RRREV06