Closed kkappler closed 2 years ago
created a script called issue_31.py in aurora/tests/earthscope/cas04/ which dumps the metadata as it provided in the mth5 that is made by: mth5/examples/make_mth5_from_iris_dmc.py
START SUMMARY FOR CHANNEL ex
Total number of filters = 6
FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ex
<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL ey
Total number of filters = 6
FILTER# 0 electric field 5 pole butterworth low-pass mV/km-->mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km-->mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km-->V/m, Gain:1e-06
FILTER# 3 v per m to v V/m-->V, Gain:92.0
FILTER# 4 v to counts (electric) V-->count, Gain:484733700000000.0
FILTER# 5 electric time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL ey
<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hx
Total number of filters = 3
FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hx time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hx
<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hy
Total number of filters = 3
FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hy time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hy
<class 'mt_metadata.timeseries.filters.channel_response_filter.ChannelResponseFilter'>
START SUMMARY FOR CHANNEL hz
Total number of filters = 3
FILTER# 0 magnetic field 3 pole butterworth low-pass nT-->V, Gain:1.0
FILTER# 1 v to counts (magnetic) V-->count, Gain:100.0
FILTER# 2 hz time offset count-->count, Gain:1.0
END SUMMARY FOR CHANNEL hz
This is a little bit confusing. The scale factors may be correct in their product but they are not correct in their individual values.
Requests and recommendations:
This would change, for example ex to
FILTER# 0 [was 2] mV/km --> V/m Gain:1e-06
FILTER# 1 [was 3] V/m --> V Gain:92.0
FILTER# 2 [was 0] V --> V Gain:1.0
FILTER# 3 [was 1] V --> V Gain:1.0
FILTER# 4 [was 4] V --> count Gain:484733700000000.0
FILTER# 5 [was 5] count --> count Gain:1.0
from:
FILTER# 0 electric field 5 pole butterworth low-pass mV/km --> mV/km, Gain:1.0
FILTER# 1 electric field 1 pole butterworth high-pass mV/km --> mV/km, Gain:1.0
FILTER# 2 mv per km to v per m mV/km --> V/m, Gain:1e-06
FILTER# 3 v per m to v V/m --> V, Gain:92.0
FILTER# 4 v to counts (electric) V --> count, Gain:484733700000000.0
FILTER# 5 electric time offset count --> count, Gain:1.0
and hx is ok how it is:
FILTER# 0 magnetic field 3 pole butterworth low-pass nT --> V, Gain:1.0
FILTER# 1 v to counts (magnetic) V --> count, Gain:100.0
FILTER# 2 hx time offset count --> count, Gain:1.0
Use SI units in the metadata, let the units be converted in the processing
Counts to volts for electric are not realistic, possibly this system is already reporting in mV/km? and the additional 1e-6 is why I have a scale factor error?
Counts to volts for electric and magnetic should nominally be the same
This requires a review with the team, and relates to issue #36
The way that the scale factors are setup in SEED is that the data in counts are divided by the product of the stages. Compare the CAS04 metadata with that of SAO
Ctrl+F LQ2
In that example the total sensitivity is -9.96867550678789E8 ~= 1e9
This station uses the dipole length (236m) as an analog gain in stage 1, together with the PZ response of the EFSC front end analog filter. Stage 2 is the 10x gain on the EFSC. Stage 3 is volts to counts, with a gain of 427135. i.e. ~4e5 counts per volt.
Working back from data in counts we have the following logic:
Say you read 400000 counts. This corresponds to 1 "Gained Volt" after the amplifier, which is due to 0.1 "True Volt" before the amplifier which is around 0.0004 V/m, which is ~420mV/km. A plausible field to observe at SAO during a major storm.
In the case of CAS04 the counts to Volts is ~1.72 x 2^48, whereas at SAO it is around 2^19. It doesn't seem plausible that there are enough bits on the digitizer to allow for a 2^48 counts per volt.
It would be helpful to have the config files from the SPUD data processing run. Are these available?
@akelbert I am going to put a summary of what we discussed this morning and a request for follow up info in later this morning -- testing comm
@kkappler I'm attaching the configuration files that we used for the RR CAS04 processing (NASA-MTA20 survey). Also attaching the four CAS04 SP files with an added *.txt extension - otherwise GitHub won't attach.
CF_for_sharing.zip CAS04a.sp.txt CAS04b.sp.txt CAS04c.sp.txt CAS04d.sp.txt
Thanks @akelbert - these are a great start and I can use these to check the units. Are there the other processing parameters by any chance? Specifically:
I think these are all of them, but I don't see where the apodization window is defined ... maybe I am missing a file? In any case, it would be ideal if we could group together all the settings used in the processing that generated the SPUD data.
Hi @kkappler, these settings are all in the zip archive above. The *.cfg files cannot be attached to issues in GitHub, extension not recognized. The zip has them all.
@kkappler on "counts only in IRIS", see the FDSN SEED manual, http://www.fdsn.org/pdf/SEEDManual_V2.4.pdf Blockette 1000 (miniSEED - Data Only) on p. 123. As far as I'm aware, that's the unique data archiving format allowed at IRIS.
@akelbert The zip archive is empty. Maybe because the files inside need to be "tricked" into being .txt as well?
@kkappler trying again; if that still doesn't work I'll rename them all to txt files.
I am going to add a few comments to describe the differences I see between the mt_metadata and the files provided by Anna. Here is a description of the metadata that I am interacting with sourced from the current station_xml.
I am using data that is created by the example in mth5 here:
once that file is created, filling in filepath with your local path in the example code below allows interaction with the filters.
from pathlib import Path
from mth5 import mth5
filepath = Path("path/to/file.h5")
mth5_obj = mth5.MTH5()
mth5_obj.open_mth5(filepath, mode="r")
cas04 = mth5_obj.stations_group.get_station("CAS04")
run_a=cas04.get_run('a')
ex = run_a.get_channel("ex")
ex.channel_response_filter.filters_list
makes the following output
[{
"pole_zero_filter": {
"calibration_date": "1980-01-01",
"comments": "butterworth filter",
"gain": 1.0,
"name": "electric field 5 pole butterworth low-pass",
"normalization_factor": 313383.601119193,
"poles": [
"(-3.883009+11.951875j)",
"(-3.883009-11.951875j)",
"(-10.166194+7.386513j)",
"(-10.166194-7.386513j)",
"(-12.566371+0j)"
],
"type": "zpk",
"units_in": "mV/km",
"units_out": "mV/km",
"zeros": []
}
},
{
"pole_zero_filter": {
"calibration_date": "1980-01-01",
"comments": "butterworth filter",
"gain": 1.0,
"name": "electric field 1 pole butterworth high-pass",
"normalization_factor": 1.00000351809047,
"poles": [
"(-0.000167+0j)"
],
"type": "zpk",
"units_in": "mV/km",
"units_out": "mV/km",
"zeros": [
"0j"
]
}
},
{
"pole_zero_filter": {
"calibration_date": "1980-01-01",
"comments": "unit conversion",
"gain": 1e-06,
"name": "mv per km to v per m",
"normalization_factor": 1.00000351809047,
"poles": [],
"type": "zpk",
"units_in": "mV/km",
"units_out": "V/m",
"zeros": []
}
},
{
"pole_zero_filter": {
"calibration_date": "1980-01-01",
"comments": "electric dipole",
"gain": 94.0,
"name": "v per m to v",
"normalization_factor": 1.00000351809047,
"poles": [],
"type": "zpk",
"units_in": "V/m",
"units_out": "V",
"zeros": []
}
},
{
"coefficient_filter": {
"calibration_date": "1980-01-01",
"comments": "analog to digital conversion",
"gain": 484733700000000.0,
"name": "v to counts (electric)",
"type": "coefficient",
"units_in": "V",
"units_out": "count"
}
},
{
"time_delay_filter": {
"calibration_date": "1980-01-01",
"comments": "time correction",
"delay": -0.285,
"gain": 1.0,
"name": "electric time offset",
"type": "time delay",
"units_in": "count",
"units_out": "count"
}
}]
@kkappler @kujaku11 so I'm still confused on our resolution on the gain factor question. I combed through my USGS email over the years, and at least since 2015 there was no mention of the factor 4.847337E14. Prior to the latest archiving where this factor was introduced as the Volts to counts conversion, the gain for this conversion was always set to 4.09601E8 (makes perfect sense to me). This change followed a lengthy discussion (I believe) a suggestion from Tim Ronan - better check with him! After looking through all the documents, I am inclined to go back to the 4.09601E8 for this re-archiving. Also, prior to the year 2020, we've always archived the electrics in V/m, e.g., http://service.iris.edu/fdsnws/station/1/query?net=EM&sta=IDA11&cha=*&level=response. Now, we've made the change to mV/km, as in http://service.iris.edu/fdsnws/station/1/query?net=ZU&sta=KSR28&cha=*&level=response. This, too, followed a lengthy discussion with everyone, including Gary, why said that the only reason this wasn't done in the past was that IRIS didn't support non-SI units at the time this archiving was initially setup.
The only info I could find in the archiving docs follows below (also, attached):
Gain for stage 1 for magnetic channels (Butterworth 3 poles low-pass) is calculated as 1/hfac1e9, where hfac is taken from .hed file, at zero frequency (conversion from Tesla to digital counts). Typical value of hfac is 0.01, then gain will be 1e11, as shown in the table below. Sensitivity is a product of gains over entire channel at zero frequency, that is also 1/hfac109. Gain for stage 1 for electric channels is equal to the dipole length (m), coming from .hed files (100m in the example below) and corrected for normalization frequency 0.01 (i.e. multiplied by A0/A1, where A1 calculated at frequency=0.01Hz). Gain for stage 2 for magnetic channels (low-pass) is calculated as 1/efac103, where efac is taken from .hed file, and multiplied by A0/A1, where A1 calculated at frequency=0.01Hz (conversion from Volts to digital counts). For example, typical value for MT1 NIMS efac is 2.441406e-6, thus typical value for gain will be 4.0960e+08, multiplied by A0/A1 (~1). Sensitivity is again calculated as a product of gains over entire channel at 0.01Hz frequency.
At this point, for my re-archiving I'd like to switch back to that original gain factor which makes sense to me - the new one doesn't. Would that fix the discrepancy that you're seeing? I also suggest that we stick with the practical units, but OK with switching back to SI units if that's for some reason easier. OK with moving the dipole length multiplication upfront to make it Stage 1. At any rate, let us set up a telecon between the three of us, at least, but better - with Gary, and finalize these details so that I can complete the archiving and move on. I really have to at this time.
Updated XML no longer have very large count to volt conversion. Still need to check that dipole length is correct (it should be 92).
Testing against the archived result on SPUD is a bit complicated by the fact that the SPUD results are .zmm (not .zrr).
Anna will run a .zrr and share
@kkappler here are three decent remote reference options for CAS04 and the corrected/updated StationXML for this site. Please check against Aurora!
@akelbert is it possible to generate .zrr for non-merged runs... specifically these examples all use b,c,d together. Can you please create one with only run b (or only run c or only run d or all of the above?) It isn't important right now which of them, just as long as I know which one it is. This will allow me to run the tests without merging runs which is still being built out.
@akelbert : Also, can I get the station.xml for the remote reference station as well? That way I can pull the reference data based on the listed runs available in the reference station ... I suppose that if you just ran single station processing on b,c, or d we could circumvent this for now
@akelbert CAS04_stationxml.xml.zip does not pass validation.
211,Error,8P,CAS04,,,2020-06-02T18:41:43,2020-07-13T21:46:12,Chan: LQE Loc: 2020-07-01T19:36:55 2020-07-13T21:46:12 epoch overlaps with LQE 2020-07-01T19:36:55 2020-07-13T21:46:12
8P.CAS04.LQE 2020-06-02T18:41:43 2020-07-13T21:46:12 is duplicated in CAS04_stationxml.xml.zip . The attached zip file has the updated stationxml file attached.
Also, this stationxml has a different network code than the CAS04 station in the IRIS archive. The archived version of CAS04 is ZU.CAS04 while the stationxml is labeled as 8P.CAS04. This leads to problems when trying extracting time series from the the IRIS archive unless this Network naming conflict is manually resolved.
The attached metadata does not resolve this Network name conflict because I think that ZU may be renamed to 8P but it is important to note this Network naming conflict so we don't accidentally trip over this inconsistency in the future.
CAS04_stationxml_updated.xml.zip
Karl says: This file is corrupt but a verison of it is committed in aurora/tests/cas04/cas04_from_tim_20211203.xml
@kujaku11 : Revisiting this dataset in the context of merging runs. The currently archived version is missing filters for electric channels. In particular:
If you want to build the mth5 locally to inspect it, this can be done using the new operate_aurora.ipynb in docs/notebooks on main branch. Can you PTAL and see if an updated metadata file with consistent channels can be passed to Tim or Laura for archiving?
In a pinch I will try just overwriting the electric field filters from run "a" onto "c" and "d" in operate_aurora.ipynb and see if I can workaround that way.
The specific issue with the metadata (cited above on June 6, 2022) is still under investigation by @laura-iris , and she will be looking at the make_mth5 logic this month. In the meantime, a variation to make_mth5 has been applied on the fix_issue_105
branch of mth5. Details are at https://github.com/kujaku11/mth5/issues/105 & https://github.com/kujaku11/mth5/pull/106.
With this fix in place I am now able to generate mth5 files with all the stations ["CAS04", "CAV07", "NVR11", "REV06"] and am starting to get remote reference processed results. I will report these results on this issue as they are generated.
Pasted below are comparisons of the z-files returned by aurora and emtf. They agree reasonably well at low frequencies. At high frequencies, we know that coherence sorting was used by emtf, and this feature is not yet available in aurora.
Note that NVR11 was processed using a different band setup file due to issue #196
Closing this issue for now. We may chose to re-open when we address coherence sorting module.
The CAS04 dataset processes with aurora but differs from the archived results in SPUD by an factor of ~1e-12. Specifically, the aurora TFs are much smaller than those on SPUD.
A plausible source of error here is if the Electric field data were interpretted as mV/km , but were actually V/m, then the E field spectra would be too large by 1e-6. Because apparent resistivity is proportional to 1/|E|^2 this would make the TF differ by 1e-12.
The data were generated by this python file https://github.com/kujaku11/mth5/blob/master/examples/make_mth5_from_iris_dmc_local.py
And the results pulled from the web were accessed here: http://ds.iris.edu/spud/emtf/18633652
ToDo: Review the metadata units for all stages of the filters. This is a NIMS long period system.