Closed by kujaku11, 2 years ago
@kkappler Looking at the config, a factorization might be:
duplicate of #30
@kkappler I've mocked up some metadata classes based on existing config files. The end result is below. Check it and see what you think, and if you have time check out the base classes under aurora.config.metadata. I got the result below by doing:
from aurora.config import Processing
p = Processing()
p.read_emtf_bands(r"aurora\aurora\config\emtf_band_setup\bs_256_26.cfg")
print(p.to_json())
{
"processing": {
"decimations": {
"1": {
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 55,
"index_min": 47
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 46,
"index_min": 39
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 37,
"index_min": 31
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 30,
"index_min": 25
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 24,
"index_min": 20
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 19,
"index_min": 16
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 15,
"index_min": 13
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 12,
"index_min": 10
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 9,
"index_min": 8
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 7,
"index_min": 6
}
},
{
"band": {
"decimation_level": 1,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 1,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"output_channels": [
"ex",
"ey",
"hz"
],
"prewhitening_type": "first difference",
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 10,
"regression.minimum_cycles": 10,
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
"2": {
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"decimation_level": 2,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 2,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"output_channels": [
"ex",
"ey",
"hz"
],
"prewhitening_type": "first difference",
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 10,
"regression.minimum_cycles": 10,
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
"3": {
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 13,
"index_min": 11
}
},
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 10,
"index_min": 9
}
},
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 8,
"index_min": 7
}
},
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 6,
"index_min": 6
}
},
{
"band": {
"decimation_level": 3,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 5,
"index_min": 5
}
}
],
"decimation.factor": 1.0,
"decimation.level": 3,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"output_channels": [
"ex",
"ey",
"hz"
],
"prewhitening_type": "first difference",
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 10,
"regression.minimum_cycles": 10,
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
},
"4": {
"decimation_level": {
"anti_alias_filter": "default",
"bands": [
{
"band": {
"decimation_level": 4,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 22,
"index_min": 18
}
},
{
"band": {
"decimation_level": 4,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 17,
"index_min": 14
}
},
{
"band": {
"decimation_level": 4,
"frequency_max": 0,
"frequency_min": 0,
"index_max": 13,
"index_min": 10
}
}
],
"decimation.factor": 1.0,
"decimation.level": 4,
"decimation.method": "default",
"decimation.sample_rate": 1.0,
"extra_pre_fft_detrend_type": "linear",
"input_channels": [
"hx",
"hy"
],
"output_channels": [
"ex",
"ey",
"hz"
],
"prewhitening_type": "first difference",
"regression.max_iterations": 10,
"regression.max_redescending_iterations": 10,
"regression.minimum_cycles": 10,
"window.num_samples": 128,
"window.overlap": 32,
"window.type": "boxcar"
}
}
},
"stations.local.channel_scale_factors": [],
"stations.local.id": null,
"stations.local.mth5_path": null,
"stations.local.remote": false,
"stations.remote": []
}
}
This looks pretty good.
I assume it wouldn't be very hard to change the schema so that, for example,
prewhitening_type = "first difference"
becomes
prewhitening.type = "arma"
prewhitening.ar_order = 3
prewhitening.ma_order = 3
or similar, i.e. the schema can evolve over time...
Towards practically integrating with the existing tests in aurora, we need:
@kkappler A couple of questions regarding the config:
Why does sample_rate default to -1? Does that mean it doesn't exist, instead of having a 0?
[{
"run": {
"id": [
"None"
],
"input_channels": [
{
"channel": {
"id": "hx",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "hy",
"scale_factor": 1.0
}
}
],
"output_channels": [
{
"channel": {
"id": "hz",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ex",
"scale_factor": 1.0
}
},
{
"channel": {
"id": "ey",
"scale_factor": 1.0
}
}
],
"sample_rate": -1.0,
"time_period.end": "1980-01-01T00:00:00+00:00",
"time_period.start": "1980-01-01T00:00:00+00:00"
}
}]
@kkappler In your decimation config there are attributes:
decimation_factor
- which is the factor by which to decimatesample_rate
- sample rate after decimation?Could we have initial_sample_rate
be the original sample rate pre decimation and then sample_rate
would be a property of initial_sample_rate
/ decimation_factor
?
The reason I ask is that in some functions it requires the initial sample rate not the decimated sample rate, and it would be good for the decimation object to have that information.
@kujaku11 sorry, I thought I had sent my reply to this already.
The way it is set up now, sample_rate in the decimation_level_config is the sample rate after decimation. As you suggest, it is derived from initial_sample_rate, which is at the top level of the config, and is ultimately sourced from the mth5.
sample_rate is actually redundant information, but I would like to keep it there because it is much more intuitive to someone inspecting the config than deducing it from the decimation_factor and the initial_sample_rate.
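A minimal sketch of that derived-value idea; the class and attribute names below are illustrative assumptions, not the actual aurora API:

```python
# Sketch only: sample_rate derived from the pre-decimation rate.
# Class and attribute names are assumptions, not the aurora API.
class DecimationLevelConfig:
    def __init__(self, initial_sample_rate, decimation_factor):
        self.initial_sample_rate = initial_sample_rate  # rate before decimation
        self.decimation_factor = decimation_factor

    @property
    def sample_rate(self):
        """Sample rate after decimation, derived rather than stored."""
        return self.initial_sample_rate / self.decimation_factor
```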
@kujaku11: What information do you need to translate between frequency and index for the various bands? Just the frequency array? I'm thinking of setting frequency and index as properties so that each updates when the other is set, but do you need more information, and where should that information be stored?
Translating between frequency and index needs only the sample_rate of the data and window.num_samples, which the config already has. The other thing that is needed is a rule about frequency band edges: whether a frequency band, which is an interval, is open, half-open, or closed. That rule will come from the FrequencyBand and FrequencyBands classes.
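For what it's worth, with just those two quantities the mapping is the standard FFT bin spacing, sample_rate / num_samples_window. A hedged sketch (function names are mine, not aurora's):

```python
# Sketch of the frequency <-> index mapping for STFT bins, assuming the
# standard bin spacing df = sample_rate / num_samples_window.
def index_to_frequency(index, sample_rate, num_samples_window):
    """Return the center frequency (Hz) of FFT bin `index`."""
    return index * sample_rate / num_samples_window

def frequency_to_index(frequency, sample_rate, num_samples_window):
    """Return the FFT bin closest to `frequency` (Hz)."""
    return round(frequency * num_samples_window / sample_rate)
```

The band-edge rule (open, half-open, or closed intervals) would then decide which boundary bins belong to a band.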
I added a run list to station that looks like this; any thoughts? I figured the time period should be set at the run level, unless you think that down the road the time period could be set at the channel level. And should we make time_period a list of time periods in the event of masking data?
I think this is great! It occurs to me that if packaged like this, with the run_list, this representation of the config is basically an instance of a TransferFunctionKernel. It tells what data runs to process, where the data are, and has the recipe for doing the processing. The only thing potentially missing here, which the DatasetDefinition class I am fiddling with supports, is the splitting of runs to excise segments that one would not want to process. I want to think about that some. I still think that including a DataFrame that lists the time series blocks as an optional argument to the process_mth5 function is worth having.
_Why does the sample_rate default to -1? Does that mean it doesn't exist, instead of having a 0?_
I think maybe I was concerned about divide-by-zero exceptions. I don't think the value is too important; I just didn't want a potentially valid value being put in as a default, to protect against launching jobs that didn't explicitly get that value from somewhere (like the mth5). This is another case of a piece of redundant information: since it is in the mth5 we could access the sample_rate from the mth5, but as a user, I like the idea of that information being in the config when I am inspecting it.
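A small sketch of how the -1 sentinel could guard against exactly that; the function name and message are my assumptions:

```python
# Sketch of the sentinel idea: -1.0 marks "not yet set", so a job can refuse
# to run until sample_rate has been filled in from the mth5.
# Function name and error message are illustrative, not aurora's API.
def validate_sample_rate(sample_rate):
    if sample_rate <= 0:
        raise ValueError(
            "sample_rate is still the -1.0 placeholder; "
            "set it from the mth5 before processing."
        )
    return sample_rate
```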
@kujaku11 I'm making good progress in integrating the new processing config based on mt_metadata.
The following issue has come up when I want to compute the STFT. The required properties are:
taper_family
num_samples_window
num_samples_overlap
taper_additional_args
sample_rate
prewhitening_type
extra_pre_fft_detrend_type
It would be useful if one could obtain a dict from the processing config via a method. For example, I would like
processing_config.decimations[0].stft
or
processing_config.decimations[0].stft()
to return either a dict or an object that has exactly the params I listed above.
The way things are set up now, each atom of metadata is defined in exactly one of the standards.json files, and I think this is what we want. BUT ideally, we want the ability to define custom methods that return user-defined mixtures of these atoms, either at the aurora.config.metadata.processing.Processing or aurora.config.metadata.decimation_level.DecimationLevel layers.
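Something like the following hypothetical property on the decimation-level class; the key names follow the list above, but the class and attribute names are assumptions:

```python
# Hypothetical sketch of an `stft` convenience property that bundles
# exactly the parameters the STFT step needs. Class and attribute
# names are illustrative defaults, not the actual mt_metadata schema.
class DecimationLevel:
    def __init__(self):
        self.window_type = "boxcar"
        self.window_num_samples = 128
        self.window_overlap = 32
        self.window_additional_args = {}
        self.sample_rate = 1.0
        self.prewhitening_type = "first difference"
        self.extra_pre_fft_detrend_type = "linear"

    @property
    def stft(self):
        """Return a user-defined mixture of metadata atoms for the STFT."""
        return {
            "taper_family": self.window_type,
            "num_samples_window": self.window_num_samples,
            "num_samples_overlap": self.window_overlap,
            "taper_additional_args": self.window_additional_args,
            "sample_rate": self.sample_rate,
            "prewhitening_type": self.prewhitening_type,
            "extra_pre_fft_detrend_type": self.extra_pre_fft_detrend_type,
        }
```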
Also, FWIW, I can see that taper_family is supported as dec_level_config.window.type. But we also need to add additional_args to window, which should default to an empty dictionary. And decimation.factor should be forced to be an integer.
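One way to force the integer constraint is coercion on assignment; a sketch only, since the real mt_metadata validation machinery may do this differently:

```python
# Sketch of coercing decimation.factor to an integer on assignment.
# Class name and setter approach are assumptions, not the mt_metadata API.
class Decimation:
    def __init__(self, factor=1):
        self.factor = factor  # routed through the setter below

    @property
    def factor(self):
        return self._factor

    @factor.setter
    def factor(self, value):
        self._factor = int(value)  # force integer as suggested above
```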
@kujaku11 The remote_reference test on synthetic is now passing locally with the new Processing class.
Two things to discuss:
The Processing class does not have fields for "reference channel ids". I am instead using dec_config.input_channels for this. Normally this should be fine; the remote will usually use hx, hy, the same as the local input channels. This is worth a quick email to Gary, however, because I think it is possible that the remote reference can (theoretically) use electric channels.
The remote reference station is list-like in the Processing class. This makes sense for defining a processing "campaign" of sorts, but for a single TF estimate we will not in general want to mix reference stations (I think), and so from a TF Kernel perspective there should be only a single RR.
This is working now. All tests are passing. The synthetic results are in agreement with EMTF to within 1e-4 in both rho and phi.
The Parkfield results change slightly, for neither better nor worse. Here are the RR results from the old processing config:
And here from the new:
Basically, the phases are a little better around 10 s now, but the apparent resistivity is a little more different (though still negligibly so) from EMTF.
Make the config objects from mt_metadata.base.Base for easier translation between TF and MTH5, to store transfer functions, and to help keep track of the config parameters.

@kkappler I will try to factor the current config but will ping you for questions and guidance on how to do this.