Fixing spectra calculations

sboltz commented 3 years ago

This pull request attempts to address a number of issues with the spectra calculations that were causing significant errors in source parameter calculations. The changes result in a decrease in Mw by approximately 1.0 for the test dataset, which is composed of small (Magnitude < 2) mining-induced events.

d-chambers commented 3 years ago

Hey @sboltz,

There are a whole lot of changes here. I tried going through the diffs but it seems there is a lot of small cleanup/automated formatting. Can you point me to the important parts? Obviously TraceGroup.dft, SpectrumGroup.dft_to_psd, SpectrumGroup.dft_to_cft and the related tests. Anything I am missing?

sboltz commented 3 years ago

Hey @sboltz,

There are a whole lot of changes here. I tried going through the diffs but it seems there is a lot of small cleanup/automated formatting. Can you point me to the important parts? Obviously TraceGroup.dft, SpectrumGroup.dft_to_psd, SpectrumGroup.dft_to_cft and the related tests. Anything I am missing?

Hi @d-chambers, the relevant bits are TraceGroup.dft, TraceGroup.mtspec, and SpectrumGroup.calc_source_params (and the rabbit hole that will take you down) as well as the associated tests. It may also be worth looking at SpectrumGroup.to_motion_type because going to acceleration, specifically, seems unrealistic.

d-chambers commented 3 years ago

Ok @sboltz , so now I am running up against this test:

    def test_values(self, source_params):
        """ Ensure values are plausible """
        # This is intentionally somewhat brittle and intended to catch
        # subtle yet significant changes to the parameter calculations
        medians = source_params.median()
        checks = {
            "fc": 5.7,
            "omega0": 2.9e-3,
            "velocity_squared_integral": 6.0e-3,
            "moment": 5.8e12,
            "potency": 1.4e2,
            "energy": 2.4e6,
            "mw": 2.5}
        for key, val in checks.items():
            np_assert(medians[key], val, rtol=0.02)

It is failing but I am not sure if it was right before or is right now (probably neither). How did you come up with these params and should I replace the test with the new values?

shawnboltz commented 3 years ago

@d-chambers How different are the values, and did you make changes to the code to warrant them?

I came up with them by essentially averaging the values obtained from using the dft method of TraceGroup and the mtspec method and then set the tolerance high enough that it would pass for both. That way there would be a little bit of wiggle room, but it would fail if things got too different.

d-chambers commented 3 years ago

They are quite a bit different, but I only changed the energy calculations. I wonder, do you have any un-pushed commits on this branch? The .loc bug SpectrumGroup.dft_to_psd is still in this branch.

https://github.com/seismopy/mopy/blob/631a18e77680e7ce34cd9687650ba1b5c394735a/mopy/core/spectrumgroup.py#L208-L224

shawnboltz commented 3 years ago

They are quite a bit different, but I only changed the energy calculations. I wonder, do you have any un-pushed commits on this branch? The .loc bug SpectrumGroup.dft_to_psd is still in this branch.

Are you looking at the correct branch? There is a screw_around and a screw_around_fix (sorry, poor naming convention). The test_values test is quite a bit out of date as well.

   def test_values(self, source_params):
        """ Ensure values are plausible """
        # This is intentionally somewhat brittle and intended to catch
        # subtle yet significant changes to the parameter calculations
        medians = source_params.median()
        checks = {
            "fc": (5.8, 0.02),  # (value, rtol)
            "omega0": (6.35e-5, 0.12),
            "moment": (1.3e11, 0.12),
            "potency": (3.05, 0.12),
            "energy": (1.17e7, 0.02),
            "mw": (1.34, 0.03),
        }
        for key, (val, tol) in checks.items():
            np_assert(medians[key], val, rtol=tol)

d-chambers commented 3 years ago

:facepalm: I just spent all day on the wrong branch!

shawnboltz commented 3 years ago

facepalm I just spent all day on the wrong branch!

Sorry. I will name things better next time... I just got myself in a bit of a git pickle on these changes.

d-chambers commented 3 years ago

Ok, so where did this test go, I just got it working.

    def test_processing_for_energy(self, gauss_trace_group, gauss_spec_group):
        """
        Check the normalization for energy calculations
        """
        # Compute the answer in the time domain
        data = gauss_trace_group.data
        stats = gauss_trace_group.stats
        sr = stats.sampling_rate.values  # array of sampling rates
        assert len(np.unique(sr)) == 1, "there should be one sampling rate"
        dx = 1. / sr[0]
        # Get the power by integrating velocity**2 in time domain
        vel_np = np.gradient(data, dx, axis=1)
        power_td = np.trapz((vel_np ** 2), dx=dx, axis=1)
        vel = gauss_spec_group.to_motion_type("velocity")
        fft_corrected = vel.dft_to_psd()
        power_fd = np.sum(fft_corrected.abs().data, axis=1)  # Getting problems from zero-padding????
        # This is still slightly off but probably just due to float weirdness
        np_assert(power_td, power_fd.values, rtol=0.01, atol=.0001)

shawnboltz commented 3 years ago

Ok, so where did this test go, I just got it working.

https://github.com/seismopy/mopy/blob/27d518627d6b38750710815ed7216e3bdf1548c1/tests/test_core/test_spectrum_group.py#L265

I'm free right now if it would be easier to discuss the changes.

sboltz commented 3 years ago

Thanks @d-chambers. I will go through these tomorrow.

As to your comment on the keyword arguments, I do understand where you are coming from about explicitly doing things separately. The tradeoff being that if you don't really know what you are doing it is very easy to skip an important step. I suppose that I see the calc_source_params method as a higher-level method and then if people really want to do a more custom thing then they can call the calculations for the individual source parameters they care about separately. It's far from perfect with how it is set up now, though, so I am open to your suggestions.

Thank you for all of your help on getting the spectra calculations sorted out.

d-chambers commented 3 years ago

Hey @sboltz,

Just in case it wasn't clear, I am working on fixing some of the suggestions I made, then I will open another PR for you to "counter-review" so don't worry about doing anything with this yet.

shawnboltz commented 3 years ago

Hey @sboltz,

Just in case it wasn't clear, I am working on fixing some of the suggestions I made, then I will open another PR for you to "counter-review" so don't worry about doing anything with this yet.

No worries. I knew you were working on it. I was just copying that line for another thing I'm working on and remembered that you had made a comment about it.

d-chambers commented 3 years ago

Hey @sboltz, I added the same flake8/pre-commit to this as ObsPlus has and tried to setup the CI. We will see how it goes.

It would be good to cycle through the TODOs and see which ones should be deleted. Happy to hop on a call sometime if you want to discuss any of them.

d-chambers commented 3 years ago

hmmm... this is failing because evaluation status is missing from the picks? Confusing...

shawnboltz commented 3 years ago

hmmm... this is failing because evaluation status is missing from the picks? Confusing...

That is very strange. Especially because that hasn't been a problem in any other environment with that test dataset. I know that it needs to check if any picks are rejected and dump them, but it doesn't seem like it should error if that column isn't present.

d-chambers commented 3 years ago

Ok so the issue is it requires the current master version of ObsPlus to run. For now I just have the CI install it via pip but we should probably do an ObsPlus release sometime soon anyway.

sboltz commented 3 years ago

Okay, @d-chambers, as soon as these checks finish I am done making changes to this (finally). Are we good to merge into master?

sboltz commented 3 years ago

Actually, @d-chambers, I could use a tiny bit of help on this. I know this failure has something to do with NaTs, but I don't know how to reproduce it locally so I can debug and fix it. How does one reproduce the environment that you've set up in the actions?

d-chambers commented 3 years ago

You can debug the CI directly but I would try to just re-create the test env locally first. There is a conda env file in the .github folder, I think the command is just something like:

conda env create -f test_env.yml

or something like that. Then install in editable mode with pip and run the test suite.

seismopy / mopy

Fixing spectra calculations #16