Ignore `nan` spectrogram values

iaraota commented 4 months ago

If the data is corrupted, the computed spectrograms may contain nan values. While these nan values are ignored in spectrogram plots, but they can cause issues in derived products such as the power spectral density and ratio spectrogram, where the presence of nan values leads to all values becoming nan, resulting in empty plots. This pull request addresses this issue by removing any potential nan values from the data used and saved in globalv.SPECTROGRAMS.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 49.62%. Comparing base (720bd0c) to head (922827c). Report is 13 commits behind head on master.

Files	Patch %	Lines
gwsumm/data/coherence.py	75.00%	1 Missing :warning:
gwsumm/data/spectral.py	87.50%	1 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #410 +/- ## ========================================== - Coverage 49.64% 49.62% -0.03% ========================================== Files 60 60 Lines 8823 8848 +25 ========================================== + Hits 4380 4390 +10 - Misses 4443 4458 +15 ``` | [Flag](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | Coverage Δ | | |---|---|---| | [Linux](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | `49.62% <83.33%> (-0.03%)` | :arrow_down: | | [macOS](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | `49.62% <83.33%> (?)` | | | [python3.10](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | `49.62% <83.33%> (-0.03%)` | :arrow_down: | | [python3.11](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | `49.62% <83.33%> (-0.03%)` | :arrow_down: | | [python3.9](https://app.codecov.io/gh/gwpy/gwsumm/pull/410/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy) | `49.62% <83.33%> (-0.03%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gwpy#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

iaraota commented 4 months ago

Running example: https://ldas-jobs.ligo-la.caltech.edu/~iara.ota/summary/day/20240514/pem/end_stations/

Production version: https://ldas-jobs.ligo-la.caltech.edu/~detchar/summary/day/20240514/pem/end_stations/

eagoetz commented 4 months ago

@iaraota thanks! I was wondering: if I understand correctly, this check handles the possibility of new incoming spectrogram data to be added to the global memory variable will be excluded, but what about data already in the archive? Is the point that the corrupted data would never have made it into the archive file in the first place because of this check?

iaraota commented 4 months ago

@iaraota thanks! I was wondering: if I understand correctly, this check handles the possibility of new incoming spectrogram data to be added to the global memory variable will be excluded, but what about data already in the archive? Is the point that the corrupted data would never have made it into the archive file in the first place because of this check?

This change prevents nan values from entering the archive. Although we could add a check in the archive to delete nan data, this additional step may be unnecessary since this change already addresses the issue. Instead, we could simply delete the archive of the broken pages and rerun. Do we want to implement this extra step?

eagoetz commented 4 months ago

Thanks for clarifying. I think this solution is sufficient. However I'm wondering about another impact from excluding data from the global memory variable: is it possible that some channel(s) may have some nan value and thus the spectrogram shape is different than other spectrograms. Is it possible there could be some downstream consequence of that (like computing coherences or anything like that)? Are there other data elements that could be impacted by nan values?

iaraota commented 4 months ago

This function is not used in the coherence.py, so it should not be a problem in the coherence computation.

eagoetz commented 4 months ago

This function is not used in the coherence.py, so it should not be a problem in the coherence computation.

Which function? I'm also wondering if other computations (like coherence) would be impacted by some channels having nan but not others. Should coherence-grams also be protected in the same way as spectrograms?

iaraota commented 4 months ago

This function is not used in the coherence.py, so it should not be a problem in the coherence computation.

Which function? I'm also wondering if other computations (like coherence) would be impacted by some channels having nan but not others. Should coherence-grams also be protected in the same way as spectrograms?

Maybe I miss understood you. We can add this check after line 288.

iaraota commented 4 months ago

@eagoetz updated the PR to also ignore nan in coherence spectrograms.

eagoetz commented 4 months ago

@eagoetz updated the PR to also ignore nan in coherence spectrograms.

Awesome, thank you! One more minor suggestion: can a warning be printed so that it is more obvious when something like this happens so that it is not silently hidden from the user or log files? Thanks!

iaraota commented 3 months ago

@eagoetz updated the PR with warnings.

gwpy / gwsumm

Ignore `nan` spectrogram values #410

Codecov Report