pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.04k stars 287 forks source link

Change `start_time` and `end_time` handling in `combine_metadata` #2737

Closed pnuu closed 4 months ago

pnuu commented 5 months ago

The times of datasets used in composites were averaged to get the final time values for the composite. With this PR, the start_time and end_time attributes are instead changed to use the earliest and latest values, respectively. In addition, for StaticImageCompositor the default start_time and end_time values are set to None if they are not available in the filename.

pnuu commented 5 months ago

Going through the failing tests. The others are easy to fix, but I'm not sure what combine_times=False behaviour should be in satpy.multiscene.stack(). Looks like then the average is used, but would it ok to just skip that option and always use min/max of start/end times?

pnuu commented 5 months ago

The satpy.multiscene.stack() option combine_times is not documented, so that would indicate that the deletion of it would be safe.

codecov[bot] commented 5 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (eb4ac0b) 95.40% compared to head (b8a47a9) 95.89%. Report is 12 commits behind head on main.

Files Patch % Lines
satpy/dataset/metadata.py 97.14% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2737 +/- ## ========================================== + Coverage 95.40% 95.89% +0.48% ========================================== Files 371 371 Lines 52825 52826 +1 ========================================== + Hits 50399 50656 +257 + Misses 2426 2170 -256 ``` | [Flag](https://app.codecov.io/gh/pytroll/satpy/pull/2737/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | Coverage Δ | | |---|---|---| | [behaviourtests](https://app.codecov.io/gh/pytroll/satpy/pull/2737/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | `4.16% <9.75%> (+<0.01%)` | :arrow_up: | | [unittests](https://app.codecov.io/gh/pytroll/satpy/pull/2737/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | `95.99% <98.78%> (-0.04%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

coveralls commented 5 months ago

Pull Request Test Coverage Report for Build 7897556037

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/dataset/metadata.py 34 35 97.14%
<!-- Total: 81 82 98.78% -->
Files with Coverage Reduction New Missed Lines %
satpy/tests/test_readers.py 1 99.36%
satpy/readers/init.py 4 98.65%
<!-- Total: 5 -->
Totals Coverage Status
Change from base Build 7726219656: 0.003%
Covered Lines: 50528
Relevant Lines: 52649

💛 - Coveralls
pnuu commented 5 months ago

What happens in Scene.save_datasets if start_time is None? How do we want to handle that? Let it fail?

The only case this should happen if the user is saving the plain data from generic_image reader that didn't have the time available in the filename. I guess we could add some kind of error handling for this case, but I'm not sure it's worth the effort :thinking:

djhoese commented 5 months ago

The only case this should happen if the user is saving the plain data from generic_image reader that didn't have the time available in the filename. I guess we could add some kind of error handling for this case, but I'm not sure it's worth the effort 🤔

Good point. I guess my only other fear would be odd situations with the MultiScene where you need to resample and have a static image, but the MultiScene wants to do something with ordering by start time...nah this shouldn't be a problem. Ok sounds good to not worry about it.

djhoese commented 4 months ago

Oh @pnuu I think this min/max code (including the time_parameters method it calls) could be removed:

https://github.com/pytroll/satpy/blob/cd33a0a5e17798a43e59efa4139f55448f9b4801/satpy/readers/file_handlers.py#L115-L118

pnuu commented 4 months ago

Oh @pnuu I think this min/max code (including the time_parameters method it calls) could be removed:

https://github.com/pytroll/satpy/blob/cd33a0a5e17798a43e59efa4139f55448f9b4801/satpy/readers/file_handlers.py#L115-L118

Removed the duplicate handling of times and adjusted the file handler test to actually use datetimes.

djhoese commented 4 months ago

Note: As discussed above, what still worries me a little bit is indeed the generic_image reader possibly returning datasets without a valid start_time... I think there are users that use satpy for simple operations like opening a geotiff, resample it and save it again, which could fail.

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

ameraner commented 4 months ago

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

Yes, indeed. Changing that would be outside the scope of this PR, and I'm not sure what the best solution would be anyway (since giving a "dummy" start_time can mess up other calculations, as we see here).

gerritholl commented 4 months ago

what still worries me a little bit is indeed the generic_image reader possibly returning datasets without a valid start_time...

I am not convinced that the generic_image reader should guarantee a (valid) start_time. We could possibly expose whatever times are in image metadata (such as EXIF headers) or file metadata, but the type of imagery read by the generic_image reader is too diverse for any of those to be generally valid as a start_time, as elsewhere in Satpy, this refers to the measurement time, not to the image creation time. Dealing with images that don't have a start_time would rather seem to be the responsibility of downstream users.

pnuu commented 4 months ago

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

Yes, indeed. Changing that would be outside the scope of this PR, and I'm not sure what the best solution would be anyway (since giving a "dummy" start_time can mess up other calculations, as we see here).

This could be handled in the writer (in another PR) with a simple (pseudo-code)

if self.start_time is None:
    self.start_time = dt.datetime.utcnow()
fname = self.compose_fname_from_stuff()
djhoese commented 4 months ago

This could be handled in the writer (in another PR) with a simple (pseudo-code)

Eh, too much magic. If the start_time being None is a problem then the user should have to work around it. For example, if the filename generation is the problem then they should save it with a different filename template string.

mraspaud commented 4 months ago

Everybody seems happy about this one, merging