pytroll / trollflow2

Next generation Trollflow. Trollflow is for batch-processing satellite data using Satpy
https://trollflow2.readthedocs.org/
GNU General Public License v3.0

setting mtime of finished file for detection of newly created products #81

Closed. Weathermann closed this pull request 1 year ago.

Weathermann commented 4 years ago

update mtime before renaming to real filename for distribution tools that recognize new files based on file age

modify mtime of created product with touch() from pathlib for dist. tools

When use_tmp_file: True is set in the trollflow2 YAML configuration, products are first written to a temporary file and then renamed to the real filename. The time between starting to create the file and finishing writing it can be long: sometimes up to 40 minutes (NOAA-20, S-NPP).

DWD uses a file distribution system which distributes new files to a number of customers. Only new products should be distributed. We recognize new files by their modification time, using the find command (e.g. -mmin 1). However, the finished product keeps the timestamp from when the file was created. With long creation times, the distribution program therefore no longer recognizes such products as new files, and they are never distributed to our customers.
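To make the failure mode concrete, the age-based check can be approximated in Python. This is a minimal sketch, not DWD's actual script: `recently_modified` and its `now` parameter are hypothetical names, and it mimics `find DIR -type f -mmin -N` only roughly (find's rounding of fractional minutes is not reproduced). A product whose mtime still points to the start of a 40-minute write falls outside a 1-minute window.

```python
import os
import time
from pathlib import Path


def recently_modified(directory, max_age_minutes=1.0, now=None):
    """Return files in *directory* whose mtime is at most *max_age_minutes* old.

    Hypothetical helper, roughly comparable to ``find DIR -type f -mmin -N``.
    *now* can be passed explicitly to make the check reproducible in tests.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime >= cutoff
    )
```

With a 1-minute window, a file backdated by 40 minutes (for example with `os.utime`) is skipped, while a freshly written one is reported.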

The desired behaviour is like the Linux touch command: after the file has been completely written, its mtime should be updated to indicate that it is a new file.
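The sequence described in this PR (write to a temporary file, update the mtime with `pathlib.Path.touch()`, then rename to the real filename) could look roughly like the sketch below. It is an illustration under assumptions, not trollflow2's actual code: `write_then_publish` and the `write_func` callable (standing in for the slow Satpy save step) are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


def write_then_publish(final_path, write_func):
    """Write via a temporary file, refresh its mtime, then rename it into place.

    *write_func* is a hypothetical callable that writes the product to the
    path it is given (a stand-in for the slow Satpy save step).
    """
    final_path = Path(final_path)
    # Create the temporary file in the target directory so that os.replace
    # is a rename within one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(final_path.parent), suffix=".tmp")
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        write_func(tmp_path)
        # Like `touch`: on an existing file, Path.touch() sets atime/mtime to
        # "now", so age-based tools see the finished product as new.
        tmp_path.touch()
        # A rename keeps the refreshed mtime.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temporary file behind.
        tmp_path.unlink(missing_ok=True)
        raise
    return final_path
```

Touching before the rename means the file only appears under its real name once its mtime is already current. Note that `mkstemp` creates the file with mode 0600, so a real pipeline may need to adjust permissions before distribution.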

No compatibility breaks expected.

codecov[bot] commented 4 years ago

Codecov Report

Merging #81 into master will increase coverage by 0.00%. The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master      #81   +/-   ##
=======================================
  Coverage   92.40%   92.41%           
=======================================
  Files          10       10           
  Lines        1553     1555    +2     
=======================================
+ Hits         1435     1437    +2     
  Misses        118      118           
Impacted Files                    Coverage Δ
trollflow2/plugins/__init__.py    91.00% <100.00%> (+0.04%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 61528b4...a08ca8c.

Weathermann commented 4 years ago

@mraspaud: Do you think it is really necessary to make the touch() configurable? Wouldn't it be the expected behavior for a file to get the timestamp when it was completed?

And how can I test this?

mraspaud commented 4 years ago

To know when the file was completed, one can check the modification time of the file (mtime). I think it is also interesting to know when the file was created, so for my part I'd like to keep the behaviour as it was.

As for the tests, they should probably go in this file: https://github.com/pytroll/trollflow2/blob/master/trollflow2/tests/test_trollflow2.py#L340
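One way to answer "how can I test this?" is to backdate a file's mtime with `os.utime`, run the code under test, and assert that the mtime moved forward. The sketch below uses plain `unittest` so it needs no extra dependencies; `touch_finished_file` is a hypothetical stand-in for whatever trollflow2 function ends up calling `touch()`, and the existing fixtures in `test_trollflow2.py` are not reproduced here.

```python
import os
import tempfile
import time
import unittest
from pathlib import Path


def touch_finished_file(path):
    """Hypothetical helper under test: refresh the mtime of a finished product."""
    Path(path).touch()


class TestTouchFinishedFile(unittest.TestCase):
    def test_mtime_is_refreshed(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            product = Path(tmpdir) / "product.tif"
            product.write_bytes(b"data")
            # Pretend the file was created 40 minutes ago.
            old = time.time() - 40 * 60
            os.utime(product, (old, old))

            touch_finished_file(product)

            # The mtime should now be close to "now", far past the old value,
            # and the content must be untouched.
            self.assertGreater(product.stat().st_mtime, old + 30 * 60)
            self.assertEqual(product.read_bytes(), b"data")
```

Backdating avoids having to sleep in the test: the gap between the old and the refreshed mtime is large enough to assert on reliably.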

Also, could you change the title of this PR to say a bit more about what it implements?

Sorry for being so annoying on your first PR!

mraspaud commented 4 years ago

Any news on this PR @Weathermann ?

pnuu commented 3 years ago

I agree with @mraspaud , the touch should be configurable.

And out of curiosity: how large are your target areas, if the processing takes 40 minutes? For my 3918x3868 pixel area, the processing takes at most 7.5 minutes using 2 Dask workers.

Weathermann commented 3 years ago

how large are your target areas, if the processing takes 40 minutes?

This happens in EARS VIIRS production from NOAA-20/S-NPP (Greenland area). But even with shorter durations the problem remains: if a check program (for data distribution) looks for newly created products every minute and creating a file takes a few minutes, the file will still not be recognized as new.

mraspaud commented 3 years ago

@Weathermann yes, I understand the problem, we would just like to have this touch call to be optional :)
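Making the touch optional, as requested here, could amount to gating it behind a configuration flag that defaults to the old behaviour. In this sketch, `touch_after_write` is a hypothetical option name (not an existing trollflow2 setting) and `finalize_product` is a hypothetical helper; the product configuration is assumed to be a plain dict.

```python
from pathlib import Path


def finalize_product(path, product_config):
    """Optionally refresh the mtime of a finished product.

    ``touch_after_write`` is a hypothetical option name. It defaults to False,
    so existing configurations keep the previous behaviour (mtime untouched).
    """
    path = Path(path)
    if product_config.get("touch_after_write", False):
        # Same effect as the Linux `touch` command on an existing file.
        path.touch()
    return path
```

Defaulting to False keeps the creation-time mtime that mraspaud wants to preserve, while sites like DWD can opt in.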

gerritholl commented 2 years ago

DWD uses a file distribution system which distributes new files to a number of customers. Only new products should be distributed. We recognize new files by their modification time, using the find command (e.g. -mmin 1).

Are you sure? It seems to me that it's our own copy_afd script that concludes the files are too old; I don't think the automatic file distributor itself has a mandatory age check. Can't you just set file_age in the distributions.yaml file to avoid the problem?