pnuu commented 2 years ago

This PR adds a plugin that updloads the generated files to S3. ~Optionally~ Also deletes the local files after the transfer.

[x] Tests added
[x] Tests passed
[x] Passes flake8 trollflow2
[x] Fully documented

codecov[bot] commented 2 years ago

Codecov Report

Merging #158 (b8f6aed) into main (764661a) will decrease coverage by 0.35%. The diff coverage is 98.19%.

@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
- Coverage   95.70%   95.36%   -0.35%     
==========================================
  Files          11       13       +2     
  Lines        2491     2652     +161     
==========================================
+ Hits         2384     2529     +145     
- Misses        107      123      +16

Flag	Coverage Δ
unittests	`95.36% <98.19%> (-0.35%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
trollflow2/launcher.py	`87.22% <80.00%> (-0.62%)`	:arrow_down:
trollflow2/plugins/__init__.py	`93.07% <100.00%> (+0.09%)`	:arrow_up:
trollflow2/plugins/s3.py	`100.00% <100.00%> (ø)`
trollflow2/tests/test_launcher.py	`98.31% <100.00%> (+0.18%)`	:arrow_up:
trollflow2/tests/test_s3_plugins.py	`100.00% <100.00%> (ø)`
trollflow2/tests/test_trollflow2.py	`98.38% <100.00%> (-1.13%)`	:arrow_down:
trollflow2/tests/utils.py	`74.46% <100.00%> (+9.76%)`	:arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

mraspaud commented 2 years ago

@pnuu thanks for the work in this PR. I thought a little about this, and I think though that what is being added here is in fact a dispatching mechanism, for which we have a process in trollmoves: https://github.com/pytroll/trollmoves/blob/main/trollmoves/dispatcher.py I think the benefit with the dispatcher is that no time is lost dispatching/uploading the file inside the trollflow run...

pnuu commented 2 years ago

Ok, I've tested this PR with proper reading and saving, and it works.

There is just the trollflow2.launcher.check_results function that is logging ERROR messages due to the files are not available anymore after S3 upload and subsequent deletion. Any thoughts what to do about this @mraspaud?

pnuu commented 2 years ago

Still some work needed in the messaging, the published messages have URIs like this: /tmp/s3:/bucket-name/20220908_1050_Meteosat-10_EPSG3035_wv73.tif. So the temporary save path is there, and also a slash is missing after the scheme.

pnuu commented 2 years ago

With the latest commit the messages are now working. Still need to figure out what to do with trollflow2.launcher.check_results.

pnuu commented 2 years ago

@pnuu thanks for the work in this PR. I thought a little about this, and I think though that what is being added here is in fact a dispatching mechanism, for which we have a process in trollmoves: https://github.com/pytroll/trollmoves/blob/main/trollmoves/dispatcher.py I think the benefit with the dispatcher is that no time is lost dispatching/uploading the file inside the trollflow run...

Hadn't noticed this comment. The dispatcher would work when running on a (virtual) server with multiple tasks, but not when running in a container where there's supposed to be only a single process running and the saving is done inside the container. Using a shared volume might work, but that would also add a new container and set of configuration files.

pnuu commented 2 years ago

This now uses trollmoves.movers.S3Mover instead of direct s3fs calls.

pnuu commented 2 years ago

It seems that the upload is done to s3://<bucket name>/<image name>/<image name>...

pnuu commented 2 years ago

Ok, it was a usage error that was revealed when uploading to real S3. The filename was interpreted as a prefix ("directory") by botocore. Fixed with https://github.com/pytroll/trollflow2/pull/158/commits/5d53219409ae5e5f0c5b13429ff83001de33941e

pnuu commented 2 years ago

Ok, I've tested this PR with proper reading and saving, and it works.

There is just the trollflow2.launcher.check_results function that is logging ERROR messages due to the files are not available anymore after S3 upload and subsequent deletion. Any thoughts what to do about this @mraspaud?

I'm now thinking that maybe the S3 uploading and local file deletion should be separate plugins :thinking: The other options seem more complicated.

pnuu commented 2 years ago

Nevermind, that won't work either because check_results() is called after all the plugins. The filenames are put into a queue in save_datasets plugin, so updating the filenames to S3 variants and modifying check_results() to do the check over S3 won't work directly either.

One option would be to add check_results(..., remote_filesystem=None) kwarg and with minor refactoring handle each protocol as the need comes. For S3 that would be something like

def _check_s3_file(saved_file):
    s3 = S3FileSystem()
    return s3.stat(saved_file)['size']

which would raise FileNotFoundError that is already handled.

@mraspaud any other ideas?

mraspaud commented 2 years ago

I thought a bit more about this, to try to keep things clean and separated. Could we have a custom class for taking care of files? Something that mimics a python file object for example, and that uploads the file on close for example?

pnuu commented 2 years ago

I think this should work for what I need. I'll add a test for this tomorrow unless there's something more urgent happening.

pnuu commented 2 years ago

I think this is now closer to ready, and all comments should have been handled in a way or the other.

pnuu commented 1 year ago

WTH did that feature-s3-upload-plugin came from?! It's now in 18 random places in test_trollflow2.py :man_facepalming:

pnuu commented 1 year ago

I ran a test in OpenShift with the latest commits and #159 (merged locally) and everything is still working.

pytroll / trollflow2

Add a plugin to upload the generated images to S3 #158

Codecov Report