Closed. pnuu closed this 1 year ago.
Merging #158 (b8f6aed) into main (764661a) will decrease coverage by 0.35%. The diff coverage is 98.19%.
@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
- Coverage   95.70%   95.36%   -0.35%
==========================================
  Files          11       13       +2
  Lines        2491     2652     +161
==========================================
+ Hits         2384     2529     +145
- Misses        107      123      +16
Flag | Coverage Δ
---|---
unittests | 95.36% <98.19%> (-0.35%) :arrow_down:
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ
---|---
trollflow2/launcher.py | 87.22% <80.00%> (-0.62%) :arrow_down:
trollflow2/plugins/__init__.py | 93.07% <100.00%> (+0.09%) :arrow_up:
trollflow2/plugins/s3.py | 100.00% <100.00%> (ø)
trollflow2/tests/test_launcher.py | 98.31% <100.00%> (+0.18%) :arrow_up:
trollflow2/tests/test_s3_plugins.py | 100.00% <100.00%> (ø)
trollflow2/tests/test_trollflow2.py | 98.38% <100.00%> (-1.13%) :arrow_down:
trollflow2/tests/utils.py | 74.46% <100.00%> (+9.76%) :arrow_up:
@pnuu thanks for the work in this PR. I thought a little about this, and I think that what is being added here is in fact a dispatching mechanism, for which we already have a process in trollmoves: https://github.com/pytroll/trollmoves/blob/main/trollmoves/dispatcher.py. I think the benefit of the dispatcher is that no time is lost dispatching/uploading the file inside the trollflow run...
Ok, I've tested this PR with proper reading and saving, and it works. There is just the trollflow2.launcher.check_results function that is logging ERROR messages because the files are no longer available after the S3 upload and subsequent deletion. Any thoughts what to do about this @mraspaud?
Still some work needed in the messaging: the published messages have URIs like this: /tmp/s3:/bucket-name/20220908_1050_Meteosat-10_EPSG3035_wv73.tif. So the temporary save path is there, and a slash is also missing after the scheme.
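For illustration, composing the final URI could look something like this (the _to_s3_uri helper is hypothetical, assuming the temporary save directory is simply stripped):

import os.path
from urllib.parse import urlunsplit

def _to_s3_uri(local_path, bucket):
    # Hypothetical helper: keep only the file name and let urlunsplit
    # produce the double slash after the scheme.
    filename = os.path.basename(local_path)
    return urlunsplit(("s3", bucket, filename, "", ""))

# _to_s3_uri("/tmp/20220908_1050_Meteosat-10_EPSG3035_wv73.tif", "bucket-name")
# -> "s3://bucket-name/20220908_1050_Meteosat-10_EPSG3035_wv73.tif"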
With the latest commit the messages are now working. Still need to figure out what to do with trollflow2.launcher.check_results.
> @pnuu thanks for the work in this PR. I thought a little about this, and I think that what is being added here is in fact a dispatching mechanism, for which we already have a process in trollmoves: https://github.com/pytroll/trollmoves/blob/main/trollmoves/dispatcher.py. I think the benefit of the dispatcher is that no time is lost dispatching/uploading the file inside the trollflow run...
Hadn't noticed this comment. The dispatcher would work when running on a (virtual) server with multiple tasks, but not when running in a container where there's supposed to be only a single process running and the saving is done inside the container. Using a shared volume might work, but that would also add a new container and set of configuration files.
This now uses trollmoves.movers.S3Mover instead of direct s3fs calls.
It seems that the upload is done to s3://<bucket name>/<image name>/<image name>...
Ok, it was a usage error that was revealed when uploading to real S3. The filename was interpreted as a prefix ("directory") by botocore. Fixed with https://github.com/pytroll/trollflow2/pull/158/commits/5d53219409ae5e5f0c5b13429ff83001de33941e
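For the record, the difference looks roughly like this (a sketch assuming the S3Mover(origin, destination) interface and a made-up image.tif filename; the exact arguments may differ between trollmoves versions):

from trollmoves.movers import S3Mover

# Giving the full target path made botocore treat the filename as a
# prefix ("directory"), so the object ended up at
# s3://bucket-name/image.tif/image.tif:
#   S3Mover("/tmp/image.tif", "s3://bucket-name/image.tif").copy()
# Pointing the destination at the bucket lets the mover append the
# file name itself:
S3Mover("/tmp/image.tif", "s3://bucket-name/").copy()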
> Ok, I've tested this PR with proper reading and saving, and it works. There is just the trollflow2.launcher.check_results function that is logging ERROR messages because the files are no longer available after the S3 upload and subsequent deletion. Any thoughts what to do about this @mraspaud?
I'm now thinking that maybe the S3 uploading and local file deletion should be separate plugins :thinking: The other options seem more complicated.
Never mind, that won't work either, because check_results() is called after all the plugins. The filenames are put into a queue in the save_datasets plugin, so updating the filenames to S3 variants and modifying check_results() to do the check over S3 won't work directly either.
One option would be to add a remote_filesystem=None kwarg to check_results() and, with minor refactoring, handle each protocol as the need arises. For S3 that would be something like
from s3fs import S3FileSystem

def _check_s3_file(saved_file):
    # stat() raises FileNotFoundError for a missing object, just like
    # the existing local check does for a missing file.
    s3 = S3FileSystem()
    return s3.stat(saved_file)['size']
which would raise a FileNotFoundError that is already handled.
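The dispatch inside check_results() could then look something like this (a sketch only; the _get_file_size helper and its wiring are made up for illustration, not the actual code):

import os

def _get_file_size(saved_file, remote_filesystem=None):
    # Hypothetical dispatch: both branches raise FileNotFoundError for
    # a missing file, so the existing error handling still applies.
    if remote_filesystem == "s3":
        return _check_s3_file(saved_file)
    return os.path.getsize(saved_file)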
@mraspaud any other ideas?
I thought a bit more about this, to try to keep things clean and separated. Could we have a custom class for taking care of files? Something that mimics a Python file object, for example, and that uploads the file on close?
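Something along these lines, perhaps; a rough sketch of the idea only, with the class name, the bucket handling and the use of s3fs all invented here for illustration:

import os.path
from s3fs import S3FileSystem


class S3UploadOnClose:
    """File-like wrapper that uploads its contents to S3 on close().

    A sketch of the idea, not an actual implementation.
    """

    def __init__(self, local_path, bucket):
        self._local_path = local_path
        self._bucket = bucket
        self._fobj = open(local_path, "wb")

    def write(self, data):
        return self._fobj.write(data)

    def close(self):
        self._fobj.close()
        s3 = S3FileSystem()
        s3.put(self._local_path,
               self._bucket + "/" + os.path.basename(self._local_path))

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

The saving code could then write through such an object and the upload would happen transparently when the file is closed.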
I think this should work for what I need. I'll add a test for this tomorrow unless there's something more urgent happening.
I think this is now closer to ready, and all comments should have been handled in one way or another.
WTH did that feature-s3-upload-plugin come from?! It's now in 18 random places in test_trollflow2.py :man_facepalming:
I ran a test in OpenShift with the latest commits and #159 (merged locally) and everything is still working.
This PR adds a plugin that uploads the generated files to S3. ~Optionally~ Also deletes the local files after the transfer.