Closed jacobbieker closed 2 years ago
Merging #660 (1969c6f) into main (8036158) will increase coverage by
0.00%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## main #660 +/- ##
=======================================
Coverage 93.24% 93.25%
=======================================
Files 47 47
Lines 3197 3200 +3
=======================================
+ Hits 2981 2984 +3
Misses 216 216
Impacted Files | Coverage Δ | |
---|---|---|
...et/data_sources/satellite/satellite_data_source.py | 93.78% <100.00%> (+0.11%) |
:arrow_up: |
:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more
@peterdudfield
I've put up a pull request to add @jacobbieker! :tada:
how bullet-proof is this? what happens if the Zarr store on disk disappears while it's being loaded into RAM? :slightly_smiling_face: Or if the chunks on disk change after the metadata has been loaded?
Should we consider saving new Zarrs to different Zarr filenames, and deleting them after, say, 6 hours? So we can guarantee that a Zarr won't disappear while it's being read (assuming it's being read less than 6 hours after the Zarr was created!)
how bullet-proof is this? what happens if the Zarr store on disk disappears while it's being loaded into RAM? :slightly_smiling_face: Or if the chunks on disk change after the metadata has been loaded?
Should we consider saving new Zarrs to different Zarr filenames, and deleting them after, say, 6 hours? So we can guarantee that a Zarr won't disappear while it's being read (assuming it's being read less than 6 hours after the Zarr was created!)
Yeah, there is the issue of race conditions, we've tried to minimize it by writing to a different file then renaming after it's finished which is fairly quick. @peterdudfield and I are planning on making it more robust. Having different filenames does sound like a good alternative!
writing to a different file then renaming after it's finished
Great idea!! If renaming is an atomic operation on AWS then I would guess that's bullet-proof already? (but I guess that even if renaming a single file is atomic; renaming all the files in a Zarr is unlikely to be atomic?)
Awesome work, BTW! I'm super-impressed at home much stuff you guys are getting done!
writing to a different file then renaming after it's finished
Great idea!! If renaming is an atomic operation on AWS then I would guess that's bullet-proof already? (but I guess that even if renaming a single file is atomic; renaming all the files in a Zarr is unlikely to be atomic?)
Awesome work, BTW! I'm super-impressed at home much stuff you guys are getting done!
I don't know if it's atomic, I think it should be, but not sure. But each zarr is a single file, it's stored as a zip file like I've been doing for HF datasets and that seems to be working quite well.
nice! According to this, moving a single file is atomic on AWS. So I think that means that what you've already built should be bullet-proof. Nice!
If it's not then another option might be a good-old-fashioned LOCKFILE. But let's assume your current implementation is bullet-proof :slightly_smiling_face:
Pull Request
Description
Loads the Zarr arrays into memory if it is being run as live, as otherwise the Zarr store might disappear in the middle from being updated by the 5 minutely process.
Fixes https://github.com/openclimatefix/nowcasting_forecast/issues/68
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?
Checklist: