[Open] abarciauskas-bgse opened this issue 4 years ago
I haven't checked closely, but I think right now rechunker will refuse to overwrite existing data. Can you try removing the objects at `example-bucket/tmp-group.zarr` (and `example-bucket/group.zarr`, if there are any) before doing the rechunk?
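For illustration, a minimal sketch of that cleanup with s3fs, assuming the placeholder bucket paths above and default AWS credentials:

```python
# Hedged sketch of the suggested cleanup, using the placeholder paths from
# above. Assumes default AWS credentials are available to s3fs.
import s3fs

s3 = s3fs.S3FileSystem()
for path in ["example-bucket/tmp-group.zarr", "example-bucket/group.zarr"]:
    if s3.exists(path):
        s3.rm(path, recursive=True)
```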
As a user, what do you think would be the best behavior here? Probably overwriting any objects/files that are already present?
@abarciauskas-bgse -- thanks so much for reporting this bug!
I'm confused about why this is happening. Can you try running `assert s3_tmp_store is not None` on its own? I don't understand why that assertion is triggered when you are passing a valid store. I don't think that @TomAugspurger's suggestion about overwriting is relevant here, because that would raise a different error.
Sorry, my comment can be ignored. I wasn't looking closely enough.
@rabernat @TomAugspurger thanks for your responses. I did some more troubleshooting, and it looks like the issue is that if the S3 path for the temp store doesn't exist yet, then `temp_store_or_group` will be None and the assertion will raise. So the workaround is just to create the S3 path first, which works for me! But this is a bit different from the behavior when the temp store is local (where the path is created for you). A sketch of the workaround follows.
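A hedged sketch of that workaround, where the bucket paths, chunk sizes, and max_mem are illustrative placeholders; `create=True` on the mappers is the piece that makes the S3 paths exist up front:

```python
# Create the S3 paths up front via create=True so the mappers resolve,
# then rechunk as usual. All names here are illustrative placeholders.
import s3fs
import zarr
from rechunker import rechunk

s3 = s3fs.S3FileSystem()
source = zarr.open_group(s3fs.S3Map(root="example-bucket/source.zarr", s3=s3))
tmp_store = s3fs.S3Map(root="example-bucket/tmp-group.zarr", s3=s3, create=True)
target_store = s3fs.S3Map(root="example-bucket/group.zarr", s3=s3, create=True)

plan = rechunk(
    source,
    target_chunks={"analysed_sst": (100, 100, 100)},
    max_mem="1GB",
    target_store=target_store,
    temp_store=tmp_store,
)
plan.execute()
```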
Happy to close this issue or leave it open until this scenario is handled (which I can look into if that helps; it just might take me more time than someone more familiar with the codebase).
> I did some more troubleshooting and it looks like the issue is that if the S3 path to the temp store doesn't exist, then `temp_store_or_group` will be None and the assertion will raise. So the workaround is just to create the S3 path, which works for me!
This is very good to know. I did all my debugging with GCS, where this behavior is very different. If the path doesn't exist in GCS, it gets created automatically.
I think we need to fix this either in rechunker or upstream in the filesystem libraries (or both). I'll ping @martindurant to see if he has any thoughts on the different behavior in s3fs vs gcsfs.
Coming in late: do I understand that something different is happening when you create a mapper on s3 versus gcsfs? I would imagine this is an `fs.exists` implementation issue, which is tricky to do for a directory on object stores. Since quite a lot changed with the async rewrite, it would be worth checking whether this is still the case on gcsfs master and with https://github.com/dask/s3fs/pull/336 (not yet merged).
> do I understand that something different is happening when you create a mapper on s3 versus gcsfs?
Yes, we think this is the case, but no one has sat down to test it carefully.
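An untested sketch of that comparison, assuming hypothetical bucket names and credentials for both clouds:

```python
# Point both filesystems at a path with no objects under it and see whether
# the two sides disagree. Bucket names are hypothetical; credentials for
# both clouds are assumed to be configured.
import s3fs
import gcsfs

s3 = s3fs.S3FileSystem()
gcs = gcsfs.GCSFileSystem()

path = "example-bucket/does-not-exist-yet.zarr"
print("s3fs exists:", s3.exists(path))
print("gcsfs exists:", gcs.exists(path))

# Mappers created without create=True, which is where the issue surfaced.
s3_map = s3.get_mapper(path)
gcs_map = gcs.get_mapper(path)
print("s3 mapper keys:", list(s3_map))
print("gcs mapper keys:", list(gcs_map))
```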
> So workaround is just to create the S3 path
@abarciauskas-bgse: can you share the code you use to create the S3 path?
@rabernat

```python
import s3fs  # `s3` below is an s3fs.S3FileSystem instance set up earlier in the notebook
s3_tmp_store = s3fs.S3Map(root='ds-projects/eodc/mursst/tmp.zarr', create=True, s3=s3)
```
It is used within this notebook: https://github.com/abarciauskas-bgse/mur_sst-to-zarr/blob/master/images/rechunker/Rechunking_Zarr.ipynb
👋 First off - I'm very grateful for and excited about this package, and have used it to rechunk a large array (MUR SST `analysed_sst`). Now I am trying to rechunk the entire mursst group, which includes lat, lon and time coordinates and an additional 3 variables (but just focusing on `analysed_sst` for now). I'm hoping for some advice on troubleshooting the following error.
When I run this I get the following assertion error:
This is odd, because if I try to use local storage (e.g. replace `s3_tmp_store` with a local directory 'tmp.zarr'), the plan is created and I can start executing it, but I know the plan will eventually fail because there is not enough local disk storage. Any advice is welcome. Here is the notebook I am using: Rechunking_Zarr.ipynb
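For contrast, a sketch of the local-store variant described above, where zarr creates the directory for you (the path is illustrative):

```python
# Local-directory temp store, which the S3 mapper did not mirror: zarr's
# DirectoryStore creates 'tmp.zarr' on first write, no pre-creation needed.
import zarr

local_tmp_store = zarr.DirectoryStore("tmp.zarr")
```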