pymc-devs / pymc-experimental

https://pymc-experimental.readthedocs.io

Add .to_zarr to model builder save function #266

Open jaharvey8 opened 8 months ago

jaharvey8 commented 8 months ago

I've had a lot of issues saving netcdf files on Amazon S3. Any opposition to adding .to_zarr to the model builder save function? If not, I can go ahead and create a pull request. I would probably add an input from the user indicating whether they intend a netcdf file or zarr, with netcdf as the default.

canyon289 commented 8 months ago

ArviZ uses zarr, so I don't think there would be any opposition. Just curious though, why is netcdf causing problems in S3?

jaharvey8 commented 8 months ago

So I'm definitely not an S3 expert, but if I'm running my model in a SageMaker notebook and attempting to save the trace to netcdf on S3, I tend to get a lot of cryptic errors. It seems possibly related to this: https://github.com/pydata/xarray/issues/2995#issuecomment-497026828

What I started doing was saving the trace to the local SageMaker environment and then moving it to S3. But I wasn't able to load the trace directly from S3, so I ended up having to copy it back locally and then load it, and that all seemed like too much trouble.

But if I do trace.to_zarr() it works just fine.
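
For reference, here's roughly what I'm doing to write straight to S3. This is a minimal sketch: it assumes s3fs is installed, and the bucket/key is just a placeholder.

    import arviz as az
    import s3fs  # assumption: s3fs installed for S3-backed zarr stores

    # Open the bucket location as a zarr-compatible key/value store
    fs = s3fs.S3FileSystem()
    store = s3fs.S3Map("my-bucket/trace.zarr", s3=fs)  # placeholder bucket/key

    # Write the InferenceData directly to S3 as zarr
    trace.to_zarr(store=store)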

canyon289 commented 7 months ago

Ah, got it. If it's just the save and load from S3, have you tried using the zarr methods built into ArviZ? Do those work for you?

https://python.arviz.org/en/stable/api/generated/arviz.InferenceData.from_zarr.html
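
Something like this should round-trip, I'd guess (rough sketch, same s3fs assumption and placeholder bucket/key as above):

    import arviz as az
    import s3fs  # assumption: s3fs installed

    fs = s3fs.S3FileSystem()
    store = s3fs.S3Map("my-bucket/trace.zarr", s3=fs)  # placeholder bucket/key

    # Load the InferenceData back directly from S3
    idata = az.InferenceData.from_zarr(store)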

canyon289 commented 7 months ago

I'm looking at this more closely, and since it's deferring to idata, you might be able to do this easily, I hope!

def save(self, fname: str, format: str = "netcdf"):
    # Dispatch on the requested format, keeping netcdf as the default
    if format == "zarr":
        self.idata.to_zarr(fname)
    else:
        self.idata.to_netcdf(fname)

https://github.com/pymc-devs/pymc-experimental/blob/main/pymc_experimental/model_builder.py#L383
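
And load() would presumably need the mirror-image change. A rough, untested sketch (the format parameter and dispatch are just my assumption, same as above):

    import arviz as az

    @classmethod
    def load(cls, fname: str, format: str = "netcdf"):
        # Mirror save(): choose the reader based on the requested format
        if format == "zarr":
            idata = az.InferenceData.from_zarr(fname)
        else:
            idata = az.from_netcdf(fname)
        ...  # then rebuild the model from idata as load() currently does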