xarray-contrib / xbatcher

Batch generation from xarray datasets
https://xbatcher.readthedocs.io
Apache License 2.0
167 stars 27 forks

Dimension name change with `concat_input_dims` is a side effect #164

Open cmdupuis3 opened 1 year ago

cmdupuis3 commented 1 year ago

What is your issue?

See the title. The problem is that changing dimension names makes it difficult to index into batched arrays inside a batch loop. This is particularly annoying because the behavior depends on the value of `concat_input_dims`: `_input` is appended to the dimension names when it is `True` but not when it is `False`, which makes debugging and experimentation difficult. I view this as an unwelcome side effect, and I'd prefer that the non-batched dimensions keep their original names.

cmdupuis3 commented 1 year ago

Hey @maxrjones, does this serve any purpose? It's incredibly annoying to get through batch generation only to crash because I forgot to rename the dimensions I'm subsetting.

cmdupuis3 commented 1 year ago

Partial example:


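    import xbatcher as xb

    # ds, nlons, nlats, and halo_size are defined elsewhere (omitted here).
    bgen = xb.BatchGenerator(
        ds,
        {'nlon': nlons, 'nlat': nlats},
        concat_input_dims=True,
    )

    # Interior points only: drop a halo of width halo_size on each side.
    sub = {'nlon': range(halo_size, nlons - halo_size),
           'nlat': range(halo_size, nlats - halo_size)}

    for batch in bgen:
        # Fails here: 'nlon' and 'nlat' no longer exist on the batch.
        batch_input = [batch[x][sub] for x in ['SSH', 'SST']]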

This will crash because the batch's dimensions are now named `nlon_input` and `nlat_input`, so the keys in `sub` no longer match anything. With `concat_input_dims=False`, the dimension names stay the same and the same loop works.
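One possible workaround in the meantime, as a minimal sketch: remap the indexer keys onto whichever name the batch actually carries before subsetting. `resolve_indexers` is a hypothetical helper, not part of xbatcher, and it assumes a renamed dimension is simply the original name plus `_input`, as described above. The halo bounds in `sub` are made-up illustrative values.

```python
def resolve_indexers(dims, indexers, suffix="_input"):
    """Map indexer keys onto whichever dimension name is present in `dims`.

    Hypothetical helper, not part of xbatcher. Assumes a renamed dimension
    is the original name plus `suffix` (the behavior reported in this issue).
    """
    resolved = {}
    for name, index in indexers.items():
        if name in dims:
            # Name was left untouched (e.g. concat_input_dims=False).
            resolved[name] = index
        elif name + suffix in dims:
            # Name was suffixed (e.g. concat_input_dims=True).
            resolved[name + suffix] = index
        else:
            raise KeyError(
                f"neither {name!r} nor {name + suffix!r} found in {tuple(dims)}"
            )
    return resolved


# Illustrative indexers; the real ones depend on halo_size, nlons, nlats.
sub = {"nlon": range(2, 8), "nlat": range(2, 8)}

# Dimension names as reported with concat_input_dims=True ...
renamed = resolve_indexers(("nlon_input", "nlat_input"), sub)
# ... and with concat_input_dims=False.
unchanged = resolve_indexers(("nlon", "nlat"), sub)
```

In the loop above, the subsetting line would then become something like `batch[x][resolve_indexers(batch[x].dims, sub)]`, which should behave the same for either value of `concat_input_dims`. This only papers over the naming difference; it doesn't address whether the rename should happen at all.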

cmdupuis3 commented 1 year ago

Just learned that xarray's rolling also adds "_input" (or something similar), where it's used to distinguish between the original dimensions (which may still exist) and the new stencil dims.

I'm thinking that this looks superfluous in xbatcher because (at least in my case) the original dimensions are always stacked. Maybe "_input" makes sense if they aren't stacked?