ocean-transport / llc-hero-calc

Vorticity-strain histogram calculations on LLC4320 data
MIT License

Redoing hero calc because of boundary masking issue #1

Open cspencerjones opened 4 hours ago

cspencerjones commented 4 hours ago

This issue is really for @TomNicholas. I decided to do this here rather than over on Slack because it probably requires a bit of thinking.

It seems like there are issues with the JPDFs. My current hypothesis is that this is caused by fill_value, which we would expect to be set here:

@xgcm.as_grid_ufunc(
    boundary_width={"X": (1, 0), "Y": (1, 0)},
    boundary="fill",
    fill_value=np.nan,
    dask="parallelized",
)

But this fill_value is ignored. The way to make sure it is used is to pass it again when calling the function: ζ = vort(grid, ds.U, ds.dxC, ds.V, ds.dyC, ds.rAz, axis=5 * [("Y", "X")], fill_value=np.nan). @TomNicholas didn't do this in the most recent version of the calculation. I have not yet written an xgcm issue about this.
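To illustrate why the fill value matters, here is a minimal NumPy-only sketch (not the xgcm code path). It assumes that an ignored fill_value falls back to padding with 0, which this thread does not confirm; `padded_diff` is a hypothetical helper that mimics a one-cell left pad like boundary_width=(1, 0).

```python
import numpy as np

def padded_diff(u, fill_value):
    """Backward difference after padding one cell on the left.

    Toy 1D analogue of boundary_width=(1, 0) with boundary="fill".
    """
    padded = np.concatenate([[fill_value], u])
    return padded[1:] - padded[:-1]

# Smooth toy field: the true gradient is 0.5 everywhere.
u = np.array([10.0, 10.5, 11.0, 11.5])

d_zero = padded_diff(u, 0.0)     # assumed fallback: pad with 0
d_nan = padded_diff(u, np.nan)   # intended behaviour: pad with NaN

# With a 0 pad the boundary point is a large spurious value that would
# land in the histogram; with a NaN pad it can be dropped beforehand.
valid = d_nan[~np.isnan(d_nan)]
print(d_zero, d_nan, valid)
```

In this toy case the zero-padded result carries one boundary value that is twenty times the interior gradient, which is the kind of outlier that could show up as an artifact in a histogram.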

The current lack of masking creates some horrible artifacts in the JPDFs. So it seems like I need to redo the hero calc, or abandon the JPDFs altogether. If I were to redo the calculation, I presumably would need to get a non-standard environment on LEAP, to include https://github.com/xgcm/xhistogram/pull/59? And I would need to get permissions to write to a persistent bucket? What else would I need to know (e.g. how many procs should I use)?

Is this all just more trouble than it's worth? Should I give up?

TomNicholas commented 4 hours ago

> It seems like there are issues with the JPDFs.

Damn. How annoying, sorry.

> But this fill_value is ignored.

That doesn't sound hard to fix though.

> presumably would need to get a non-standard environment on LEAP, to include https://github.com/xgcm/xhistogram/pull/59?

Pretty sure you don't need that for this calculation? It's just 3 sets of 1D bins.
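For reference, a histogram over three variables with one set of 1D bin edges per variable can be sketched with plain NumPy. This is an illustration only, not the hero calc itself: the variables `a`, `b`, `c`, the bin edges, and the sample size are all made up, and the real calculation may use xhistogram rather than `np.histogramdd`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Three hypothetical variables sampled at the same points.
a = rng.normal(size=n)
b = rng.normal(size=n)
c = rng.normal(size=n)
a[:10] = np.nan  # pretend these points were masked at a boundary

# One set of 1D bin edges per variable (hypothetical ranges).
bins = [
    np.linspace(-5, 5, 11),  # 10 bins for a
    np.linspace(-5, 5, 21),  # 20 bins for b
    np.linspace(-5, 5, 6),   # 5 bins for c
]

# Drop any point where at least one variable is NaN before binning.
mask = np.isfinite(a) & np.isfinite(b) & np.isfinite(c)
sample = np.stack([a[mask], b[mask], c[mask]], axis=1)

hist, edges = np.histogramdd(sample, bins=bins)
print(hist.shape, int(hist.sum()))
```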

> And I would need to get permissions to write to a persistent bucket?

@jbusecke or @dhruvbalwada would have to help you with that.

> What else would I need to know (e.g. how many procs should I use)?

Not much else. The number of procs should only affect how fast it goes, not whether or not it completes. The important thing is that you give each processor enough memory. The calculation is parallel in time, so you would just get one timestep to work and then scale it up. I would use the latest version of dask, and if that behaves weirdly you could downgrade to the one I used...
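The "get one timestep to work, then scale it up" pattern might look roughly like the sketch below. It uses synthetic data and the standard library's thread pool in place of dask, so the names (`histogram_one_timestep`, `BINS`) and array sizes are assumptions for illustration, not the actual setup.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

BINS = np.linspace(-1.0, 1.0, 5)  # hypothetical bin edges (4 bins)

def histogram_one_timestep(field):
    """Histogram a single 2D snapshot, skipping NaN (masked) points."""
    values = field[np.isfinite(field)]
    counts, _ = np.histogram(values, bins=BINS)
    return counts

# Synthetic stand-in with dimensions (time, y, x).
rng = np.random.default_rng(0)
data = rng.uniform(-1.0, 1.0, size=(6, 8, 8))
data[:, 0, :] = np.nan  # one masked boundary row per snapshot

# Step 1: check that a single timestep works on its own.
first = histogram_one_timestep(data[0])

# Step 2: scale up. Timesteps are independent, so the worker count
# changes the speed but not the result.
with ThreadPoolExecutor(max_workers=2) as pool:
    per_time = np.stack(list(pool.map(histogram_one_timestep, data)))

print(per_time.shape)
```

Because each task touches only one snapshot, peak memory per worker is set by the size of a single timestep, which is why memory per processor matters more than the processor count.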

> Is this all just more trouble than it's worth? Should I give up?

I can't tell you that. I can only help give an idea of how easy or hard it would be to redo (and help fix the bugs in xgcm).