ocean-eddy-cpt / gcm-filters-paper

Manuscript on spatial filtering method

Eddy Momentum Example on MOM5 data #18

Closed LaureZanna closed 3 years ago

LaureZanna commented 3 years ago

As per my discussions with @iangrooms, @ElizabethYankovsky and @arthurBarthe, we will contribute the following

arthurBarthe commented 3 years ago

@iangrooms Ok great, thanks! About integral conservation, you mean area-weighted integral I guess. Is CARTESIAN_WITH_LAND conserving that in its current version? As we do not pass any information about the areas to the filter. Sorry for being confused, the rest makes sense to me though :)

@NoraLoose scipy.ndimage's gaussian_filter treats the domain boundaries explicitly through its mode parameter. I think the closest match would be mode="wrap" rather than the default, "reflect" (https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html). Continental boundaries are only handled implicitly, I think: if you leave them as NaNs, those will contaminate every filtered value whose kernel footprint includes them (how far that reaches depends on the sigma and truncate parameters). Right now I replace them with zeros instead. And yes, sure, I can do tests with both filters.
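A minimal sketch of the two land treatments described above. The synthetic field, the land patch and sigma=2 are assumptions made for illustration; this is not the MOM5 data or the script used for the figure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic 2D field standing in for a velocity snapshot (illustrative only).
rng = np.random.default_rng(0)
field = rng.standard_normal((32, 32))
field[10:14, 10:14] = np.nan  # pretend this 4x4 patch is land

sigma = 2.0  # hypothetical filter width, in grid points

# Option 1: leave land as NaN. Every output point whose kernel footprint
# touches a NaN becomes NaN itself.
filtered_nan = gaussian_filter(field, sigma=sigma, mode="wrap")

# Option 2: replace land with zeros first, so the output stays finite.
field_zeroed = np.where(np.isnan(field), 0.0, field)
filtered_zero = gaussian_filter(field_zeroed, sigma=sigma, mode="wrap")

n_land = int(np.isnan(field).sum())
n_contaminated = int(np.isnan(filtered_nan).sum())
print(f"land points: {n_land}, NaN points after filtering: {n_contaminated}")
```

The zero-filled version is finite everywhere, but the zeros still enter the weighted average near the coast, so values there are damped toward zero.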

NoraLoose commented 3 years ago

"Is CARTESIAN_WITH_LAND conserving that in its current version? As we do not pass any information about the areas to the filter."

Yes, fixed-factor filtering, e.g. with the CARTESIAN_WITH_LAND Laplacian, preserves the area integral if you follow these three steps (which the user is meant to do):

I am showing this in the last part of this notebook on readthedocs.

arthurBarthe commented 3 years ago

Ah okay, I'll do that for both CARTESIAN_WITH_LAND and scipy's filter function then, thanks!

iangrooms commented 3 years ago

"@iangrooms Ok great, thanks! About integral conservation, you mean area-weighted integral I guess. Is CARTESIAN_WITH_LAND conserving that in its current version? As we do not pass any information about the areas to the filter. Sorry for being confused, the rest makes sense to me though :)"

No problem! Yes, I was referring to area-weighted integral. If you just sum up the values it's not actually an approximation of the area integral (at least not a good one) because the cells have different sizes.
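To illustrate the difference, here is a small sketch on a hypothetical regular latitude-longitude grid (the 2-degree spacing, the latitude range and the cos(lat) cell areas are assumptions, not the MOM5 grid). Two fields with the same plain sum have area integrals that differ by a factor of two.

```python
import numpy as np

R = 6.371e6  # Earth radius in metres
lat = np.deg2rad(np.arange(-60.0, 61.0, 2.0))  # 61 rows; index 30 is the equator
lon = np.deg2rad(np.arange(0.0, 360.0, 2.0))   # 180 columns
dlat = dlon = np.deg2rad(2.0)

# Cell areas shrink toward the poles like cos(lat).
area = (R**2 * np.cos(lat) * dlat * dlon)[:, None] * np.ones((1, lon.size))

# Field A is one along the equator row, field B is one along the 60N row.
field_equator = np.zeros_like(area)
field_equator[30, :] = 1.0
field_high = np.zeros_like(area)
field_high[60, :] = 1.0

# The plain sums are identical ...
plain_equator = field_equator.sum()
plain_high = field_high.sum()

# ... but the area-weighted integrals are not.
integral_equator = (field_equator * area).sum()
integral_high = (field_high * area).sum()
print(plain_equator, plain_high, integral_high / integral_equator)
```

A conservation check for a filter would compare `(field * area).sum()` before and after filtering, restricted to wet points, rather than `field.sum()`.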

arthurBarthe commented 3 years ago

Ok, here's a first version of the figure. I'm working with the same area as for the vorticity plots, and I'm filtering the zonal velocities. The shaded areas correspond to 1.96 standard deviations each way from the mean. The stars are the timings using the default n_steps. The first row of labels is for CARTESIAN and CARTESIAN_WITH_LAND; the second row is for scipy (the truncate parameter, i.e. the number of standard deviations at which the kernel sum is truncated). By the way, the computation time of CARTESIAN_WITH_LAND was worse than in this figure by a factor of 2 because the wet mask was float64 by default while the data is float32.
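The dtype mismatch can be reproduced in plain NumPy. This sketch only shows the type promotion and the cast that avoids it; the array names are hypothetical, and it says nothing about where exactly the promotion happens inside the filter.

```python
import numpy as np

data = np.ones((4, 4), dtype=np.float32)  # stands in for the float32 velocities
wet_mask = np.ones((4, 4))                # NumPy's default dtype is float64

# Mixing float32 and float64 arrays promotes the result to float64,
# so every later operation works on twice as many bytes.
mixed = data * wet_mask

# Casting the mask to the data's dtype keeps everything in float32.
matched = data * wet_mask.astype(data.dtype)

print(mixed.dtype, matched.dtype)
```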

As an afterthought, nobody would use 32 standard deviations for the truncate parameter of scipy's gaussian filter function; even 16 seems like overkill. I'll probably change those values.
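A quick way to see what truncate does to the stencil is to filter a unit impulse and count the non-zero outputs. This is a sketch with an assumed sigma of 2 grid points, not the value used for the figure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def kernel_width(sigma, truncate, n=513):
    """Number of grid points in the 1D stencil, measured from an impulse response."""
    impulse = np.zeros(n)
    impulse[n // 2] = 1.0
    # mode="constant" pads with zeros, so the response is just the kernel itself.
    response = gaussian_filter1d(impulse, sigma=sigma, truncate=truncate, mode="constant")
    return int(np.count_nonzero(response))

sigma = 2.0  # hypothetical kernel standard deviation, in grid points
widths = {t: kernel_width(sigma, t) for t in (4.0, 8.0, 16.0)}
for t, w in widths.items():
    print(f"truncate={t}: stencil spans {w} points")
```

The stencil grows roughly in proportion to truncate times sigma, while the weights added beyond a few standard deviations are tiny, which is why very large truncate values mostly add cost.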

This is for scale=16. It might be worth adding other values. Oh, and so far I have only run this on my laptop; I'll rerun it on a proper node.

[image]

iangrooms commented 3 years ago

Looks good to me! I'll wait for the updated version before adding it to the paper.

arthurBarthe commented 3 years ago

@iangrooms Here's the figure after running it on one of NYU's cluster nodes. I've set the y axis to be on a log scale.

[image: figure_gcm_filters2]

iangrooms commented 3 years ago

Excellent, thanks! Can you send me some details about the node? E.g. on my node I have two Xeon E5-2680 v3 processors.

iangrooms commented 3 years ago

PS, if you can't find it on the NYU site, you can just log in to the node and run cat /proc/cpuinfo, which will show the info. I won't be surprised if you already know this, but just in case.

arthurBarthe commented 3 years ago

@iangrooms Sure: Intel Xeon Platinum 8268 24C 205W 2.9GHz Processor. I wasn't able to get exclusive use of the node (i.e. other users might have been running jobs on it at the same time). However, I used time.process_time() from the time library, which should at least partly account for that, I think. Here's what the documentation for process_time says: "Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep. It is process-wide by definition. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid."

I'll also put the code I used in this repo.
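For reference, a minimal sketch of this kind of timing, including the 1.96-standard-deviation band used for the shaded areas. It is not the script mentioned above; the helper name time_cpu, the array size, sigma and the number of repeats are all assumptions.

```python
import time
import numpy as np
from scipy.ndimage import gaussian_filter

def time_cpu(func, n_repeats=10):
    """Return the mean and standard deviation of CPU time (s) over repeated calls."""
    samples = []
    for _ in range(n_repeats):
        start = time.process_time()  # only differences between calls are meaningful
        func()
        samples.append(time.process_time() - start)
    samples = np.asarray(samples)
    return samples.mean(), samples.std()

# Synthetic float32 field standing in for the zonal velocities.
rng = np.random.default_rng(0)
u = rng.standard_normal((256, 256)).astype(np.float32)

mean_t, std_t = time_cpu(lambda: gaussian_filter(u, sigma=4.0, truncate=4.0))

# Band of 1.96 standard deviations each way from the mean.
lower, upper = mean_t - 1.96 * std_t, mean_t + 1.96 * std_t
print(f"mean CPU time: {mean_t:.4f} s, band: [{lower:.4f}, {upper:.4f}] s")
```

Note that process_time counts CPU time for the whole process, so with multithreaded libraries it can exceed the wall-clock time; time.perf_counter() is the wall-clock alternative.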