stcorp / harp

Data harmonization toolset for scientific earth observation data
http://stcorp.github.io/harp/doc/html/index.html
BSD 3-Clause "New" or "Revised" License
55 stars 18 forks source link

Support arithmetic in operator #249

Open zxdawn opened 2 years ago

zxdawn commented 2 years ago

Sometimes users may want to do some arithmetics and use the result to subset data.

Here's an example:

import harp
from glob import glob

operations = 'latitude >= 65[degree_north]; \
              keep(latitude, longitude, latitude_bounds, longitude_bounds, scene_pressure, surface_pressure)'

product = harp.import_product(glob('../data/tropomi/2021/03/S5P_OFFL_L2__NO2____20210328T1134*.nc'), operations)

product.new_flag = harp.Variable(product.scene_pressure.data - 0.98*product.surface_pressure.data, ["time",])
product.new_flag.unit = 'Pa'

new_product = harp.execute_operations(product, operations='new_flag  > 0[Pa]')

If the operator supports arithmetic, it would be simplified into:

import harp
from glob import glob

operations = 'latitude >= 65[degree_north]; scene_pressure-0.98*surface_pressure >= 0[Pa]\
              keep(latitude, longitude, latitude_bounds, longitude_bounds, scene_pressure, surface_pressure)'

product = harp.import_product(glob('../data/tropomi/2021/03/S5P_OFFL_L2__NO2____20210328T1134*.nc'), operations)
svniemeijer commented 2 years ago

I understand the usefulness of this type of operation, but it is a bit tricky.

First of all, if we would support this, we would split it into separate operations. Performing an arithmetic operations as part of the filtering itself would be very complicated to implement, so we would still have a separate step that would create a new variable (similar to how you create the 'new_flag' variable) and then have a second step that performs the filtering based on that variable.

For instance:

operations = [
    'latitude >= 65[degree_north]',
    'pressure_diff {time} [Pa] := scene_pressure-0.98*surface_pressure',
    'pressure_diff >= 0[Pa]',
    'keep(latitude, longitude, latitude_bounds, longitude_bounds, scene_pressure, surface_pressure)',
]
operations = ';'.join(operations)

The crucial part is that the arithmetic steps need to make sense for this variable creation operation. For instance, we can't allow adding pressures (in [Pa]) to altitudes (in [m]) for instance, so we need some unit checks. But things can become quite complicated quickly. For instance, if a product has velocity ([m/s]), distance ([m]), and duration ([s]) variables. Then velocity-distance would not be allowed, but velocity-distance/duration should be. Also, we need to make sure that dimensions are consistent and provide sensible error messages in case these aren't.

On top of that, the 'normal quantities' can already be calculated using the derive() operation (supporting a wide range of variable conversion). So this arithmetic variable creation feature would only be useful for quantities that are not regular physical quantities (like these arbitrary quality threshold values that you mention in your example).

I think that this feature is something that is doable, but given the amount of effort needed to implement it properly, it currently doesn't have a high priority.