stcorp / harp

Data harmonization toolset for scientific earth observation data
http://stcorp.github.io/harp/doc/html/index.html
BSD 3-Clause "New" or "Revised" License

Creating a new integer scalar variable: type depends on the value's magnitude, which causes a problem with harpmerge #262

StevenCompernolle closed this issue 2 years ago

StevenCompernolle commented 2 years ago

When I import a HARP product in Python, add a scalar variable dayofyear, and export it again, the variable's type depends on its value. If dayofyear is smaller than 128, the type becomes byte; if it is 128 or larger, the type becomes int.

This causes a problem with harpmerge, which fails with: "ERROR: variables don't have the same datatype (dayofyear)"

I adapted the valid_min and valid_max attributes, but that did not make a difference. It should be possible to set the type explicitly instead of always getting the minimal type, or otherwise harpmerge should automatically convert compatible types.
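The second suggestion (automatic conversion during a merge) amounts to promoting each variable's integer type to one wide enough for every input. Below is a hypothetical sketch of that idea in plain Python. The helpers `common_int_type` and `find_type_mismatches` and the type table are assumptions for illustration. They are not part of HARP, whose harpmerge reports the mismatch as an error instead.

```python
# Signed integer types ordered from narrowest to widest (illustrative table).
INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
]
RANK = {name: i for i, (name, _, _) in enumerate(INT_RANGES)}


def common_int_type(types):
    """Return the widest of the given signed integer types.

    Each narrower range is contained in the wider ones, so the widest
    type can hold the values of every input. Hypothetical helper.
    """
    return max(types, key=lambda t: RANK[t])


def find_type_mismatches(products):
    """Map each variable whose type differs between products to a common type.

    `products` is a list of {variable_name: type_name} dicts, one per file.
    """
    seen = {}
    for product in products:
        for name, dtype in product.items():
            seen.setdefault(name, set()).add(dtype)
    return {name: common_int_type(types)
            for name, types in seen.items() if len(types) > 1}


files = [{"dayofyear": "int8"}, {"dayofyear": "int16"}]
print(find_type_mismatches(files))  # {'dayofyear': 'int16'}
```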

svniemeijer commented 2 years ago

Going from the Python domain to the C domain is expected to require some manual tuning, since Python offers much more flexibility in types than HARP supports internally. I still believe that downcasting integers to the smallest type that fits is a good default.
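The downcasting default can be modelled as "pick the narrowest signed integer type whose range contains the value". The sketch below is a simplified illustration of that rule and not HARP's actual code. The function `smallest_int_type` and the type table are hypothetical. It shows why the same variable can get different types depending on the value it holds.

```python
# Signed integer types from narrowest to widest, with inclusive ranges.
INT_TYPES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
]


def smallest_int_type(value: int) -> str:
    """Return the narrowest signed integer type that can hold `value`.

    Illustrative model of the downcasting default, not HARP's implementation.
    """
    for name, lo, hi in INT_TYPES:
        if lo <= value <= hi:
            return name
    raise OverflowError(f"{value} does not fit in a 32-bit signed integer")


print(smallest_int_type(100))  # int8
print(smallest_int_type(300))  # int16
```

Under this model, two files with dayofyear values of 100 and 300 end up with different types, which is exactly the mismatch harpmerge reports.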

If this is not what you want, you can use a HARP operation to set the data type of a variable explicitly. The HARP export_product function accepts a set of operations to be executed as part of the export; with derive(dayofyear int16) you can set the variable to the type you need.
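When several variables need a fixed type, the derive operations can be assembled into a single operations string. This is a minimal sketch that assumes HARP's convention of separating operations with semicolons. `derive_type_ops` and `export_with_types` are hypothetical helper names. The only HARP call used is `harp.export_product` with its `operations` argument, as described above.

```python
def derive_type_ops(types):
    """Build a HARP operations string that forces each variable's data type.

    `types` maps variable names to HARP type names, e.g. {"dayofyear": "int16"}.
    Individual operations are joined with ';'.
    """
    return ";".join(f"derive({name} {dtype})" for name, dtype in types.items())


def export_with_types(product, filename, types):
    """Export `product` with the given variable types forced (needs the harp package)."""
    import harp  # imported lazily so derive_type_ops works without HARP installed
    harp.export_product(product, filename, operations=derive_type_ops(types))


print(derive_type_ops({"dayofyear": "int16"}))  # derive(dayofyear int16)
```

The same operation text can also be given to the command-line tools, as in harpconvert -a "derive(dayofyear int16)".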

StevenCompernolle commented 2 years ago

Thanks. Importing the product in Python, setting valid_min and valid_max in Python, and using derive(dayofyear int16) on export, as you suggest, seems to give the correct type.

I also tried converting the type directly with the command-line tool: harpconvert -a "derive(dayofyear int16)". That worked (the type changed from byte to short), but valid_min and valid_max still had the byte limits:

    short dayofyear ;
        dayofyear:description = "day of year. 1 to 366" ;
        dayofyear:valid_min = -128s ;
        dayofyear:valid_max = 127s ;

I am not sure whether this is something you want to change; it does not look correct. However, variables with the same type but different valid_min and valid_max do not prevent harpmerge from working.
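It is easy to check why the leftover byte limits look wrong. Under the netCDF attribute conventions, readers that honour valid_min/valid_max treat values outside that range as invalid. Day-of-year values above 127 would therefore be flagged even though they now fit in a short. The sketch below uses a hypothetical helper `out_of_range` to illustrate this.

```python
def out_of_range(values, valid_min, valid_max):
    """Return the values that fall outside [valid_min, valid_max].

    Readers applying valid_min/valid_max would treat these as invalid.
    """
    return [v for v in values if not valid_min <= v <= valid_max]


days = [1, 127, 128, 366]
# Limits left over from the byte type after derive(dayofyear int16):
print(out_of_range(days, -128, 127))  # [128, 366]
# Limits matching the description "day of year. 1 to 366":
print(out_of_range(days, 1, 366))  # []
```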

svniemeijer commented 2 years ago

I did some tests:

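    import subprocess

    import harp
    import numpy as np

    def test_export(product):
        """Export `product`, dump the file with harpdump, then re-import it."""
        print("------")
        # Flush so this output appears before harpdump's output.
        print(product, flush=True)
        harp.export_product(product, "test.nc")
        subprocess.run(["harpdump", "test.nc"], check=True)
        imported = harp.import_product("test.nc")
        print(imported)  # the round-tripped product, not the original

    p = harp.Product()
    # NumPy int16 scalar: the type is given explicitly.
    p.foo = harp.Variable(np.short(10))
    test_export(p)

    # Plain Python int: HARP downcasts to the smallest integer type that fits.
    p.foo = harp.Variable(10)
    test_export(p)

    # Same Python int, but with a valid range that needs more than a byte.
    p.foo.valid_min = 1
    p.foo.valid_max = 366
    test_export(p)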

You will see that:

So in your case you should be able to get correctly typed exports without a derive() if you pass a NumPy scalar instead of a Python integer, or if you set a sufficiently large valid_max attribute.
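The two remedies can be summarized as a rule. An explicitly typed value keeps its type; otherwise the narrowest integer type covering both the value and any valid_min/valid_max is used. This is a simplified model of the export behaviour described in this thread, not HARP's implementation. `export_int_type` is hypothetical, and its `explicit_type` parameter stands in for passing a NumPy scalar such as np.short.

```python
INT_TYPES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
]


def export_int_type(value, valid_min=None, valid_max=None, explicit_type=None):
    """Model how the exported integer type could be chosen.

    explicit_type stands in for a NumPy scalar such as np.short(10), whose
    type is kept. Otherwise the narrowest type covering the value and the
    valid range (where set) is used.
    """
    if explicit_type is not None:
        return explicit_type
    bounds = [value] + [b for b in (valid_min, valid_max) if b is not None]
    low, high = min(bounds), max(bounds)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= low and high <= type_max:
            return name
    raise OverflowError("range does not fit in a 32-bit signed integer")


print(export_int_type(10))                              # int8
print(export_int_type(10, explicit_type="int16"))       # int16
print(export_int_type(10, valid_min=1, valid_max=366))  # int16
```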

There is, however, an issue in HARP: print() of the variable should show the same type that HARP would use for the export, i.e. it should also take valid_min and valid_max into account. I will create a new issue for this.