stcorp / harp

Data harmonization toolset for scientific earth observation data
http://stcorp.github.io/harp/doc/html/index.html
BSD 3-Clause "New" or "Revised" License

Add file_index and file_subindex upon harpmerge #213

Status: Closed (StevenCompernolle closed this issue 4 years ago)

StevenCompernolle commented 4 years ago

When using harpconvert on individual files, one can trace each sample back to the individual measurement thanks to the HARP variable 'index'. Unfortunately, this traceability is lost when using harpmerge on multiple files.

A solution to keep traceability could be the following: introduce new variables file_index and file_subindex. Here file_index numbers the files that were used for the harpmerge (their identity and order can be traced back via the history attribute), and file_subindex holds the value that 'index' had in the original file.
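To make the proposal concrete, here is a minimal sketch in plain Python. It does not use the HARP API; the function name merge_with_traceability and the list-of-lists input are hypothetical stand-ins for the per-file 'index' variables, and file numbering starting at 0 is an assumption.

```python
def merge_with_traceability(per_file_indices):
    """Build file_index and file_subindex for a merged product.

    per_file_indices: one list per input file, in merge order, holding
    the 'index' values of that file's samples (hypothetical input).
    """
    file_index = []     # which input file each merged sample came from
    file_subindex = []  # the sample's original 'index' within that file
    for file_number, indices in enumerate(per_file_indices):
        for original_index in indices:
            file_index.append(file_number)
            file_subindex.append(original_index)
    return file_index, file_subindex


# Three input files with 3, 2 and 2 samples respectively.
file_index, file_subindex = merge_with_traceability([[0, 1, 2], [0, 1], [5, 7]])
print(file_index)     # [0, 0, 0, 1, 1, 2, 2]
print(file_subindex)  # [0, 1, 2, 0, 1, 5, 7]
```

Each merged sample is then identified by the pair (file_index, file_subindex), with the file order recovered from the history attribute as described above.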

svniemeijer commented 4 years ago

The intended solution for this is to introduce source_product as a variable, but as a Categorical Variable. This means that if you have multiple measurements from the same input product, you store the filename only once (as part of the enumeration list) and reference it multiple times as an integer.
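As an illustration of the categorical idea only, the sketch below encodes a per-sample list of filenames as an enumeration list plus integer references. It is plain Python, not HARP code; encode_categorical and the example filenames are hypothetical.

```python
def encode_categorical(source_products):
    """Encode per-sample filenames as (enumeration list, integer codes)."""
    enum_values = []  # each distinct filename, stored once
    lookup = {}       # filename -> position in enum_values
    codes = []        # one integer per sample, pointing into enum_values
    for name in source_products:
        if name not in lookup:
            lookup[name] = len(enum_values)
            enum_values.append(name)
        codes.append(lookup[name])
    return enum_values, codes


# Hypothetical filenames, one entry per merged sample.
samples = ["a.nc", "a.nc", "b.nc", "a.nc", "b.nc"]
enum_values, codes = encode_categorical(samples)
print(enum_values)  # ['a.nc', 'b.nc']
print(codes)        # [0, 0, 1, 0, 1]
```

The original filename of sample i is recovered as enum_values[codes[i]], so long filenames are stored once regardless of how many samples reference them.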

To implement this we first need to implement #150 to allow dealing with more than 256 input products.

Once that is done, we would have to revise harpmerge to generate this source_product variable.

An open issue is how to deal with index variables, and what it means if you provide such a merged product as input to harpcollocate. In that case we would have the source_product that could be used, but potentially no longer the right index values per source product. This might require something like the file_subindex that you mention. However, we would have to make sure that everything users may expect to work with such source_product and subindex inputs (such as harpcollocate) actually works, which will require some work.

So this is not something that we intend to implement in the short term, but it is something that we would like to make available at some point.

svniemeijer commented 4 years ago

I believe this is actually a duplicate of #196