Closed StevenCompernolle closed 4 years ago
The intended solution for this is to introduce `source_product` as a variable, specifically as a categorical variable. This means that if you have multiple measurements from the same input product, you only store the filename once (as part of the enumeration list) and reference it multiple times as an integer.
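To illustrate the storage idea, here is a minimal sketch in plain Python. It is not HARP code: the helper `encode_categorical` and the file names are hypothetical, and it only shows how per-measurement filenames collapse into an enumeration list plus integer codes.

```python
def encode_categorical(labels):
    """Return (enum_values, codes); each distinct label is stored once, in first-seen order."""
    enum_values = []   # the enumeration list: every filename appears exactly once
    position = {}      # filename -> its integer code
    codes = []         # one integer per measurement
    for label in labels:
        if label not in position:
            position[label] = len(enum_values)
            enum_values.append(label)
        codes.append(position[label])
    return enum_values, codes


# Hypothetical input: the source file of each measurement in a merged product.
per_sample_source = [
    "orbit_001.nc", "orbit_001.nc", "orbit_001.nc",
    "orbit_002.nc", "orbit_002.nc",
]
enum_values, codes = encode_categorical(per_sample_source)

# Decoding an integer code gives back the filename of that measurement.
decoded = [enum_values[c] for c in codes]
```

With five measurements from two files, only two filenames are stored; the per-measurement data is just the integer list.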
To implement this we first need to implement #150 to allow dealing with more than 256 input products.
Once that is done, we would have to revise harpmerge to generate this `source_product` variable.
An open issue is how to deal with `index` variables, and what it means if you provide such a merged product as input to harpcollocate (where we would then have the `source_product` that could be used, but potentially no longer the right `index` values per source product).
This might require something like the `file_subindex` that you mention, but we would have to make sure that everything users may expect to work with such `source_product` and subindex variables as inputs (such as harpcollocate) actually works, which will require some work.
So this is not something that we intend to implement in the short term, but it is something that we would like to make available at some point.
I believe this is actually a duplicate of #196
Using harpconvert on individual files, one has traceability to the individual measurements thanks to the harp variable 'index'. Unfortunately, this is lost when using harpmerge on multiple files.
A solution to keep traceability could be the following: Introduce new variables file_index and file_subindex, where file_index counts the files that were used for the harpmerge (of which the identity and order can be traced back via the history attribute), and file_subindex gets the value of 'index' in the original file.
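A rough sketch of what this proposal could look like, in plain Python rather than HARP itself. The function names, the dictionary layout, and the file names are all hypothetical; the list `source_files` stands in for the file order that would be recovered from the history attribute.

```python
def merge_with_traceability(products):
    """Concatenate products given as (filename, list of original 'index' values)."""
    merged = {"source_files": [], "file_index": [], "file_subindex": []}
    for file_index, (filename, indices) in enumerate(products):
        merged["source_files"].append(filename)
        for original_index in indices:
            merged["file_index"].append(file_index)         # which input file
            merged["file_subindex"].append(original_index)  # 'index' in that file
    return merged


def trace(merged, sample):
    """Return (filename, original index) for one sample of the merged product."""
    filename = merged["source_files"][merged["file_index"][sample]]
    return filename, merged["file_subindex"][sample]


# Hypothetical inputs: two files with three and two measurements.
products = [("a.nc", [0, 1, 2]), ("b.nc", [0, 1])]
merged = merge_with_traceability(products)
origin = trace(merged, 3)  # fourth sample of the merged product
```

Here the fourth merged sample traces back to the first measurement of the second file, which is the traceability that is currently lost.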