podaac / concise

CONCISE (CONCatenatIon SErvice)
https://podaac.github.io/concise
Apache License 2.0

ensure the created dimension is sorted #96

Closed danielfromearth closed 4 months ago

danielfromearth commented 9 months ago

Challenge

As far as I can tell, concise does not currently sort the concatenated datasets (for example, by filename).

Desired functionality

When concatenating data granules, data arrays along the new dimension should be ordered according to some meaningful rule, e.g., lexicographic ordering of the filenames.

Example

This challenge was encountered when testing concise with a data collection for the Tropospheric Emissions Monitoring of POllution (TEMPO) instrument. It appears that the resulting data arrays are indexed in an arbitrary order along the newly stacked dimension created by concise.

Benefit

If the data arrays are ordered, it will be easier to inspect and perform further analysis with the resulting data files.

frankinspace commented 8 months ago

After discussion, we agreed that there are two viable paths forward for this feature:

  1. Do not actively sort, but ensure that concise maintains the order of files as given from Harmony
  2. Implement a simple alpha-numeric sort of the file list prior to starting the merge step

The first approach (maintain the order of input) depends on answering one question: is the order of the files that Harmony provides to concise "correct" with respect to the needs of TEMPO data? @danielfromearth or Andrey will take point on answering this question.

If implementing the first approach satisfies the requirements for TEMPO, that is the preferred solution. Otherwise, it is agreed that the second approach should be implemented.
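For illustration, the two options above could be sketched roughly as follows. This is a minimal sketch and not concise's actual code: the function names `keep_input_order` and `sort_by_filename` and the granule paths are hypothetical, and it assumes the file list arrives as plain path strings.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def keep_input_order(paths: Iterable[str]) -> List[str]:
    """Option 1 (hypothetical helper): keep files exactly in the order received."""
    return list(paths)


def sort_by_filename(paths: Iterable[str]) -> List[str]:
    """Option 2 (hypothetical helper): lexicographic sort on the file name only.

    The directory part is ignored so that files staged in different
    locations are still ordered by their names.
    """
    return sorted(paths, key=lambda p: PurePosixPath(p).name)


# Hypothetical granule paths, for illustration only.
granules = [
    "/data/b/granule_20230102T000000Z.nc",
    "/data/a/granule_20230103T000000Z.nc",
    "/data/c/granule_20230101T000000Z.nc",
]

as_given = keep_input_order(granules)  # same order as the input list
by_name = sort_by_filename(granules)   # ordered by the timestamp in the name
```

One caveat with the second option: a plain lexicographic sort places `file10` before `file2`, so it only yields chronological order when the names embed fixed-width, zero-padded timestamps or counters.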

ank1m commented 8 months ago

Findings:

Solution:

jamesfwood commented 4 months ago

Done: concise now maintains the order of files as given by Harmony.