Sort by "auto" (ID) on Horizontal Axis / Chart by default / Cube cannot be loaded

ortnever commented 2 months ago

I've noticed three problems with the Red Lists dataset I've just published on INT and PROD.

On PROD:

1. Sorting by ID (Auto) on the horizontal axis doesn't seem to work for the "Species Group" dimension (a shared dimension). The IDs are 6-digit "Integer" numbers. Do I need to change anything in the data or in the definition of the data type, or is this a bug in the application?
2. The graph configured by default (the first visualisation) does not show any data (https://www.visualize.admin.ch/en/v/NwtXqO5smwa_). Either the default value selected in the "Status IUCN" filter should be a value other than "Data Deficient", or the default measure dimension should be "Species Number". As I remember it, the default filters were always selected so that data is shown. Do you have a solution for this?

On INT:

Version 13 (published) of the "ubd003001" cube cannot be loaded.

Visualize environment and version: PROD v4.7.4 (71758ed)
Visualize environment and version: INT v4.7.4 (71758ed)
Browser and version: Edge

ptbrowne commented 2 months ago

Hi @ortnever, I am starting to look into it, and it seems that the sorting "by hierarchy" clashes with your expectation of sorting by identifier.

Here I have removed the sorting by hierarchy. Does the X-axis sorting look good to you?

I am going to look into why the graph configured by default does not show any data.

ortnever commented 2 months ago

Hi @ptbrowne: yes, it looks good. Thank you. It would be great if you could have a look at the other "problem". It's correct that this filter combination doesn't show any data (there is no value for the measure dimension "Share of species evaluated" with Status IUCN "Data Deficient"). Would it be possible to set the "Status IUCN" filter by default to a value other than "Data Deficient"? PS: I will be on holiday until 19 August and won't be able to respond to your questions or replies before then.

ptbrowne commented 2 months ago

Concerning the sorting: in my screenshot I had changed the code directly to check with you whether the result would fit your expectation, but this was only a temporary solution. The "Auto" sorting works from a hardcoded list of attributes that we sort by.

Right now we sort, in order:

1. By hierarchy. This means that the parents of a value appear first. In the test that we have, this was to accommodate countries and regions, such that Switzerland comes first, then cantons, then municipalities. This is why you see in your chart that "All organisms" comes first, then the direct children of "All organisms", etc.
2. By position. This is a predicate dedicated to sorting; it is used, for example, by cantons so that Zurich comes first, then Bern.
3. By identifier.
4. By label.
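
To make the order above concrete, here is a minimal sketch of a multi-key comparator. It is not the actual visualize code: the `DimensionValue` shape, the function names, and the sample identifiers are assumptions made for illustration, and "hierarchy" is modelled simply as the depth of a value in its tree, based on the description above.

```typescript
// Hypothetical shape of a dimension value; the field names are assumptions.
interface DimensionValue {
  label: string;
  identifier?: number;
  position?: number;
  depth?: number; // 0 for the root, 1 for its children, and so on
}

type Key = string | number | undefined;
type SortKey = (v: DimensionValue) => Key;

const byHierarchy: SortKey = (v) => v.depth;
const byPosition: SortKey = (v) => v.position;
const byIdentifier: SortKey = (v) => v.identifier;
const byLabel: SortKey = (v) => v.label;

// The order described above: hierarchy, position, identifier, label.
const AUTO_SORT_KEYS: SortKey[] = [byHierarchy, byPosition, byIdentifier, byLabel];

function compareKey(a: Key, b: Key): number {
  if (a === undefined && b === undefined) return 0; // undecided, try the next key
  if (a === undefined) return 1; // values without this key go last
  if (b === undefined) return -1;
  if (typeof a === "number" && typeof b === "number") return a - b;
  return String(a).localeCompare(String(b), undefined, { numeric: true });
}

// Sorts a copy of the values; the first key that differs decides the order.
function sortAuto(values: DimensionValue[], keys: SortKey[] = AUTO_SORT_KEYS): DimensionValue[] {
  return [...values].sort((a, b) => {
    for (const key of keys) {
      const result = compareKey(key(a), key(b));
      if (result !== 0) return result;
    }
    return 0;
  });
}

// Illustrative data only: the identifiers are made up, not taken from the cube.
const sample: DimensionValue[] = [
  { label: "Molluscs", identifier: 112000, depth: 2 },
  { label: "Plants", identifier: 120000, depth: 1 },
  { label: "All organisms", identifier: 100000, depth: 0 },
  { label: "Vertebrates", identifier: 111000, depth: 2 },
  { label: "Animals", identifier: 110000, depth: 1 },
];

console.log(sortAuto(sample).map((v) => v.label));
console.log(sortAuto(sample, [byIdentifier, byLabel]).map((v) => v.label));
```

With hierarchy as the first key, all values of one level come before any value of the next level, which matches the behaviour reported in this issue. Passing only `[byIdentifier, byLabel]` gives the plain identifier order.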

Here, the solution I would see is the following:

1. We make a change such that "position" comes first. I tried it and it does not break any of our unit tests. This way, we ensure that "position" is the predicate that users can use to "force" a sort order.
2. You add the position predicate to each of the values in the dataset, copying the identifier into it.
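
For the dataset side of this proposal, a rough sketch of copying each identifier into a position triple could look like the following. It assumes that the position predicate is `schema:position` and that N-Triples output is acceptable; the `ValueRef` type, the function name, and the example IRI are hypothetical.

```typescript
// Assumption: the position predicate is schema:position.
const SCHEMA_POSITION = "http://schema.org/position";
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Hypothetical minimal description of a shared dimension value.
interface ValueRef {
  iri: string;
  identifier: string;
}

// Emits one N-Triples line per value, reusing the identifier as the position.
function positionTriples(values: ValueRef[]): string[] {
  return values.map(({ iri, identifier }) => {
    if (!/^\d+$/.test(identifier)) {
      throw new Error(`Identifier is not a non-negative integer: ${identifier}`);
    }
    return `<${iri}> <${SCHEMA_POSITION}> "${identifier}"^^<${XSD_INTEGER}> .`;
  });
}

// Illustrative IRI and identifier, not taken from the real vocabulary.
const triples = positionTriples([
  { iri: "https://example.org/species-group/110000", identifier: "110000" },
]);
console.log(triples.join("\n"));
```

The resulting lines would still have to be loaded into the shared dimension through whatever publishing pipeline the dataset uses.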

What do you think?

ptbrowne commented 2 months ago

> The graph selected by default does not show any data.

True. The way we try to find a valid default filter is to look for at least one observation. Here it seems that there are observations even for "Data Deficient".

For example this one:

https://environment.ld.admin.ch/foen/ubd003001/5/observation/100000/DD

PREFIX cube: <https://cube.link/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?dimension0 ?observation WHERE {
  <https://environment.ld.admin.ch/foen/ubd003001/5> cube:observationSet/cube:observation ?observation .
  ?observation <https://environment.ld.admin.ch/foen/ubd003001/statuscode> ?dimension0 .
  VALUES ?dimension0 { <https://environment.ld.admin.ch/vocabulary/iucn_conservation_categories/DD> }
}

LIMIT 1

We would have to talk together so that I can understand correctly how to enhance this logic to accommodate this use case. It seems that we would have to improve the filter logic to detect that the part predicate has the value Undefined: we would have to check that the chart is using the part predicate and ignore any observation with part -> Undefined. In any case, the logic was implemented a long time ago and has worked for a long time, barring those Undefined cases. I think we should see this as an "enhancement", not a "bug".
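
As a starting point for that discussion, the enhancement could be sketched roughly as follows. This is not the current implementation: the `Observation` shape, both function names, and the sample numbers are assumptions, and Undefined is modelled as the IRI `https://cube.link/Undefined`.

```typescript
// Assumption: the cube encodes "no value" with the cube:Undefined IRI.
const CUBE_UNDEFINED = "https://cube.link/Undefined";

// Hypothetical in-memory observation: one filter value plus its measures.
interface Observation {
  status: string;
  measures: Record<string, number | typeof CUBE_UNDEFINED>;
}

// Behaviour as described above: any observation makes a filter value eligible.
function pickByAnyObservation(observations: Observation[], candidates: string[]): string | undefined {
  return candidates.find((c) => observations.some((o) => o.status === c));
}

// Sketched enhancement: only observations with a real number for the
// charted measure count; Undefined or missing values are ignored.
function pickByDefinedMeasure(
  observations: Observation[],
  candidates: string[],
  measure: string
): string | undefined {
  return candidates.find((c) =>
    observations.some((o) => o.status === c && typeof o.measures[measure] === "number")
  );
}

// Made-up numbers for illustration; DD and RE stand for two status codes.
const observations: Observation[] = [
  { status: "DD", measures: { share: CUBE_UNDEFINED, count: 12 } },
  { status: "RE", measures: { share: 3.5, count: 7 } },
];
const candidates = ["DD", "RE"]; // assumed to be already ordered, DD first

console.log(pickByAnyObservation(observations, candidates));
console.log(pickByDefinedMeasure(observations, candidates, "share"));
console.log(pickByDefinedMeasure(observations, candidates, "count"));
```

In this sketch the default stays on DD when the measure has values there, and moves to the next candidate only when every DD observation is Undefined for the charted measure.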

Rdataflow commented 2 months ago

@ortnever there can be hierarchies of distinct shared dimensions (think of cantons having identifiers and municipalities having identifiers). Even in such cases you would want to sort as follows: canton 1 (ZH) with its municipalities, canton 2 (BE) with its municipalities, etc. That's why this auto sort order was chosen.

@ortnever @ptbrowne @bprusinowski sorting by auto (1. hierarchy, 2. position, 3. identifier, 4. label) works well - even with this dataset - (you'd only need to change Measure to Species number) see https://www.visualize.admin.ch/en/v/jcPZ24uGciSC?dataSource=Prod

Context: cube.link Specification of Undefined

In Cube Schema, all dimensions are mandatory for a cube. If a value could not be measured, it should be expressed as such.

Thus it's the cube.link spec demanding a (technical) value in cases where there is no (real) value, that explains this very situation.

It's strictly expected to not display any Undefined (=NoValue) on a chart (as there is no measured / defined value eligible to show for this chosen measure dimension)

In case there is a need to display a value for this dimension, this simply means the value should be added in the data...

In the hope this helps to clarify why the situation occurs and why it's intended like this.

@ortnever you may choose another default by changing Data Deficient's position from 0 to another value; see https://environment.ld.admin.ch/vocabulary/iucn_conservation_categories/DD - my guess: by default it takes the lowest value to start

ortnever commented 1 month ago

Hi @Rdataflow and @ptbrowne, the sort by hierarchy is what I need, but it doesn't work in my dataset. I have a hierarchy inside the dimension "Species group" (see screenshot), but the "automatic" sort does not take the hierarchy into account. I would like group 1 followed by its sub-groups, group 2 followed by its sub-groups, etc. But in my visualization I get Group 1 (Animals), Group 2 (Plants), Group 3 (Lichens...), and then the sub-groups "Vertebrates", "Molluscs", etc. The hierarchy can be viewed in the filter.

@Rdataflow: thank you for your suggestion for the second problem: "you may choose another default by changing Data deficients position from 0 to another value". I thought about that, but I need this order in the bar: Regionally Extinct at the bottom and Data Deficient at the top. And I think I can only define this order with the ID. Or is there another solution?
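
The difference between the order requested for "Species group" (each group followed by its sub-groups) and the order observed in the chart (all groups first, then all sub-groups) can be illustrated with two tree traversals. This is only a sketch: the `HierarchyNode` type, the function names, and the identifiers are assumptions, not the visualize implementation.

```typescript
// Hypothetical hierarchy node; identifiers below are illustrative only.
interface HierarchyNode {
  label: string;
  identifier: number;
  children?: HierarchyNode[];
}

const byId = (a: HierarchyNode, b: HierarchyNode) => a.identifier - b.identifier;

// Requested order: each parent is immediately followed by its own subtree.
function flattenDepthFirst(nodes: HierarchyNode[]): string[] {
  const out: string[] = [];
  for (const node of [...nodes].sort(byId)) {
    out.push(node.label);
    out.push(...flattenDepthFirst(node.children ?? []));
  }
  return out;
}

// Observed order: all values of one level, then all values of the next level.
function flattenByLevel(nodes: HierarchyNode[]): string[] {
  const out: string[] = [];
  let level = [...nodes];
  while (level.length > 0) {
    level.sort(byId);
    out.push(...level.map((node) => node.label));
    level = level.flatMap((node) => node.children ?? []);
  }
  return out;
}

const speciesGroups: HierarchyNode[] = [
  { label: "Plants", identifier: 120000 },
  {
    label: "Animals",
    identifier: 110000,
    children: [
      { label: "Molluscs", identifier: 112000 },
      { label: "Vertebrates", identifier: 111000 },
    ],
  },
];

console.log(flattenDepthFirst(speciesGroups));
console.log(flattenByLevel(speciesGroups));
```

The depth-first variant yields Animals, Vertebrates, Molluscs, Plants, while the level-based variant yields Animals, Plants, Vertebrates, Molluscs.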

visualize-admin / visualization-tool

Sort by "auto" (ID) on Horizontal Axis / Chart by default / Cube cannot be loaded #1681