new technique: Hierarchical indexing

szafarnia commented 2 years ago

Dear all,

Could you please add "Hierarchical indexing" as a new technique to KG?

Many thanks in advance.

Best, Sara

UlrikeS91 commented 2 years ago

Hi @szafarnia, this unfortunately fell through the cracks. Could you elaborate on the technique? I found this definition when searching for the term: "Hierarchical indexing is a method of creating structured group relationships in data."

@tgbugs and @lzehl What do you think? Should this be discussed or can we add this as a technique?

szafarnia commented 2 years ago

Hi @UlrikeS91, I found this in the methods section of a data descriptor (the relevant text is copied below). However, I am not sure whether it counts as a technique!

Cluster validity criteria: Assuming that parcellations at a coarse scale (into 2-3 subregions) should represent the more stable primary patterns such as a rostro-caudal organization, we first considered these before moving to finer parcellations with a particular focus on those close to the granularity of the solution obtained for the right side, i.e., k = 4 – 6. Importantly, however, at these finer scales, only some solutions should be expected to represent stable and hence supposedly meaningful subdivisions, necessitating an objective choice of the solution most supported by the data (Eickhoff et al., 2015). Here we employed four different cluster-validity metrics, applied individually to all three modalities. In line with our parcellation of the right PMd VOI (Genon et al., 2016), we examined percentage of deviants and silhouette value. Of note, variation of information across filter sizes, which was investigated in our MACM-CBP of the right PMd VOI, is a MACM-CBP specific metric; therefore it was not used in the current multimodal procedure. Rather, in the current multimodal CBP study, we additionally examined hierarchy index and change in inter/intra cluster distance (Clos et al., 2013). Thus, we examined four different criteria: a topological criterion (hierarchy index), a consistency criterion (percentage of deviants) and two cluster separation criteria (change in inter/intra cluster distance and silhouette value).

Hierarchy index: The topological criterion was the percentage of voxels not related to the dominant parent cluster compared to the previous (k – 1) solution, i.e., the hierarchy-index (Kahnt et al., 2012). It corresponds to the percentage of lost voxels when only voxels consistent across the entire hierarchy are considered for the final clustering. For example, voxels assigned to cluster X in the 4-cluster solution that were assigned to cluster A (at k=3) would be excluded if the majority of cluster X voxels actually stemmed from cluster B (at k=3). A large fraction of such voxels indicates a hierarchically unstable solution (Clos et al., 2013).
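(Illustration, not part of the quoted data descriptor.) The hierarchy index described above could be computed roughly as follows. This is a minimal sketch, assuming each voxel has one cluster label in the (k – 1) solution and one in the k solution; the function name `hierarchy_index` and the toy labels are hypothetical and not taken from the study's code.

```python
from collections import Counter


def hierarchy_index(parent_labels, child_labels):
    """Percentage of voxels that do not stem from the dominant parent cluster.

    parent_labels: cluster label of each voxel in the (k - 1) solution.
    child_labels:  cluster label of each voxel in the k solution.
    """
    if len(parent_labels) != len(child_labels):
        raise ValueError("both solutions must label the same voxels")

    # For every child cluster, count how many of its voxels came from each parent.
    parents_by_child = {}
    for parent, child in zip(parent_labels, child_labels):
        parents_by_child.setdefault(child, Counter())[parent] += 1

    # Voxels outside the dominant (most frequent) parent count as "lost".
    lost = sum(
        sum(counts.values()) - max(counts.values())
        for counts in parents_by_child.values()
    )
    return 100.0 * lost / len(child_labels)


# Toy example (hypothetical labels): 10 voxels, k = 3 versus k = 4.
parent = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
child = [0, 0, 0, 3, 3, 3, 1, 1, 2, 2]
index = hierarchy_index(parent, child)
```

In the toy example, child cluster 3 draws two voxels from parent 1 and one from parent 0, so one voxel out of ten is lost.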

Percentage of deviants: The percentage of deviants or “misclassified voxels”, i.e. the average percentage of voxels for each filter size/subject that were assigned to a different cluster compared to the most frequent (mode) assignment of these voxels across filter sizes/subjects, was used as a consistency criterion. A significant difference in percentage of deviants between a given cluster solution and the previous (k-1) one was tested using a two-sample t-test. Optimal solutions are those k parcellations where the percentage of deviants (presumably reflecting noise and local variance) is not significantly increased compared to the previous (k-1) solution, while the subsequent (k+1) solution leads to a significantly higher percentage of deviants.
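(Illustration, not part of the quoted data descriptor.) A minimal sketch of the percentage of deviants, assuming the cluster labels have already been matched across filter sizes/subjects so that the same label means the same cluster in every run. The function name and the toy data are hypothetical; the t-test between the k and (k-1) solutions is not shown.

```python
from collections import Counter


def percentage_of_deviants(assignments):
    """Per-run percentage of voxels that deviate from the mode assignment.

    assignments: list of runs (filter sizes or subjects); each run is a list
    holding one cluster label per voxel. Ties for the mode are resolved in
    favour of the label seen first.
    """
    n_voxels = len(assignments[0])
    # Most frequent label of each voxel across all runs.
    modes = [
        Counter(run[v] for run in assignments).most_common(1)[0][0]
        for v in range(n_voxels)
    ]
    # Share of voxels in each run that differ from their mode label.
    return [
        100.0 * sum(label != mode for label, mode in zip(run, modes)) / n_voxels
        for run in assignments
    ]


# Toy example (hypothetical): 3 runs, 4 voxels.
runs = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
per_run = percentage_of_deviants(runs)
average = sum(per_run) / len(per_run)
```

The per-run values are what a two-sample t-test would compare between neighbouring cluster solutions.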

Change in inter/intra cluster distance: The inter/intra cluster ratio (Chang et al., 2012), that is, the ratio between the average distance of a voxel to its cluster centre and the average distance between the cluster centres, was used as a cluster separation criterion. Since a higher distance ratio indicates better separation, a significantly increased ratio compared to the previous (k-1) solution would indicate a better separation of the obtained clusters. However, because of the monotonic increase usually observed with this ratio, we used the first derivative to evaluate the change in this ratio across solutions. A local optimum is reached when there is a significant increase in the change from the previous (k-1) to the current k solution while the subsequent (k+1) solution does not show a significantly larger increase.
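(Illustration, not part of the quoted data descriptor.) A minimal sketch of the ratio and its change across solutions. It assumes Euclidean distances and takes the ratio as mean centre-to-centre distance divided by mean voxel-to-centre distance, so that higher values mean better separation, as the text states. The function names and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def inter_intra_ratio(points, labels):
    """Mean distance between cluster centres over mean voxel-to-centre distance.

    Needs at least two clusters and a non-zero intra-cluster distance.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Cluster centre = coordinate-wise mean of the cluster's points.
    centres = {
        label: tuple(fmean(list(dim)) for dim in zip(*members))
        for label, members in clusters.items()
    }
    intra = fmean([dist(p, centres[l]) for p, l in zip(points, labels)])
    inter = fmean([dist(a, b) for a, b in combinations(centres.values(), 2)])
    return inter / intra


def ratio_changes(ratios):
    """First differences of the ratio across successive k solutions."""
    return [later - earlier for earlier, later in zip(ratios, ratios[1:])]


# Toy example (hypothetical): two well-separated clusters in 2D.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
ratio = inter_intra_ratio(points, labels)
```

Calling `ratio_changes` on the ratios obtained for increasing k gives the first derivative that the text evaluates for a local optimum.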

Silhouette value: The silhouette value ranges from -1 to 1 and assesses, for each voxel, how similar its connectivity profile is to those of other voxels in the same cluster versus those of voxels in other clusters. A significant difference in the silhouette value between a given cluster solution and the previous one was tested with a two-sample t-test. Cluster solutions were considered favorable if they showed a significantly higher silhouette value compared to the previous (k-1) solution.
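(Illustration, not part of the quoted data descriptor.) A minimal sketch of the standard silhouette definition, s = (b - a) / max(a, b), where a is a voxel's mean distance to the other voxels in its own cluster and b is its mean distance to the voxels of the nearest other cluster. Euclidean distance between connectivity profiles is an assumption here; the study may have used a different distance measure. The function name and the toy profiles are hypothetical.

```python
from math import dist
from statistics import fmean


def silhouette_values(profiles, labels):
    """Silhouette value of every voxel; needs at least two clusters."""
    cluster_ids = set(labels)
    values = []
    for i, (profile, label) in enumerate(zip(profiles, labels)):
        # a: mean distance to the other members of the voxel's own cluster.
        same = [
            dist(profile, other)
            for j, (other, other_label) in enumerate(zip(profiles, labels))
            if other_label == label and j != i
        ]
        if not same:
            # Convention: a voxel alone in its cluster gets a silhouette of 0.
            values.append(0.0)
            continue
        a = fmean(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean([dist(profile, o) for o, ol in zip(profiles, labels) if ol == c])
            for c in cluster_ids
            if c != label
        )
        values.append((b - a) / max(a, b))
    return values


# Toy example (hypothetical one-dimensional "profiles").
profiles = [(0,), (1,), (10,), (11,)]
labels = [0, 0, 1, 1]
values = silhouette_values(profiles, labels)
mean_silhouette = fmean(values)
```

Values near 1 indicate voxels that sit well inside their cluster; the mean over voxels is one common summary per cluster solution.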

lzehl commented 2 years ago

@szafarnia & @UlrikeS91 & @tgbugs in principle this is a cluster validation technique, correct? And @szafarnia it is not the only one used in this study.

I think it makes more sense to introduce a general technique (e.g., "cluster validation analysis" or simply "cluster validation") and then register all the criteria they looked at as keywords ("hierarchy index", "percentage of deviants", "inter/intra cluster ratio", "silhouette value").

@tgbugs would you agree?

szafarnia commented 2 years ago

@lzehl Thank you for your feedback and suggestion.

UlrikeS91 commented 3 months ago

Note: If we add a technique "cluster validation analysis", it should be under "analysisTechnique"

lzehl commented 3 months ago

@UlrikeS91 correct, "cluster validation analysis" should be under schema type AnalysisTechnique

@tgbugs would you agree to registering this analysis technique? It would be a superclass, since there are multiple concrete analysis techniques for doing this validation.

openMetadataInitiative / openMINDS_instances

new technique: Hierarchical indexing #96