Closed SarahAlidoost closed 2 months ago
Hey @SarahAlidoost, can you be a bit more specific? Is there anything inconsistent on the analysis side?
`Unclustered` is a label assigned to the `cluster_id` of a model when there's no clustering data (`cluster_id = "-"`), while `Other` refers to all the clusters (combined together) with a cluster rank higher than a threshold (default = 10). These two labels have very different meanings.
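The distinction described above can be sketched as a single labelling rule. This is only an illustration, not the project's actual code: the helper name `display_label`, the constant names, and the `Cluster N` format for ordinary clusters are all assumptions; only the `"-"` marker, the two label names and the default threshold of 10 come from this thread.

```python
UNCLUSTERED_ID = "-"  # marker for "no clustering data", as described in the thread
DEFAULT_TOP_N = 10    # default rank threshold mentioned in the thread


def display_label(cluster_id, cluster_rank, top_n=DEFAULT_TOP_N):
    """Return one display label for a model (hypothetical helper)."""
    # Check the unclustered marker first: such models have no numeric rank.
    if cluster_id == UNCLUSTERED_ID:
        return "Unclustered"
    # Every cluster ranked below the top N is merged into a single group.
    if int(cluster_rank) > top_n:
        return "Other"
    # Assumed format for ordinary clusters; the thread does not specify one.
    return f"Cluster {cluster_rank}"
```

Using one such rule for the table, the scatter plots and the box plots would keep the three outputs from drifting apart.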
> hey @SarahAlidoost can you be a bit more specific? is there anything inconsistent on the analysis side?

Only in plotting the results, not in running the analysis. There is an inconsistency between the labels used in tables, scatter plots and box plots.
> `Unclustered` is a label assigned to the `cluster_id` of a model when there's no clustering data (`cluster_id = "-"`),

The label `Unclustered` is used as a header in the table and in the legend of the scatter plots, whereas the label "-" is used in the legend of the box plots. In the table, the value of Cluster Rank is "-", while the x-axis of the box plots shows `capri_rank = 1`.
> while `Other` refers to all the clusters (combined together) with cluster rank higher than a threshold (default = 10). These two labels have a very different meaning.

This is another example of the inconsistency of labels. The label "Other" is used in the legend of the scatter plots and box plots, whereas there is no column "Other" in the table. In the table, there are columns with headers according to `cluster-ranking = 11, 13, 14`. Also, there is no "Other" on the x-axis of the box plots; instead, they are all shown as `capri_rank = 11` (as an example).
Please let me know if it is still unclear.
This is the expected behaviour in the table. We do compute all cluster statistics, but for plotting purposes we only show the top 10, and everything else thus becomes "Other" (but not in the table).
A model which does not cluster is indicated in the table as "-", and this should translate to "Unclustered" in the plots.
> This is another example of the inconsistency of labels. The label "Other" is used in the legend of scatter plots and box plots whereas there is no column "Other" in the table.

PS: Thus, in the plot, what you call cluster 11 should be "Other". It is plotted correctly; it is just a label issue.
The dataframes used for creating tables, scatter plots and box plots have three columns: `cluster-id`, `cluster-ranking` and `capri_rank`. Here are two examples where there are `Unclustered` and `Other` groups in the dataframes:
The representation of data in plots and tables for these groups is not consistent. For example, a cluster with `cluster-id = "-"` is called "Unclustered" in tables and in scatter plots, whereas it is "-" in the box plot legends and shown as `capri_rank = 1` on the x-axis of the box plots. As another example, a cluster with `cluster-id = "Other"` is called "Other" in the scatter plot and box plot legends, whereas it is shown with `cluster-ranking = 11, 13, 14` in tables and as `capri_rank = 11` on the x-axis of the box plots.
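One way to make the box plot axis agree with the legends is to derive the tick text from the same three columns, as sketched below. The rows are made-up illustrative values, not data from the issue, and `group_label` and `tick_labels` are hypothetical helpers. The sketch also assumes that every row of a group shares one `capri_rank`, which is how the box plots above appear to behave.

```python
# Illustrative rows only; the column names follow the dataframes described above.
rows = [
    {"cluster-id": "-", "cluster-ranking": "-", "capri_rank": 1},
    {"cluster-id": 4, "cluster-ranking": 1, "capri_rank": 2},
    {"cluster-id": "Other", "cluster-ranking": 11, "capri_rank": 11},
    {"cluster-id": "Other", "cluster-ranking": 13, "capri_rank": 11},
]


def group_label(row):
    """Name a row's group from its cluster-id (hypothetical helper)."""
    cluster_id = row["cluster-id"]
    if cluster_id == "-":
        return "Unclustered"
    if cluster_id == "Other":
        return "Other"
    # Assumed naming for ordinary clusters.
    return f"Cluster {row['cluster-ranking']}"


def tick_labels(rows):
    """Map each capri_rank on the x-axis to the label of its group."""
    labels = {}
    for row in rows:
        # Keep the first label seen for each x-axis position.
        labels.setdefault(row["capri_rank"], group_label(row))
    return labels
```

The resulting mapping could then be passed to whatever plotting call sets the axis tick text, so that the axis shows "Unclustered" and "Other" rather than `capri_rank = 1` and `capri_rank = 11`.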