talgalili / dendextend

Extending R's Dendrogram Functionality

Warning message when using match_order_by_labels() on manually created hclust objects #117

Closed mooibroekd closed 5 months ago

mooibroekd commented 6 months ago

Describe the bug

Combining two manually created hclust objects using match_order_by_labels() produces a warning.

To Reproduce

################################################################################
## Dendrogram a
################################################################################
a <- list()  # initialize empty object
# define merging pattern: 
#    negative numbers are leaves, 
#    positive are merged clusters (defined by row number in $merge)
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                    1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3)    # define merge heights
a$order <- 1:4              # order of leaves (trivial if hand-entered)
a$labels <- LETTERS[1:4]    # labels of leaves
class(a) <- "hclust"        # make it an hclust object
plot(a)                     # look at the result   

#convert to a dendrogram object if needed
ad <- as.dendrogram(a) |> 
  dendextend::set("labels_to_char") # make sure the labels are characters


################################################################################
## Dendrogram b
################################################################################
b <- list()  # initialize empty object
# define merging pattern: 
#    negative numbers are leaves, 
#    positive are merged clusters (defined by row number in $merge)
b$merge <- matrix(c(-1, -3,
                    -2, -4,
                    1,  2), ncol = 2, byrow = TRUE)
b$height <- c(1, 1.5, 3)    # define merge heights
b$order <- c(1, 3, 2, 4)    # order of leaves
b$labels <- LETTERS[1:4]    # labels of leaves
class(b) <- "hclust"        # make it an hclust object
plot(b)                     # look at the result   


#convert to a dendrogram object if needed
bd <- as.dendrogram(b) |> 
  dendextend::set("labels_to_char")  # make sure the labels are characters

################################################################################
## Combine using match_order_by_labels
################################################################################

# gives a warning message:
ab <- dendextend::dendlist(ad,
                           bd |> 
                             dendextend::match_order_by_labels(ad))
#> Warning in `order.dendrogram<-`(`*tmp*`, value = tree_new_leaf_numbers):
#> 'value' is not integer - coerced using as.integer

# no warning message:
ab <- dendextend::dendlist(ad,
                           bd)

Created on 2024-04-20 with reprex v2.1.0

Expected behavior

I would like to understand why this happens and, if possible, how to adjust the manually created hclust objects so that the warning does not occur.

talgalili commented 5 months ago

Here is what ChatGPT thinks about this issue. @mooibroekd, what do you think about the reply?

https://chatgpt.com/share/a2a93282-0203-4ccf-a2c0-0d1a6c6c7e53

The warning message when using match_order_by_labels() on manually created hclust objects in dendextend occurs because of the coercion of values during the order assignment process within the function. Let's break down the problem and how you might address it.

Explanation of the Bug

When match_order_by_labels() is used, it internally reorders the dendrogram based on the labels. This reordering step assigns a new leaf order through order.dendrogram<-, which expects an integer vector. In your manually created hclust objects, some of the values that end up in the leaf order appear to be stored as non-integer (double) numbers, and this is what makes the warning appear.

The warning message:

Warning in `order.dendrogram<-`(`*tmp*`, value = tree_new_leaf_numbers):
'value' is not integer - coerced using as.integer

indicates that the function received values that are not stored as integers and coerced them using as.integer. This typically happens when the order or merge components are stored as doubles rather than integers, or when there is a discrepancy in the internal structure of the hclust objects.
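As a minimal illustration of the integer/double distinction the warning refers to, the base R sketch below shows how hand-typed numbers are stored. The variable names are chosen only for this example and do not come from dendextend.

```r
# Numbers typed without an L suffix are doubles, even when they look whole.
merge_dbl <- matrix(c(-1, -2,
                      -3, -4,
                       1,  2), ncol = 2, byrow = TRUE)
order_dbl <- c(1, 3, 2, 4)
order_int <- 1:4                 # the colon operator returns integers here

typeof(merge_dbl)                # "double"
typeof(order_dbl)                # "double"
typeof(order_int)                # "integer"

# An L suffix stores the same values as integers.
typeof(c(1L, 3L, 2L, 4L))        # "integer"
```

This would explain why a$order (built with 1:4) is already an integer vector while a$merge and b$order in the reprex are not.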

Steps to Reproduce and Diagnose

Check the Structure of the hclust Objects: Ensure that the order and merge components of your hclust objects are correctly specified as integers.

Verify Labels: Ensure that the labels are correctly set as characters, which you already seem to be doing with set("labels_to_char").

Inspect match_order_by_labels() Output: Inspect the output of the match_order_by_labels() function to see how it changes the order and verify if any non-integer values are introduced.
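The checks above could be sketched roughly as follows. This uses only base R (the stats package) on object a from the reprex. The storage type of the leaf order may depend on the R version, so the last line is something to inspect rather than a stated result. If it prints "double", that would be consistent with the warning.

```r
# Rebuild object `a` as in the reprex.
a <- list()
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                     1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3)
a$order <- 1:4
a$labels <- LETTERS[1:4]
class(a) <- "hclust"
ad <- as.dendrogram(a)

# Step 1: storage type of the hclust components.
typeof(a$merge)                  # "double": the literals were typed without L
typeof(a$order)                  # "integer": built with 1:4

# Step 2: the labels should be character.
is.character(labels(ad))         # TRUE

# Step 3: storage type of the leaf order that match_order_by_labels() works with.
leaf_order <- order.dendrogram(ad)
typeof(leaf_order)
```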

Adjusting the Manually Created hclust Objects

Here’s a refined approach to ensure your manually created hclust objects are correctly specified:

library(dendextend)

################################################################################
## Dendrogram a
################################################################################
a <- list()
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                     1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3)
a$order <- 1:4
a$labels <- LETTERS[1:4]
class(a) <- "hclust"
plot(a)

ad <- as.dendrogram(a) |>
  dendextend::set("labels_to_char")

################################################################################
## Dendrogram b
################################################################################
b <- list()
b$merge <- matrix(c(-1, -3,
                    -2, -4,
                     1,  2), ncol = 2, byrow = TRUE)
b$height <- c(1, 1.5, 3)
b$order <- c(1, 3, 2, 4)
b$labels <- LETTERS[1:4]
class(b) <- "hclust"
plot(b)

bd <- as.dendrogram(b) |>
  dendextend::set("labels_to_char")

################################################################################
## Combine using match_order_by_labels
################################################################################
ab <- dendextend::dendlist(ad,
                           bd |>
                             dendextend::match_order_by_labels(ad))

Verify the order Component

Ensure the order components of your dendrograms are integers and correctly reflect the intended order:

# A dendrogram has no $order component; use order.dendrogram() instead.
print(order.dendrogram(ad))
print(order.dendrogram(bd))

Conclusion

The warning arises from the coercion step inside the order.dendrogram<- replacement function, which is triggered when the leaf order it receives is stored as non-integer (double) values. This is likely caused by non-integer values in the order or merge components of the manually created hclust objects. Storing these components as integers, and keeping the labels as characters, should avoid the warning. In practice this means creating and checking the hclust objects carefully before applying match_order_by_labels().
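One way to act on this, sketched under the assumption that the double-valued merge and order components are indeed the cause, is to coerce them to integer before converting to a dendrogram. The helper as_integer_hclust() is hypothetical and not part of dendextend or base R. The final block only runs if dendextend is installed. I would expect it to finish without the warning, but please verify this on your own setup.

```r
# Hypothetical helper (not a dendextend function): store merge and order as integers.
as_integer_hclust <- function(hc) {
  storage.mode(hc$merge) <- "integer"   # changes storage type, keeps matrix shape
  hc$order <- as.integer(hc$order)
  hc
}

# Objects a and b as in the reprex.
a <- list()
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                     1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3)
a$order <- 1:4
a$labels <- LETTERS[1:4]
class(a) <- "hclust"

b <- list()
b$merge <- matrix(c(-1, -3,
                    -2, -4,
                     1,  2), ncol = 2, byrow = TRUE)
b$height <- c(1, 1.5, 3)
b$order <- c(1, 3, 2, 4)
b$labels <- LETTERS[1:4]
class(b) <- "hclust"

# Coerce before converting to dendrograms.
a <- as_integer_hclust(a)
b <- as_integer_hclust(b)
ad <- as.dendrogram(a)
bd <- as.dendrogram(b)

# The leaf order should now be stored as integers.
is.integer(order.dendrogram(ad))
is.integer(order.dendrogram(bd))

# Optional: repeat the combination from the reprex if dendextend is available.
if (requireNamespace("dendextend", quietly = TRUE)) {
  bd_matched <- dendextend::match_order_by_labels(bd, ad)
  ab <- dendextend::dendlist(ad, bd_matched)
}
```

Writing the literals with an L suffix (for example -1L and 1L) when building the merge matrix would be an alternative that avoids the coercion step altogether.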