Open micwij opened 1 year ago
Thanks for the heads up. If you could check if the behaviour occurs
- having two columns
Yes, this also occurs with two or more columns (my original data has more than 10 columns). Here is a replacement for the example above, where I added a second column and modified the values slightly.
example <- tribble(
  ~Compound_Name,   ~Compound_Class, ~col, ~log2fc,
  "L-homoserineAA", "AA",            1,     2.93,
  "cellobioseCH",   "CH",            1,     2.09,
  "D-maltoseCH",    "CH",            1,     1.08,
  "pectinCH",       "CH",            1,    -3.04,
  "raffinoseCH",    "CH",            1,    -2.10,
  "L-homoserineAA", "AA",            2,    -2.10,
  "cellobioseCH",   "CH",            2,    -3.04,
  "D-maltoseCH",    "CH",            2,     1.08,
  "pectinCH",       "CH",            2,     2.09,
  "raffinoseCH",    "CH",            2,     2.93
)
Upon modifying the values, it seems that the issue might not stem from the clustering after all, so maybe it is related to the names?
- factoring and grouping (rather the other way around)
I think this is what I did in example2 above, or what do you mean? Indeed, in that case the behavior does not occur. When grouping by the variable as a character vector and then mutating it into a factor, the behavior still occurs. E.g.:
example %>% group_by(Compound_Class) %>% mutate(Compound_Class = as_factor(Compound_Class)) %>% heatmap(.row = Compound_Name, .col = col, .value = log2fc)
Puzzling... Right now, I don't have the bandwidth to debug the function. I will put it on the to-do list. If you want to give it a shot, you might be able to fix the bug quickly and become part of the tidy* family ;)
Small addition: I just removed the "D-" and "L-" from "D-maltoseCH" and "L-homoserineAA" and indeed the behavior does not appear. Hope this info helps in finding the issue.
Of course, those are globally not the most common names, but these are quite common in metabolomics and I could imagine similar names for e.g. cell lines, or strains, so I think this is still worth looking into.
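The renaming experiment described above can be reproduced with a single substitution. This is a minimal sketch in base R; the vector name `compound_names` and the regular expression are illustrative assumptions, not code from this thread.

```r
# Row names taken from the example in this thread.
compound_names <- c("L-homoserineAA", "cellobioseCH", "D-maltoseCH",
                    "pectinCH", "raffinoseCH")

# Drop a leading "D-" or "L-" prefix; names without it are left unchanged.
stripped_names <- sub("^[DL]-", "", compound_names)
stripped_names
```

The stripped names can then be used as the `.row` variable to check whether the misplacement still appears.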
Sure. No worries and no hurry! I might try to look into it but I am not sure if I am experienced enough to solve it. I will report it here if I find anything.
I can confirm that the issue can be fixed by converting the variable into a factor. I tried replacing all dots, spaces and dash characters with underscores, thinking that it could somehow be related to that, but this made no difference. But converting to factor works for now.
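For anyone who wants to repeat the underscore test, a possible base R sketch is shown here. The names in `raw_names` are hypothetical and only serve to show the replacement of dots, spaces, and dashes.

```r
# Hypothetical names containing a dash, a space, and a dot.
raw_names <- c("D-maltoseCH", "pectin CH", "raffinose.CH")

# Inside a character class, ".", " " and a trailing "-" are matched literally.
clean_names <- gsub("[. -]", "_", raw_names)
clean_names
```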
Can you please send me the list of variables, in their simplest form, that fail when they are not transformed into factors? This bit puzzles me a lot.
If you can reduce them to the simplest form and the smallest number at which the error still appears, we might be able to identify the cause. We need to fix this.
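One way to search for the smallest failing set is to drop names one at a time and keep each removal only if the problem still reproduces. The sketch below is an assumption-laden illustration in base R: `shrink_failing_set` and the predicate `still_fails` are hypothetical helpers, and in practice the check may have to be done by eye on the plotted heatmap.

```r
# Greedily drop elements while the failure still reproduces.
# `still_fails` is a hypothetical predicate: it takes a character vector of
# row names and returns TRUE if the misplacement is still observed.
shrink_failing_set <- function(items, still_fails) {
  stopifnot(still_fails(items))
  i <- 1
  while (i <= length(items)) {
    candidate <- items[-i]
    if (length(candidate) > 0 && still_fails(candidate)) {
      items <- candidate  # element i was not needed; retry the same index
    } else {
      i <- i + 1          # element i is needed; keep it and move on
    }
  }
  items
}

# Toy predicate for demonstration only (NOT the real bug condition):
# it "fails" whenever both prefixed names are present.
toy_fails <- function(x) all(c("L-homoserineAA", "D-maltoseCH") %in% x)

minimal <- shrink_failing_set(
  c("L-homoserineAA", "cellobioseCH", "D-maltoseCH", "pectinCH", "raffinoseCH"),
  toy_fails
)
minimal
```

The greedy pass does not guarantee the globally smallest set, but it usually gets close with few checks.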
Hello all, thanks for bringing this to our attention. We will have a dedicated person for tidyomics who will also maintain tidyHeatmap.
Hopefully, this will happen soon.
on it..
Just to clarify: I fixed it by converting the row names into a factor. But I am going to fix the underlying problem anyway.
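A factor differs from a character vector in that it carries an explicit level order. Whether that is related to this bug is not established in this thread; the sketch below only illustrates the difference, in base R. `factor(x, levels = unique(x))` keeps the order of first appearance, which is what `forcats::as_factor()` does for character vectors, while plain `factor(x)` sorts the levels (the sorted order depends on the locale).

```r
# Row names taken from the example in this thread.
compound_names <- c("L-homoserineAA", "cellobioseCH", "D-maltoseCH",
                    "pectinCH", "raffinoseCH")

# Levels in order of first appearance.
by_appearance <- factor(compound_names, levels = unique(compound_names))
levels(by_appearance)

# Levels sorted; the exact order is locale-dependent, so inspect it locally.
sorted_levels <- factor(compound_names)
levels(sorted_levels)
```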
I was trying to use tidyHeatmap to make heatmaps of metabolomics data when I noticed a strange behaviour: rows escaped their manually assigned grouping and ended up in the wrong group. It is a bit tricky to explain, so I am providing a small example here:
example <- tribble(
  ~Compound_Name,   ~Compound_Class, ~col, ~log2fc,
  "L-homoserineAA", "AA",            1,     2.93,
  "cellobioseCH",   "CH",            1,     2.09,
  "D-maltoseCH",    "CH",            1,     3.08,
  "pectinCH",       "CH",            1,    -3.04,
  "raffinoseCH",    "CH",            1,    -2.10
)
example %>% group_by(Compound_Class) %>% heatmap(.row = Compound_Name, .col = col, .value = log2fc)
example2 <- example %>% mutate(Compound_Name = as_factor(Compound_Name))
example2 %>% group_by(Compound_Class) %>% heatmap(.row = Compound_Name, .col = col, .value = log2fc)
AA stands for amino acid and CH stands for carbohydrate (this is not important for the understanding of the issue, just to provide some context). I also added the compound class to the end of the compound name.
When the .row variable is just a character vector, D-maltoseCH is switched with L-homoserineAA and both show up in the wrong group (presumably due to the clustering by value?).
After mutating Compound_Name into a factor, both are assigned correctly.
I don't know if this is an issue of tidyHeatmap or of the underlying ComplexHeatmap package but I think it would be important to find out and fix this behavior. Transforming the .row variable to a factor seems to work but I am not sure whether this is how this vector is most commonly used.
Let me know if something is unclear.
Sorry for this somewhat strange example. I tried to recreate the example with mtcars or diamonds but I wasn't able to achieve this strange behavior.