Closed homer3018 closed 4 years ago
If I understand correctly, a minimal reproducible example would be...
# works as expected
data <- read.table(header = FALSE, text = "
a-b-c 124
a2-b2 42")
sunburstR::sunburst(data)
# does not work as expected
data <- read.table(header = FALSE, text = "
a-b-c 124
a2 42")
sunburstR::sunburst(data)
ultimately, I believe this comes down to the difference between these...
data.frame(t(c("a")), stringsAsFactors = FALSE)
#> t.c..a...
#> 1 a
data.frame(t(c("a", "b")), stringsAsFactors = FALSE)
#> X1 X2
#> 1 a b
which probably means the csv_to_hier
function would need to use setNames()
on each iteration of its apply function so that dplyr::bind_rows
doesn't move things around in an attempt to match column names.
Or, that bit could be re-written in a way that avoids the rbind
/dplyr::bind_rows
process, for instance...
# instead of this
dplyr::bind_rows(
lapply(
strsplit(as.character(csv[[1]]), delim),
function(rw) data.frame(t(rw), stringsAsFactors = FALSE)
)
)
#> X1 X2 X3 t.rw.
#> 1 a b c <NA>
#> 2 <NA> <NA> <NA> a2
# this?
read.delim(header = FALSE, sep = delim, text = paste(csv[[1]], collapse = "\n"))
#> V1 V2 V3
#> 1 a b c
#> 2 a2
That is 100% correct. Your minimal example are completely illustrating the issue. Not only a2
is not being drawn but a-b-c
is also altered to a-c
.
Using plyr::ldply
or read.delim
would definitely solve the issue, which ever suits you best.
One thing though, I haven't been able to locate the function csv_to_hier
. I have found its code online. Where would this be ?
csv_to_hier
is an internal function, i.e. it is not intended for direct use, therefore it is not "exported". You can however access it using the sunburstR:::csv_to_hier()
syntax.
Right. It seems that it does not work on my machine :
> sunburstR::csv_to_hier
Error: 'csv_to_hier' is not an exported object from 'namespace:sunburstR'
> sunburstR::csv_to_hier()
Error: 'csv_to_hier' is not an exported object from 'namespace:sunburstR'
You have to use 3 :
's, not 2 to access a non-exported function... :::
instead of ::
My bad. I should have known better! Thanks !
Coming back to the original issue, I've just tested what you suggested ie
read.delim(header = FALSE, sep = delim, text = paste(csv[[1]], collapse = "\n"))
And it does not quite work as expected since read.delim
determines The number of data columns [...] by looking at the first five lines of input (or the whole input if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).
In other words this
read.delim(header = FALSE, sep = delim, text = paste(as.character(list(c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a3","b3","c3","d3") %>% paste0(collapse = "-"))), collapse = "\n"))
produces this
V1 V2 V3
1 a b c
2 a2
3 a b c
4 a2
5 a b c
6 a3 b3 c3
7 d3
Using plyr:ldply
is producing the expected result
plyr::ldply(strsplit(as.character(list(c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a3","b3","c3","d3") %>% paste0(collapse = "-"))), delim), rbind)
gives
1 2 3 4
1 a b c <NA>
2 a2 <NA> <NA> <NA>
3 a b c <NA>
4 a2 <NA> <NA> <NA>
5 a b c <NA>
6 a3 b3 c3 d3
good point! Just to be clear, I am not the maintainer of this project, just a former contributor, so I'm merely making suggestions.
I doubt that @timelyportfolio will want to add plyr
as a dependency to this package since it is officially "retired", but that is not my decision to make.
May I make one suggestion to you @homer3018... in your examples you repeatedly use this very complex process of creating a list object...
list(c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a2") %>% paste0(collapse = "-"),
c("a","b","c") %>% paste0(collapse = "-"),
c("a3","b3","c3","d3") %>% paste0(collapse = "-"))
That structure is very difficult to quickly interpret, and it also requires a 3rd party package to be loaded for the %>%
to work. You could replace that with...
list("a-b-c",
"a2",
"a-b-c",
"a2",
"a-b-c",
"a3-b3-c3-d3")
which would then allow the reader to focus on where the actual bug exists rather than trying to error check an unnecessarily complex object creation.
Apologies for that, I copied/pasted things from my actual code in which I have to resort to the pipe and paste commands, I should have simplified it.
I agree that plyr
might not be the right tool to fix this issue, however this is very annoying to have this not working as I currently have to deal with a lot of root objects.
That's mainly why I've not opened a pull request. Surely there is a better way of fixing this.
Thanks for your support so far. I guess it's up to @timelyportfolio now.
Not sure what exactly you're trying to achieve, but if the goal is to be able to pass data to sunburstR
that might have this structure that doesn't process properly, you could write a function to preprocess the incoming data (using plyr
or whatever your preferred method is) before passing to sunburstR
...
fix_data <- function(data, delim = "-") {
node_data <- data[[1]]
node_data <- plyr::ldply(strsplit(as.character(node_data), delim), rbind)
node_data$size = data[[2]]
d3r::d3_nest(node_data, value_cols = "size")
}
data <- read.table(header = FALSE, text = "
a-b-c 1
x 1
d-e-f 1
y 1
g-h-i 1
a3-b3-c3-d3 1")
data
#> V1 V2
#> 1 a-b-c 1
#> 2 x 1
#> 3 d-e-f 1
#> 4 y 1
#> 5 g-h-i 1
#> 6 a3-b3-c3-d3 1
# does not work as expected
sunburstR::sunburst(data)
# works as expected
sunburstR::sunburst(fix_data(data))
Well my point exactly. I'm doing precisely what you described where I shouldn't, since csv_to_hier
should take care of that task. I'll do that no worries while this is not fixed, but that is a band aid that shouldn't exist is all I'm saying, hence this issue to get this function fixed..
Ok... I thought because you're "very annoyed" that you hadn't found a suitable workaround. @timelyportfolio will get to this when they get to it, and I'm confident they'll come up with a good solution.
@homer3018 thanks for filing the issue and @cjyetman thanks so much for stepping in with a quick reply. You are correct in that I do not want to add any more dependencies (actually given the time I would like to start removing dependencies). I plan to work through the potential solutions discussed and will report back today.
As a note, the csv_to_hier
was originally provided in JavaScript form and proved unreliable so I converted to R
. I still believe that the flat table structure is better and favor it instead of the csv form. Here is a little discussion of the other structure https://github.com/timelyportfolio/flattree.
@homer3018 @cjyetman I pushed some changes to csv_to_hier
that I believe fixes the issue without adding dependencies. Here is the reproducible code using @cjyetman data. Please test and if all good I will submit to CRAN. Thanks!
csv <- read.table(header = FALSE, text = "
a-b-c 1
x 1
d-e-f 1
y 1
g-h-i 1
a3-b3-c3-d3 1")
sunburst(csvdata = csv)
Works for me! (using the dev version obviously)
All good as well from my perspective, good work 👍 Thanks !
Thanks @cjyetman @homer3018 I will submit to CRAN today.
Thanks again. Please reopen if this does not fully solve.
Let's suppose we have this list
There is no separator found for a2, and csv_to_hier currently reports it in the column
t.rw
due to thelapply
call:gives
that's obviously not right. Using
plyr::ldply
withrbind
would do just fine in such case :gives
Due to this, the following code
any(is.na(df[,1]))
kicks in and make a mess (specifically withidx_notna <- which(!is.na(df[,1]))
thendf[idx_notna,2] <- df[idx_notna,1]
it deletes parts of the hierarchy and does not fix the original issue since we cannot predict the position of the columnt.rw
. Ifcsv
hasn't got any NA, then we shouldn't need such checks.Because of that sund2b works but is not correct.