Closed (pawelru closed this issue 2 years ago)
Vignettes are fixed https://github.com/insightsengineering/teal.data/issues/65
I'm moving the issue to the backlog, as the request refers to a change in the code.
Do we really care about the head and tail when printing? It's just confusing with the row numbers in the tail part...
Could we change:
print(head(as.data.frame(self$get_raw_data())))
if (self$get_nrow() > 6) {
  cat("\n...\n")
  print(tail(self$get_raw_data()))
}
to something like this?
print(head(as.data.frame(self$get_raw_data())))
if (self$get_nrow() > 6) {
  cat("\n...\n")
}
I agree to make this simpler and just print the head.
Is this issue dead?
Two points:
There is an inconsistency here: the raw data (which is a tibble) is converted to data.frame for the head call, while tail is called directly on the tibble. Printing a tibble drops columns to fit neatly in the console; it is the endless wrapping of a data.frame that causes the output to be (potentially) very long.
The confusing row numbers result from the fact that the raw data is a tibble, and the print method for tibble ignores row names and just shows integer indices for the displayed subset. This is not the case for data.frame:
> iris[50:51, ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
50            5         3.3          1.4         0.2     setosa
51            7         3.2          4.7         1.4 versicolor
> tibble::tibble(iris)[50:51, ]
# A tibble: 2 x 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5         3.3          1.4         0.2 setosa
2            7         3.2          4.7         1.4 versicolor
I presume the intention was to obtain something like what data.table does:
> data.table::as.data.table(iris)
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.1         3.5          1.4         0.2    setosa
  2:          4.9         3.0          1.4         0.2    setosa
  3:          4.7         3.2          1.3         0.2    setosa
  4:          4.6         3.1          1.5         0.2    setosa
  5:          5.0         3.6          1.4         0.2    setosa
 ---
146:          6.7         3.0          5.2         2.3 virginica
147:          6.3         2.5          5.0         1.9 virginica
148:          6.5         3.0          5.2         2.0 virginica
149:          6.2         3.4          5.4         2.3 virginica
150:          5.9         3.0          5.1         1.8 virginica
Here is how this happens:
toprint = rbind(head(toprint, topn + isTRUE(class)),
                `---` = "", tail(toprint, topn))
rownames(toprint) = format(rownames(toprint), justify = "right")
Note that the actual row names are dropped in data.table. The numbers above are also ad hoc indices, but they refer to observations in the data set rather than to positions in the printed output, as they do for tibbles.
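For illustration, the same idea can be sketched for a plain data.frame. This is only a rough sketch, not what data.table or teal.data actually does: `print_head_tail` is a hypothetical helper, `n = 5L` is an assumed default, and the head and tail are printed as two separate blocks, so their columns are not guaranteed to line up the way they do in data.table.

```r
# Hypothetical helper (not part of teal.data or data.table): print the first
# and last `n` rows of a data set, labelling each row with its position in the
# full data rather than its position in the printed subset.
print_head_tail <- function(x, n = 5L) {
  x <- as.data.frame(x)  # also accepts a tibble
  n <- as.integer(n)
  nr <- nrow(x)

  if (nr <= 2L * n) {
    # Small data: print every row exactly once, no separator needed.
    rownames(x) <- NULL  # reset to automatic 1..nr labels
    print(x)
    return(invisible(seq_len(nr)))
  }

  idx <- c(seq_len(n), seq.int(nr - n + 1L, nr))
  shown <- x[idx, , drop = FALSE]
  rownames(shown) <- idx  # positions in the full data set

  print(shown[seq_len(n), , drop = FALSE])      # head block
  cat("---\n")
  print(shown[n + seq_len(n), , drop = FALSE])  # tail block

  invisible(idx)
}

print_head_tail(iris)
```

The `nr <= 2L * n` guard is the design choice that matters here: the head and tail blocks are only printed separately when they cannot overlap.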
Also,
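print(head(as.data.frame(self$get_raw_data())))
if (self$get_nrow() > 6) {
  cat("\n...\n")
  print(tail(self$get_raw_data()))
}
may result in rows being printed twice if there are fewer than 12 rows (more precisely, between 7 and 11, since the tail is only printed when there are more than 6).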
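To make the overlap concrete, here is a minimal sketch that computes which row positions would appear in both the head and the tail output. `rows_printed_twice` is a hypothetical helper written for this comment only; it assumes the default of 6 rows for head and tail and the `nrow > 6` guard from the snippet.

```r
# Hypothetical helper: row positions that would be printed by both head() and
# tail(), given that the tail is only printed when there are more than `n` rows.
rows_printed_twice <- function(nr, n = 6L) {
  nr <- as.integer(nr)
  n <- as.integer(n)
  if (nr <= n) {
    return(integer(0))  # tail is not printed at all
  }
  head_idx <- seq_len(n)
  tail_idx <- seq.int(nr - n + 1L, nr)
  intersect(head_idx, tail_idx)
}

rows_printed_twice(8L)   # rows 3 to 6 appear in both blocks
rows_printed_twice(12L)  # no overlap
```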
So I think we should do https://github.com/insightsengineering/teal.data/issues/68#issuecomment-1227456012
https://github.com/insightsengineering/teal.data/blob/4588016464044cfca5cbd8df75cd525c035cdc28/R/TealDataset.R#L126-L130
The code above prints twice in most cases. As an example, please open the pkgdown documentation for cdisc_dataset: in the examples section you will see two print outputs. This makes some of the documentation (such as vignettes on data level with multiple datasets) super-super long.
Please also check the other print methods in the child classes; we might have made this mistake there as well.