Closed daattali closed 2 years ago
Hi @daattali ,
you requested 4 3 rows and the first one is always used as column name. This is documented here: https://github.com/ycphs/openxlsx/blob/1fa10d17f500863d86984f095440ab92632162ad/R/readWorkbook.R#L12
Changing this default behavior would break a lot of existing code. What you wanted is this:
> df <- read.xlsx("~/gh_issue_304.xlsx",
+ rows = 2:4,
+ cols = 1,
+ colNames = FALSE)
> df
X1
1 1
2 2
3 3
I close this as invalid.
This could be either a bug or a feature request, depending on your view.
Suppose I have an Excel sheet with the following data:
If I want to read rows 2-4, I expect to get a dataframe with 3 rows. But
read.xlsx(file, sheet, rows = 2:4, cols = 1, skipEmptyRows = FALSE)
will return only the first row "b" because the next two rows are empty. The same thing happens if I request rows 2:5, but if I ask for rows 2:6 then I'll correctly get all rows.I consider this a bug because I don't see this behaviour documented in
read.xlsx
, although perhaps it's documented elsewhere.Ideally I'd think the behaviour should change so that you can always know the exact dimensions of the returned dataframe. IMO request 3 rows should always return 3 rows. But in order to not have a breaking change, a new parameter can be added to specify whether or not to ignore trailing missing values.