nacnudus / tidyxl

Read untidy Excel files in R https://nacnudus.github.io/tidyxl/

Specify range for xlsx_cells() #25

Open SteveBronder opened 6 years ago

SteveBronder commented 6 years ago

I have a large Excel file that causes xlsx_cells() to crash. It would be nice to be able to say "only get this many rows and this many columns" when calling xlsx_cells().

Could something be put in xlsxsheet::parseSheetData?

nacnudus commented 6 years ago

Hi, thanks for the suggestion. If you're thinking of something like readxl's range argument, then I agree, that would be good.

Presumably you've already tried reading one sheet at a time with the sheets argument?

SteveBronder commented 6 years ago

Yes, I have. My problem is that the workbook is 67 MB in size (yes, yikes!). Reading an individual sheet still causes xlsx_cells() to crash.

SteveBronder commented 6 years ago

Actually, I've found that the person who made these Excel files dragged the formatting down to the last possible row across a bunch of columns. So my actual issue is that, for each sheet, xlsx_cells() is trying to parse a ton of rows that contain only formatting. File size is not really the issue, but having n_cols and n_rows arguments would be rad for solving this.

In my particular case I only need the first three rows or so.

nacnudus commented 6 years ago

I think a first step is to optionally omit blank cells. When readxl implemented range import (https://github.com/tidyverse/readxl/pull/314) it was complicated, and I want to take care to do it as similarly as possible.

SteveBronder commented 6 years ago

That would be a nice solution for my problem!

nacnudus commented 6 years ago

@SteveBronder blank cells can now be excluded on the master branch.

xlsx_cells(x, include_blank_cells = FALSE)

I'll keep this issue open for the range feature.
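
For the "only the first three rows" case mentioned earlier in the thread, a rough sketch of a stopgap until a range argument exists might look like the following. The helper `keep_top_left()` and the file name are hypothetical and not part of tidyxl. It relies only on the `row` and `col` columns that xlsx_cells() returns. Note that it filters after the sheet has been read, so it trims the result but does not reduce the parsing work itself.

```r
# Hypothetical helper (not part of tidyxl): keep only the cells that fall
# inside the top-left block of max_rows x max_cols.
keep_top_left <- function(cells, max_rows = Inf, max_cols = Inf) {
  # `row` and `col` are the integer position columns returned by xlsx_cells()
  cells[cells$row <= max_rows & cells$col <= max_cols, , drop = FALSE]
}

# Usage, assuming a workbook at the hypothetical path "big-workbook.xlsx":
# library(tidyxl)
# cells <- xlsx_cells("big-workbook.xlsx", sheets = 1,
#                     include_blank_cells = FALSE)
# header <- keep_top_left(cells, max_rows = 3)
```

Skipping blank cells is what avoids the formatting-only rows; the filter just narrows what is left to the rows of interest.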

SteveBronder commented 6 years ago

Ty so much!!