Closed jappeace closed 1 year ago
There was a ticket #7 about this As there was no particular motivation it was closed. If you're willing to contribute something like this I'd be glad to review the code.
Working around it now with csv but there is definitely some interest here from the business side to producing xslx files, so I'll be doing this in the future.
Your review and feedback would be appreciated :wink:
I have just run into this problem and will probably now have to convert to the file to CSV and use streaming via cassava.
@jappeace if you made any progress on this which you can share, I could maybe work on it a bit more.
@jappeace I notice you have done a lot of work here: https://github.com/jappeace/xlsx.
What I'd like to be able to do is read a row at a time e.g. to create a sum of a column. Is it possible to do this with your clone?
yes! Supercede is implementing this at the moment, the api is unstable but for reading it sort of works already here: https://github.com/SupercedeTech/xlsx [1]
We may change how the reading api works, (there are two engineers working on this now including me), but we settled on the shape of SheetItem
at least so you can program against that (stuff maybe added). I'm working on writing xlsx as well. The prototype for writing is done, but it may still have some bugs left.
We're intending to upstream the work once we got it to work reliably.
[1] note that's a different repo than mine, I originally implemented it as part of "research day" but now it turns out client actually need it so it got moved to supercede github.
Let me know if you need help, have questions or any issues. It's really important for the company I work for to get this right :slightly_smiling_face:
If you could show me how I would get e.g. rows 5, 6 and 7 from an .xlsx file and also how I can process a table row by row that would be really helpful.
At the moment I am using R to convert a tab in an .xlsx file to a .csv file and then using cassava with something like the following to get rows with indices in is
:
let bar :: [[String]] -> Records [String] -> IO [[String]]
bar acc baz =
case baz of
Cons (Left s) _ -> error s
Cons (Right []) r -> bar acc r
Cons (Right a@(x : _)) r -> if x `elem` (map show is)
then bar (a : acc) r
else bar acc r
Nil _ _ -> return acc
https://hackage.haskell.org/package/cassava-0.5.2.0/docs/Data-Csv-Streaming.html lets you process a row at a time which is what I want to do after I have processed the given rows.
Many thanks
First obtain a producer conduit from your xlsx file, probably you want to use sourceFile. Then put that into readxlsx. Now you have a conduit of sheetItems, which you can filter, and fold.
Mind you to use those combinator you have to combine with .| which makes the output of the left conduit the input to the right one.
It all seems a bit convoluted but it allows you to do lazy io (constant memory).
this is done.
We currently have to deal with massive excel files (100 000 to millions of rows). Reading all this data into memory isn't feasible. 100 000 rows for example takes up 2 gigs of ram as Xslx representation, 1 is over 8 gig (our app crashed trying to do this).
If we could represent the file as a stream it would solve this problem. I'd suggest conduit because it combines with zip as well.