qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License

CellMap containing many cells #105

Closed: Anarchist666 closed this issue 6 years ago

Anarchist666 commented 6 years ago

To create a workbook with 6000 cells (including cells containing text) I need to build a map using fromList. At that size, the heap overflows.

qrilka commented 6 years ago

@Anarchist666 do you care to create an example? E.g. the file from https://github.com/qrilka/xlsx/issues/100#issuecomment-338956099 takes some time to parse, but it shows no heap overflow with 6000826 cells, though with repetitive content.

Anarchist666 commented 6 years ago

I'm not parsing an existing file. I have a list with 6000 cells (1000 rows x 6 columns) and a list of indexes: indexes = [ (x, y) | x <- [1, 2 .. ], y <- [1, 2 .. 6]]. Then I create the CellMap: cells_map = M.fromDistinctAscList $ zip indexes cell_list. Each cell is created with: def & cellValue ?~ CellText (T.pack value). 8 GB of memory overflows. I have tried both the lazy and the strict Map.
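
For context, here is a minimal self-contained sketch of the construction described above. It is an editor's reconstruction, not code from the thread: the value list is a placeholder since the real cell_list is not shown, and it assumes the xlsx API of that era, where CellMap is simply Map (Int, Int) Cell and def comes from data-default.

import Codec.Xlsx
import Control.Lens ((&), (?~))
import Data.Default (def)
import qualified Data.Map as M
import qualified Data.Text as T

-- Placeholder values standing in for the real cell_list.
values :: [String]
values = [ "r" ++ show r ++ " c" ++ show c | r <- [1 .. 1000 :: Int], c <- [1 .. 6 :: Int] ]

-- Keys are distinct and ascending, as fromDistinctAscList requires.
indexes :: [(Int, Int)]
indexes = [ (x, y) | x <- [1, 2 .. ], y <- [1, 2 .. 6] ]

-- 1000 rows x 6 columns of text cells, built the way the report describes.
cellsMap :: M.Map (Int, Int) Cell
cellsMap = M.fromDistinctAscList $
  zip indexes [ def & cellValue ?~ CellText (T.pack v) | v <- values ]

main :: IO ()
main = print (M.size cellsMap)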

qrilka commented 6 years ago

@Anarchist666 what is in that cell_list, and why do you do T.pack on the values? E.g. for

import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS
import Weigh

main =
  mainWith $ do
    func "lazy map 1000" genMapL 1000
    func "lazy map 10000" genMapL 10000
    func "lazy map 100000" genMapL 100000
    func "strict map 1000" genMapS 1000
    func "strict map 10000" genMapS 10000
    func "strict map 100000" genMapS 100000

genMap :: Int -> [((Int,Int), Char)]
genMap n = take n $ zip [(x,y) | x <- [1, 2 .. ], y <- [1, 2 .. 6]] (repeat 'a')

genMapL = ML.fromDistinctAscList . genMap

genMapS = MS.fromDistinctAscList . genMap

I get

$ ./testMap

Case                Allocated  GCs
lazy map 1000         435,528    0
lazy map 10000      4,059,576    7
lazy map 100000    40,299,576   77
strict map 1000       435,528    0
strict map 10000    4,059,576    7
strict map 100000  40,299,576   77

And those are values for allocated RAM, not peak consumption, i.e. the maps themselves, for such a small number of elements, should not add any noticeable overhead. So I suppose that the main place where memory is being consumed is the Text values you put into the cell map. But as you don't show what's in them, I can't say much on that topic. BTW I see you are from Kemerovo - feel free to contact me by Jabber/email using at google dot com.
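
If peak usage rather than total allocation is the question, GHC's runtime statistics offer a rough check. A sketch along those lines (not part of the original thread), reusing the same map generator; it needs GHC 8.2 or newer and must be run with +RTS -T so the statistics are collected:

import qualified Data.Map.Strict as MS
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import GHC.Stats (getRTSStats, RTSStats(..))
import System.Mem (performMajorGC)

genMap :: Int -> [((Int, Int), Char)]
genMap n = take n $ zip [ (x, y) | x <- [1, 2 .. ], y <- [1, 2 .. 6] ] (repeat 'a')

main :: IO ()
main = do
  -- Force the whole map so it is live when the statistics are read.
  m <- evaluate . force $ MS.fromDistinctAscList (genMap 100000)
  performMajorGC  -- max_live_bytes is only updated at major GCs
  stats <- getRTSStats
  putStrLn $ "size: " ++ show (MS.size m)
  putStrLn $ "max live bytes: " ++ show (max_live_bytes stats)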

Anarchist666 commented 6 years ago

It really turned out to be Data.Text.pack. Now I'm using strict fields in my data types, and memory allocation is normal.
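
For readers hitting the same thing, a sketch of the kind of fix described (the record type is hypothetical, not from the thread): with a lazy field, each T.pack stays an unevaluated thunk that keeps its source String alive while the map is built; a strict field forces the Text as soon as the constructor is applied.

import Data.Text (Text)
import qualified Data.Text as T

-- Lazy field: T.pack value is stored as a thunk that retains `value`.
data RowLazy = RowLazy { rlText :: Text }

-- Strict field: the Text is fully built when the row is constructed,
-- so the original String can be garbage collected right away.
data RowStrict = RowStrict { rsText :: !Text }

mkRow :: String -> RowStrict
mkRow value = RowStrict (T.pack value)

main :: IO ()
main = putStrLn . T.unpack . rsText $ mkRow "example"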

qrilka commented 6 years ago

Good to hear that it was resolved, @Anarchist666. Please feel free to file other issues with xlsx if you find any.