tealeg / xlsx

Go library for reading and writing XLSX files.

Memory leak? #712

Closed c4rnot closed 1 year ago

c4rnot commented 3 years ago

I'm having a strange issue. I'm opening an Excel file of approx. 11 MB and reading it into a database.

Task Manager shows up to around 1.7 GB of memory used by the program, which doesn't drop after the function completes.

go tool pprof, though, shows only 50 MB of allocations (roughly half attributed to readSheetsFromZipFile, and half to my reflect-heavy code, which puts the data into a slice for an ORM to write to the db).

I ran with my reflect part commented out, but still got a similar 1.7 GB OS memory footprint.

Opening the file using wb, err := xlsx.OpenFile(filename, xlsx.UseDiskVCellStore) didn't change this behaviour either.

tealeg commented 3 years ago

I think there are a lot of possibilities here. Firstly, I'd like to know what exactly you mean by "OS memory footprint". OSs report multiple numbers for the memory used by a process, some of which reflect shared resources, memory maps for file reading, etc.
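To make the question above concrete, here is a minimal sketch of how the Go runtime's own counters could be compared with the figure Task Manager shows. It uses only the standard runtime package; the MemSnapshot type and TakeSnapshot function are names made up for this illustration and are not part of xlsx.

```go
package main

import (
	"fmt"
	"runtime"
)

// MemSnapshot (hypothetical name) holds the runtime counters that most
// often explain a gap between a heap profile and the OS-reported figure.
type MemSnapshot struct {
	HeapAlloc    uint64 // bytes of allocated heap objects
	HeapIdle     uint64 // bytes in heap spans that currently hold no objects
	HeapReleased uint64 // bytes of heap memory already returned to the OS
	Sys          uint64 // total bytes the Go runtime has obtained from the OS
}

// TakeSnapshot reads the current values from runtime.MemStats.
func TakeSnapshot() MemSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return MemSnapshot{
		HeapAlloc:    m.HeapAlloc,
		HeapIdle:     m.HeapIdle,
		HeapReleased: m.HeapReleased,
		Sys:          m.Sys,
	}
}

// Retained estimates heap memory the runtime is holding on to without
// using it (HeapIdle minus HeapReleased).
func (s MemSnapshot) Retained() uint64 {
	if s.HeapReleased > s.HeapIdle {
		return 0
	}
	return s.HeapIdle - s.HeapReleased
}

func main() {
	s := TakeSnapshot()
	fmt.Printf("HeapAlloc=%d Retained=%d Sys=%d\n", s.HeapAlloc, s.Retained(), s.Sys)
}
```

If HeapAlloc is small while Sys stays large after the workbook is dropped, the memory is probably being retained by the runtime rather than referenced by live objects, which would fit a heap profile that looks much smaller than the process size.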

Ultimately though I think it's fair to say that this library is quite heavy on memory when using non-trivial spreadsheets. Some of that depends on the structure of the sheet itself. Right now I don't have enough information to begin to help you.

c4rnot commented 3 years ago

I'll make a pared-down version of my code with just the Excel opening and dummy data of similar length. Strangely, xlsx.UseDiskVCellStore errors out in my reduced code but not in the original.

c4rnot commented 3 years ago

Hello tealeg,

Attached is the code, including the call stack from main to the point at which xlsx.OpenFile is called, and a dummy Excel table of similar size (same length, same number of columns and column data types, similar text lengths).

Analyser_dbUpload.zip

Windows Task Manager shows the program using 1.25 GB when it reaches the point in execution where it prompts for a key press to continue (i.e. after the workbook has gone out of scope). I tried setting wb's pointer to nil before exiting the function from which OpenFile is called, but it did not appear to have any effect.

It's definitely not a memory leak in the classic sense: calling xlsx.OpenFile on the file more than three times results in a stable 1.9 GB of memory usage. It doesn't keep increasing.
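One way to check whether memory is merely retained rather than leaked is to drop the data, force a collection, and ask the runtime to hand freed memory back to the OS. The sketch below assumes a stand-in allocation instead of a real workbook; allocate and MeasureRelease are hypothetical names, while runtime.ReadMemStats, runtime.KeepAlive and debug.FreeOSMemory are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapAlloc returns the bytes of allocated heap objects right now.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// allocate builds n rows of 1 KiB each, standing in for a loaded workbook.
func allocate(n int) [][]byte {
	rows := make([][]byte, n)
	for i := range rows {
		rows[i] = make([]byte, 1024)
	}
	return rows
}

// MeasureRelease reports HeapAlloc while the data is alive and again after
// the data is unreachable and the runtime was asked to free OS memory.
func MeasureRelease(n int) (during, after uint64) {
	rows := allocate(n)
	during = heapAlloc()
	runtime.KeepAlive(rows) // rows is not referenced past this point

	// FreeOSMemory forces a garbage collection and then tries to return
	// as much memory to the operating system as possible.
	debug.FreeOSMemory()
	after = heapAlloc()
	return during, after
}

func main() {
	during, after := MeasureRelease(16 * 1024) // roughly 16 MiB of rows
	fmt.Printf("during=%d bytes, after=%d bytes\n", during, after)
}
```

If the figure in Task Manager falls after a call like debug.FreeOSMemory, the earlier number reflected memory the runtime had not yet returned, not objects that were still reachable.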

After one round, the major allocations are (see pprof-memusage.svg in the attachment):

xlsx newCell: 427 MB
reflect unsafe_NewArray: 313 MB

Also, using xlsx.UseDiskVCellStore on Windows results in an error. The directory for the cell store is created, but is left empty. The error message is as follows:

OpenFile: ReadZip: ReadZipReader: readSheetsFromZipFile: create key file: open file: open C:\Users\marti\AppData\Local\Temp\cellstore2b7620f5-61e6-4642-b881-b090c91db932967489786\Tabelle1:000000:000000: The filename, directory name, or volume label syntax is incorrect.
goroutine 7 [running]:
runtime/debug.Stack(0xc00e7f17a0, 0x2f7820, 0xc000182120)
	C:/Program Files/Go/src/runtime/debug/stack.go:24 +0xa5
github.com/tealeg/xlsx/v3.readSheetFromFile.func1(0xc00e7f1f38)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/lib.go:691 +0x65
panic(0x2f7820, 0xc000182120)
	C:/Program Files/Go/src/runtime/panic.go:965 +0x1c7
github.com/tealeg/xlsx/v3.(*DiskVRow).setCurrentCell(0xc00005e550, 0xc000108360)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/diskv.go:233 +0xe5
github.com/tealeg/xlsx/v3.(*DiskVRow).PushCell(0xc00005e550, 0xc000108360)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/diskv.go:245 +0x45
github.com/tealeg/xlsx/v3.(*Row).PushCell(0xc027e6af40, 0xc000108360)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/row.go:74 +0x62
github.com/tealeg/xlsx/v3.readRowsFromSheet(0xc000108fc0, 0xc0000b0080, 0xc027db38c0, 0xffffffffffffffff, 0xc027e51560, 0x0, 0x0)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/lib.go:549 +0x5db
github.com/tealeg/xlsx/v3.readSheetFromFile(0xc000160d48, 0x8, 0x4e8a08, 0x1, 0xc000160d60, 0x4, 0x0, 0x0, 0xc0000b0080, 0xc0000ab740, ...)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/lib.go:715 +0x40f
github.com/tealeg/xlsx/v3.readSheetsFromZipFile.func2(0xc000160d48, 0x8, 0x4e8a08, 0x1, 0xc000160d60, 0x4, 0x0, 0x0, 0xc0000b0080, 0xc0000ab740, ...)
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/lib.go:787 +0x95
created by github.com/tealeg/xlsx/v3.readSheetsFromZipFile
	C:/Users/marti/go/pkg/mod/github.com/tealeg/xlsx/v3@v3.2.3/lib.go:786 +0x692

I tried creating a file with the same name. The colon is not allowed in file names under Windows. I think it should be possible to fix this just by replacing the colon with a dash or an underscore. N.B. Windows file names can't contain / \ : * < > |
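The suggested replacement could look roughly like the sketch below. SanitizeFileName is a hypothetical helper written for this illustration; it is not the library's actual fix. Its character set is an assumption: the characters listed above plus ? and ", which Windows also rejects in file names.

```go
package main

import (
	"fmt"
	"strings"
)

// windowsReserved lists characters Windows does not allow in file names.
const windowsReserved = `<>:"/\|?*`

// SanitizeFileName (hypothetical helper) replaces every reserved character
// in name with an underscore so the result is usable as a file name.
func SanitizeFileName(name string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(windowsReserved, r) {
			return '_'
		}
		return r
	}, name)
}

func main() {
	// The key from the error message above.
	fmt.Println(SanitizeFileName("Tabelle1:000000:000000"))
	// prints "Tabelle1_000000_000000"
}
```

One thing to watch with this approach: if a sheet name can itself contain the replacement character, two different keys could map to the same file name, so the separator may need to be chosen with that in mind.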


Best regards, Martin

qiniuno commented 3 years ago

I ran into the same problem.

github-actions[bot] commented 1 year ago

Stale issue message