Thanks for your issue. Could you provide a reproducible demo and your input file as an attachment without confidential info? This issue is similar to #1096. Please decrease the value of UnzipXMLSizeLimit to avoid high memory usage.
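For reference, here is a minimal sketch of opening a workbook with a smaller UnzipXMLSizeLimit (the file name and the 10 MB threshold are placeholders); worksheet XML parts larger than this limit are extracted to the system temporary directory instead of being held in memory:

package main

import (
    "fmt"

    "github.com/xuri/excelize/v2"
)

func main() {
    // Hypothetical workbook name; XML parts over 10 MB are unzipped to the
    // system temporary directory rather than decompressed fully in memory.
    f, err := excelize.OpenFile("Book1.xlsx", excelize.Options{UnzipXMLSizeLimit: 10 << 20})
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    fmt.Println(f.GetSheetList())
}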
I found that large files re-saved with WPS Spreadsheets have the same problem. Here is the test code:
package main

import (
    "flag"
    "fmt"
    "math/rand"
    "time"

    "github.com/qax-os/excelize/v2"
)

// go build t.go && ./t -gen        // generates a test file at /tmp/gen.xlsx
// ./t -name=/tmp/gen.xlsx          // reads the file generated above; memory usage is normal.
// If gen.xlsx is opened and re-saved with WPS Spreadsheets, the file grows in size,
// and re-running the read test makes memory usage spike.

var (
    genXlsx     = flag.Bool("gen", false, "genXlsx")
    name        = flag.String("name", "", "filename")
    letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
)

func testRead(fn string) {
    if xl, err := excelize.OpenFile(fn, excelize.Options{UnzipXMLSizeLimit: 1e18}); err == nil {
        for _, sheetName := range xl.GetSheetList() {
            cnt := 0
            if rows, err2 := xl.Rows(sheetName); err2 == nil {
                for rows.Next() {
                    if _, err3 := rows.Columns(); err3 == nil {
                        cnt += 1
                    }
                }
            }
            fmt.Println(sheetName, cnt)
        }
    }
}

func GenXlsxTest(fn string) {
    rand.Seed(time.Now().UnixNano())
    xl := excelize.NewFile()
    for i := 0; i <= 22; i++ {
        sheetname := fmt.Sprintf("测试表%v", i)
        xl.NewSheet(sheetname)
        streamWriter, _ := xl.NewStreamWriter(sheetname)
        header := []interface{}{"序号", "名", "名2", "cc", "ccci", "zzz", "ccb", "hjjj", "jfjfj"}
        cell, _ := excelize.CoordinatesToCellName(1, 1)
        streamWriter.SetRow(cell, header)
        num := 125
        switch i {
        case 3:
            num = 150000
        case 7:
            num = 450000
        }
        for rowi := 2; rowi <= num; rowi++ {
            row := []interface{}{rowi - 1, RandStr(5), RandStr(6), RandStr(7), RandStr(7), RandStr(7), RandStr(7), RandStr(8), RandStr(8), RandStr(8), RandStr(8), RandStr(9), RandStr(9), RandStr(9), RandStr(9), RandStr(10), RandStr(10), RandStr(10), RandStr(10)}
            cell, _ := excelize.CoordinatesToCellName(1, rowi)
            if err := streamWriter.SetRow(cell, row); err != nil {
                fmt.Println(rowi, err)
            }
        }
        if err := streamWriter.Flush(); err != nil {
            fmt.Println(err)
        }
    }
    xl.SaveAs(fn)
    fmt.Println("保存到", fn)
}

func RandStr(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letterRunes[rand.Intn(len(letterRunes))]
    }
    return string(b)
}

func main() {
    flag.Parse()
    if *genXlsx {
        GenXlsxTest("/tmp/gen.xlsx")
    } else if *name != "" {
        testRead(*name)
    }
}
Thanks for your feedback. As noted in the reply above, please decrease the value of UnzipXMLSizeLimit to avoid high memory usage. I have tested with the following code on the workbook generated by your code. The RSS memory usage was about 254 MB, so the performance was as expected.
package main

import (
    "fmt"
    "runtime"
    "syscall"
    "time"

    "github.com/xuri/excelize/v2"
)

func main() {
    runtime.GC()
    startTime := time.Now()
    f, err := excelize.OpenFile("gen.xlsx")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer func() {
        if err := f.Close(); err != nil {
            fmt.Println(err)
        }
    }()
    for _, sheetName := range f.GetSheetList() {
        cnt := 0
        if rows, err2 := f.Rows(sheetName); err2 == nil {
            for rows.Next() {
                if _, err3 := rows.Columns(); err3 == nil {
                    cnt += 1
                }
            }
        }
        fmt.Println(sheetName, cnt)
    }
    printBenchmarkInfo("main", startTime)
}

func printBenchmarkInfo(fn string, startTime time.Time) {
    var memStats runtime.MemStats
    var rusage syscall.Rusage
    var bToMb = func(b uint64) uint64 {
        return b / 1024 / 1024
    }
    runtime.ReadMemStats(&memStats)
    syscall.Getrusage(syscall.RUSAGE_SELF, &rusage)
    fmt.Printf("Func: %s \tRSS = %v MB\tAlloc = %v MB\tTotalAlloc = %v MB\tSys = %v MB\tNumGC = %v \tCost = %s\n",
        fn, bToMb(uint64(rusage.Maxrss)), bToMb(memStats.Alloc), bToMb(memStats.TotalAlloc), bToMb(memStats.Sys), memStats.NumGC, time.Since(startTime))
}
Benchmark info:
Func: main RSS = 254 MB Alloc = 6 MB TotalAlloc = 16921 MB Sys = 273 MB NumGC = 4383 Cost = 57.174368275s
You need to open the gen.xlsx generated in the first step with WPS Spreadsheets and test the file after WPS ET re-saves it; memory usage then increases a lot.
I set UnzipXMLSizeLimit fairly large partly because I wanted to test the memory usage when nothing is extracted to disk, and also because /tmp on my system is mounted in RAM with only 6 GB allocated, so there would not be enough space if the extraction went to /tmp.
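If the concern is only that /tmp is RAM-backed, one possible workaround (a sketch, assuming excelize places its extracted parts under the directory returned by os.TempDir, which honours $TMPDIR on Unix; the /var/tmp path is a placeholder) is to point the temporary directory at disk-backed storage and keep UnzipXMLSizeLimit at its default:

package main

import (
    "fmt"
    "os"

    "github.com/xuri/excelize/v2"
)

func main() {
    // Hypothetical disk-backed directory; os.TempDir() returns $TMPDIR when it is
    // set, so extracted XML parts should land here instead of the RAM-backed /tmp.
    tmp := "/var/tmp/excelize"
    if err := os.MkdirAll(tmp, 0o755); err != nil {
        fmt.Println(err)
        return
    }
    os.Setenv("TMPDIR", tmp)
    f, err := excelize.OpenFile("/tmp/gen.xlsx") // default UnzipXMLSizeLimit (16 MB)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    fmt.Println(f.GetSheetList())
}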
Thanks for your feedback. The stream writer writes string cell values as inline strings (please also reference #1377), which stores the value in the worksheet itself instead of the SST (shared string table) to get better read performance with the row iterator. However, once the workbook is re-saved by Kingsoft WPS, the cell values are moved into the SST, so the library also has to parse the SST part of the workbook, which takes more memory. The library reads, parses, and validates many internal workbook structures to provide better compatibility, make sure the generated workbook is not corrupted, and support more features, and that uses more memory. I suggest decreasing the value of the UnzipXMLSizeLimit option when opening a workbook with this amount of data, or increasing your disk or memory resources, to get better performance. I've closed this issue; if you have any questions, please let me know and reopen it anytime.
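To see the difference described above, here is a small stdlib-only sketch (the file paths are placeholders) that lists the sizes of the relevant internal parts of an xlsx: in the stream-written file, xl/sharedStrings.xml stays tiny because the values are inline strings, while in the WPS re-saved copy it grows because the strings are moved into the SST:

package main

import (
    "archive/zip"
    "fmt"
    "strings"
)

func main() {
    // Hypothetical paths: the original stream-written file and the WPS re-saved copy.
    for _, name := range []string{"/tmp/gen.xlsx", "/tmp/gen-wps.xlsx"} {
        r, err := zip.OpenReader(name)
        if err != nil {
            fmt.Println(err)
            continue
        }
        fmt.Println(name)
        for _, f := range r.File {
            // Only the shared string table and the worksheets matter for this comparison.
            if f.Name == "xl/sharedStrings.xml" || strings.HasPrefix(f.Name, "xl/worksheets/") {
                fmt.Printf("  %-40s %d bytes\n", f.Name, f.UncompressedSize64)
            }
        }
        r.Close()
    }
}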
On Debian 12, with the excelize version fetched via go get -u on July 15: opening a 180 MB xlsx (containing 200 sheets, two of which hold most of the content) with a fairly large UnzipXMLSizeLimit, so that nothing is extracted to disk and everything is processed in memory, the memory usage starts at nearly 2 GB and gradually climbs to 6 GB, staying there for a while before finishing; the whole run takes a bit over 3 minutes.
The main code is as follows:
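The original snippet is not reproduced in this thread; the following is a plausible sketch modeled on the testRead function above, with the file path and the UnzipXMLSizeLimit value as placeholders:

package main

import (
    "fmt"

    "github.com/xuri/excelize/v2"
)

func main() {
    // Hypothetical 180 MB workbook; a very large UnzipXMLSizeLimit keeps every
    // XML part in memory instead of extracting it to the temporary directory.
    f, err := excelize.OpenFile("/data/big.xlsx", excelize.Options{UnzipXMLSizeLimit: 1 << 60})
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()
    for _, sheetName := range f.GetSheetList() {
        cnt := 0
        rows, err := f.Rows(sheetName)
        if err != nil {
            continue
        }
        for rows.Next() {
            if _, err := rows.Columns(); err == nil {
                cnt++
            }
        }
        rows.Close()
        fmt.Println(sheetName, cnt)
    }
}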
For comparison I tried the Rust calamine library: it starts at around 50 MB and peaks at about 1.3 GB, a bit less than WPS, and the processing time is also a bit over 3 minutes, not much different from Go. The main code is as follows:
Opening the same file with WPS Office Spreadsheets uses 1.9 GB. excelize's memory usage feels on the high side; I wonder whether there is room for improvement.