qax-os / excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
https://xuri.me/excelize
BSD 3-Clause "New" or "Revised" License
18.05k stars 1.71k forks source link

Slower rows iterator on server than local machine #1892

Closed a-yahaha closed 4 months ago

a-yahaha commented 4 months ago

本地测试代码

测试代码如下:

func main() {
    fmt.Println("hello")
    readExcel("/Users/bytedance/Downloads/excel/测试数据100列4500行.xlsx")
}
// 模拟excel读取数据
func readExcel(path string) {
    options := excelize.Options{
       RawCellValue:     false,
       ShortDatePattern: "yyyy-mm-dd",
       LongTimePattern:  "HH:mm:ss",
       LongDatePattern:  "yyyy-mm-dd HH:mm:ss",
       CultureInfo:      excelize.CultureNameZhCN,
    }
    excel, err := excelize.OpenFile(path, options)
    if err != nil {
       fmt.Println("open excel failed", err)
       return
    }
    totalRows, _ := GetRowsCountIterator(excel, "Sheet1")
    if totalRows == 0 {
       fmt.Println("GetRowsCountIterator is empty")
       return
    }
}
func GetRowsCountIterator(excel *excelize.File, sheetName string) (int64, error) {
    // 获取时间才 1.1s
    startTime := time.Now()
    defer func() {
        if err := excel.Close(); err != nil {
            fmt.Printf("excel GetRowsCount file close error:%+v", err)
        }
        fmt.Printf("GetRowsCountIterator cost: %v \n", time.Since(startTime))
    }()
    rows, err := excel.Rows(sheetName)
    if err != nil {
        fmt.Println("GetRowsCountIterator failed", err)
        return 0, err
    }
    var totalRows int64
    for rows.Next() {
        row, err := rows.Columns(excelize.Options{
            RawCellValue: false,
        })
        if err != nil {
            break
        }
        if len(row) > 0 {
            totalRows++
        }
    }
    if err = rows.Close(); err != nil {
        fmt.Printf("excel GetRowsCountIterator rows close  error:%+v", err)
        return 0, err
    }
    fmt.Printf("GetRowsCountIterator total: %d \n", totalRows)
    return totalRows, nil
}

耗时

GetRowsCountIterator cost: 1.156624625s 

服务器上测试

代码如下

// GetRowsCount retrieves the total rows count from the excelize iterator.
func GetRowsCount(f *excelize.File, sheetName string) (int64, error) {
    defer func() {
        if err := f.Close(); err != nil {
            fmt.Printf("excel GetRowsCount file close error:%+v", err)
        }
    }()
    var totalRows int64
    startTime := time.Now()
    rows, err := f.Rows(sheetName)
    fmt.Printf("[costTime]excel GetRowsCount Rows cost: %v", time.Since(startTime))
    if err != nil {
        fmt.Printf("excel GetRowsCount getRows error:%+v", err)
        return 0, err
    }
    startTime = time.Now()
    for rows.Next() {
        row, err := rows.Columns(excelize.Options{
            RawCellValue: false,
        })
        if err != nil {
            break
        }
        if len(row) > 0 {
            totalRows++
        }
    }
    fmt.Printf("[costTime]excel GetRowsCount Rows.next cost: %v", time.Since(startTime))
    startTime = time.Now()
    if err = rows.Close(); err != nil {
        fmt.Printf("excel GetRowsCount rows close  error:%+v", err)
        return 0, err
    }
    fmt.Printf("[costTime]excel GetRowsCount Rows.close cost: %v", time.Since(startTime))
    return totalRows, nil
}

耗时

[costTime]excel GetRowsCount Rows.next cost: 6.809400376s

服务器的配置 8G 16G

问题

初步排查是 rows.Columns在服务端的性能很差,基本耗时都在这里。

有几个疑问

  1. 是否只能通过行迭代器获取excel中sheet的行总数
  2. Columns的性能为什么有这么大的差异,是否有其他优化的方法

测试文件

测试文件: 100列4500行文本

测试数据100列4500行.xlsx

xuri commented 4 months ago

Thanks for your issue. Is your local machine environment different from your server? The performance is also impacted by these factors: OS, Go language version, Excelize library version, hardware such as CPU, memory, storage equipment (HDD or SSD), etc. I don't have enough information to reproduce this issue, but here are some performance figures for your reference: Performance Figures. I'll close this issue, if you have any questions, please let me know to reopen this anytime.