lqshow / notes

Cheat Sheet
10 stars 2 forks source link

Large csv convert to excel Out Of Memory #23

Closed lqshow closed 6 years ago

lqshow commented 6 years ago

下午碰到个事,csv转excel,数据量不大百万级,毕竟excel支持行也不多,记得是1048576,超过的直接丢掉。

最早先的代码如下,按行操作,没什么大问题。

try(
BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(filePath), "UTF-8"))){
  String line;
  while ((line = in.readLine()) != null) {
  // 转换操作        
  }
}

接着测试数据进来,发现有大量LF存在单元格呢,那么BufferedReader的readLine就不能用了,后调整代码如下,使用了apache commons csv库。

随手用了parser.getRecords()方法,一个转换操作执行了半小时没停下来,运行jmap查看PS Old Generation基本爆满,直到最后终于爆出collection OOM。

jmap -heap pid
CSVFormat format = CSVFormat.DEFAULT.withDelimiter(',').withQuote('"').withEscape('\\');
try (
  CSVParser parser = new CSVParser(in, format)
) {
  // 内存溢出根源
  List<CSVRecord> csvRecords = parser.getRecords();
  for (Iterator iter = csvRecords.iterator(); iter.hasNext(); ) {
    String str = (String) iter.next();
    // 转换操作  
  }
}

最后老老实实用iterator解决

Iterable<CSVRecord> records = CSVFormat.DEFAULT
                    .withDelimiter(',')
                    .withQuote('"')
                    .withEscape('\\').parse(in);
// Write data rows
sw.beginSheet();
int rowNum = 0;
for (CSVRecord record : records) {
    if (rowNum > Constants.EXCEL_ROW_LIMIT - 1) {
        break;
    }
    rowNum++;
    sw.insertRow(rowNum);

    int j = 0;
    for (Iterator iter = record.iterator(); iter.hasNext(); ) {
        String str = (String) iter.next();
        if (j > Constants.EXCEL_COLUMN_LIMIT - 1) {
            continue;
        }
        sw.createCell(j, sw.escapeCharacters(str));
        j++;
    }
    sw.endRow();
}
sw.endSheet();