liaochong / myexcel

MyExcel, a new way to operate excel!
https://github.com/liaochong/myexcel/wiki
Apache License 2.0

How to handle dynamic columns when exporting large HBase data sets? #418

Open 89333367 opened 8 months ago

89333367 commented 8 months ago

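Description: I am exporting HBase data on the order of millions of rows, and the export must split across sheets automatically. The trouble is that different HBase rows can have different columns: HBase is column-oriented storage, so the set of columns may vary from row to row. For example, the first row has three columns A, B, and C, while the second row has two columns A and F. As a result, I ran into a problem with titles during the export.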

Reproduction example

List<String> bt = Arrays.asList("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K");

    List<Map> getRows(int page, int size) {// simulate HBase data whose columns are not fixed
        List<Map> rows = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            Map<String, Object> m = new HashMap<>();
            long k = 0;
            for (int j = 0; j < page; j++) {
                int t = j;
                if (t > 10) {
                    t = 10;
                }
                String key = bt.get(t);
                String value = page + "_" + key + "_" + (i + 1) + "_" + (++k);
                m.put(key, value);
            }
            rows.add(m);
        }
        return rows;
    }

    @Test
    void t001() throws IOException {
        List<String> titles = new ArrayList<>();

        DefaultStreamExcelBuilder<Map> streamExcelBuilder = DefaultStreamExcelBuilder.of(Map.class);
        streamExcelBuilder.noStyle();
        streamExcelBuilder.capacity(10000);
        streamExcelBuilder.titles(titles);
        streamExcelBuilder.start();

        for (int i = 0; i < 10; i++) {
            List<Map> rows = getRows(i, 10);
            for (Map row : rows) {
                for (Object key : row.keySet()) {
                    if (!titles.contains(key.toString())) {
                        titles.add(key.toString());// extend the header with any new column found in each returned row
                    }
                }
            }
            streamExcelBuilder.append(rows);
        }

        Workbook workbook = streamExcelBuilder.build();
        FileExportUtil.export(workbook, new File("d:/tmp/1.xlsx"));
        streamExcelBuilder.close();
    }

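Expected result: The export succeeds and the header is correct. The header can be the deduplicated union of the columns across all rows.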
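To make the expected header concrete, here is a minimal sketch of computing the deduplicated union of column names in first-seen order, and of aligning one sparse row to that header. It uses only the JDK. The class `HeaderUnion` and its methods `collectTitles` and `alignRow` are hypothetical names introduced for illustration and are not part of MyExcel. It assumes the rows can be traversed (or their column names collected) once before the builder is started; whether `DefaultStreamExcelBuilder` picks up changes made to `titles` after `start()` is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of MyExcel: builds a header from sparse rows.
class HeaderUnion {

    // Returns every distinct column name across all rows, in first-seen order.
    static List<String> collectTitles(List<Map<String, Object>> rows) {
        Set<String> seen = new LinkedHashSet<>();
        for (Map<String, Object> row : rows) {
            seen.addAll(row.keySet());
        }
        return new ArrayList<>(seen);
    }

    // Returns the row's values ordered by the header; missing columns become "".
    static List<Object> alignRow(Map<String, Object> row, List<String> titles) {
        List<Object> cells = new ArrayList<>(titles.size());
        for (String title : titles) {
            Object value = row.get(title);
            cells.add(value == null ? "" : value);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Mirrors the example from the description: row 1 has A, B, C; row 2 has A, F.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("A", "a1");
        first.put("B", "b1");
        first.put("C", "c1");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("A", "a2");
        second.put("F", "f2");

        List<Map<String, Object>> rows = Arrays.asList(first, second);
        List<String> titles = collectTitles(rows);
        System.out.println(titles);                   // prints [A, B, C, F]
        System.out.println(alignRow(second, titles)); // prints [a2, , , f2]
    }
}
```

With millions of rows, holding everything in memory to compute the union may not be practical. A first pass that collects only the column names, followed by a second pass that appends the rows, would keep memory bounded at the cost of reading the data twice.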