Optimization suggestion: dictionary-table translation makes exporting large data sets very slow

testnet0 commented 1 month ago

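Version:

V3.7.0

Problem description:

When a dictionary table is used for translation, and that table holds tens of thousands of rows while the data to export also holds tens of thousands of rows, the export performs on the order of hundreds of millions of dictionary replacement operations. Caching the dictionary table is not appropriate either, because its data can change at any time. Debugging shows that most of the time is spent here: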

    private String replaceSingleValue(String[] replace, String temp) {
        String[] tempArr;
        for (int i = 0; i < replace.length; i++) {
            //update-begin---author:scott   Date:20211220  for: [issues/I4MBB3] import cannot parse correctly when the @Excel dicText value contains an underscore---
            //tempArr = replace[i].split("_");
            tempArr = getValueArr(replace[i]);
            if (temp.equals(tempArr[0]) || temp.replace("_", "---").equals(tempArr[0])) {
                //update-begin---author:wangshuai ---date:20220422  for: dictionary replacement on import must turn --- back into _, otherwise the database stores --- ------------
                if (tempArr[1].contains("---")) {
                    return tempArr[1].replace("---", "_");
                }
                //update-end---author:wangshuai ---date:20220422  for: dictionary replacement on import must turn --- back into _, otherwise the database stores --- --------------
                return tempArr[1];
            }
            //update-end---author:scott   Date:20211220  for: [issues/I4MBB3] import cannot parse correctly when the @Excel dicText value contains an underscore---
        }
        return temp;
    }
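
To make the cost described above concrete, here is a minimal sketch that counts comparisons for a linear scan over "key_value" strings and contrasts it with a map built once. The class name, the scaled-down sizes, and the simplified first-underscore parsing are assumptions for illustration; this is not the AutoPOI code itself.

```java
import java.util.HashMap;
import java.util.Map;

class DictLookupSketch {
    /** Linear scan over "key_value" entries, the shape of the original lookup; counts[0] accumulates comparisons. */
    static String linearLookup(String[] replace, String key, long[] counts) {
        for (String entry : replace) {
            counts[0]++;
            int sep = entry.indexOf('_'); // simplified parsing, assumed for this sketch
            if (sep >= 0 && entry.substring(0, sep).equals(key)) {
                return entry.substring(sep + 1);
            }
        }
        return key; // not found: return the raw value unchanged
    }

    /** Builds the map once so that each later lookup is a single hash probe. */
    static Map<String, String> toMap(String[] replace) {
        Map<String, String> map = new HashMap<>();
        for (String entry : replace) {
            int sep = entry.indexOf('_');
            if (sep >= 0) {
                map.put(entry.substring(0, sep), entry.substring(sep + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        int dictSize = 2_000; // scaled-down stand-in for a 20,000-row dictionary table
        int rows = 3_000;     // scaled-down stand-in for 30,000 exported rows
        String[] replace = new String[dictSize];
        for (int i = 0; i < dictSize; i++) {
            replace[i] = i + "_text" + i;
        }

        long[] counts = {0};
        for (int r = 0; r < rows; r++) {
            linearLookup(replace, String.valueOf(r % dictSize), counts);
        }
        System.out.println("linear-scan comparisons: " + counts[0]);

        Map<String, String> map = toMap(replace);
        int hits = 0;
        for (int r = 0; r < rows; r++) {
            if (map.containsKey(String.valueOf(r % dictSize))) {
                hits++;
            }
        }
        System.out.println("hash lookups: " + rows + ", hits: " + hits);
    }
}
```

The linear scan grows with rows times dictionary size, while the map version grows with rows plus dictionary size, which is why the gap widens so sharply at tens of thousands of entries.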

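The original logic splits each dictionary entry into key and value on the underscore. Replacing it with a HashMap<String,String> improves performance substantially and also solves the problem of strings that contain _.

AutoPoiDictMapServiceI.java: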

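import java.util.HashMap;

/**
 * Description: dictionary table query
 * @author: TestNet
 * @since: 2024-09-09
 * @version: 1.0
 */
public interface AutoPoiDictMapServiceI {
    /**
     * Optimized data dictionary query.
     * Author: TestNet
     * @param dicTable dictionary table name; when empty, the system dictionary is used
     * @param dicCode dictionary code, or the code column of dicTable
     * @param dicText text column of dicTable; falls back to dicCode when empty
     * @param isKeyValue true maps value to text, false maps text to value
     * @return HashMap<key,value>
     */
    public HashMap<String,String> queryDict(String dicTable, String dicCode, String dicText, boolean isKeyValue);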

}

AutoPoiDictMapConfig.java:

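/**
 * Description: dictionary parameter support for the AutoPoi Excel annotation
 * Example: @Excel(name = "Gender", width = 15, dicCode = "sex")
 * 1. On export, the values 1,2 are translated to male, female according to the dictionary configuration;
 * 2. On import, male, female are translated back to 1,2 and stored in the database;
 *
 * @Author: TestNet
 * @since: 2024-09-09
 * @Version: 1.0
 */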
@Slf4j
@Service
public class AutoPoiDictMapConfig implements AutoPoiDictMapServiceI {

    @Lazy
    @Resource
    private CommonAPI commonApi;

    /**
     * Looks up the dictionary text that easypoi needs.
     *
     * @return
     * @Author: TestNet
     * @since: 2024-09-09
     */
    public HashMap<String, String> queryDict(String dicTable, String dicCode, String dicText, boolean isKeyValue) {
        HashMap<String, String> dictReplaces = new HashMap<>();
        List<DictModel> dictList = null;
        // step.1 use the system dictionary when no dictionary table is given
        if (oConvertUtils.isEmpty(dicTable)) {
            dictList = commonApi.queryDictItemsByCode(dicCode);
        } else {
            try {
                dicText = oConvertUtils.getString(dicText, dicCode);
                dictList = commonApi.queryTableDictItemsByCode(dicTable, dicText, dicCode);
            } catch (Exception e) {
                log.error(e.getMessage(), e);
            }
        }

        // The query may have failed or returned nothing; avoid a NullPointerException in the loop below
        if (dictList == null) {
            return null;
        }

        for (DictModel t : dictList) {
            //update-begin---author:liusq   Date:20230517  for: [issues/4917] excel export exception---
            if (t != null && t.getText() != null && t.getValue() != null) {
                if (isKeyValue) {
                    dictReplaces.put(t.getValue(), t.getText());
                } else {
                    dictReplaces.put(t.getText(), t.getValue());
                }
            }
        }
        if (!dictReplaces.isEmpty()) {
            log.debug("---AutoPoi--Get_DB_Dict------{}", dictReplaces);
            return dictReplaces;
        }
        return null;
    }
}
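
The point made earlier about strings containing _ can be illustrated with a small sketch. The sample key and text are hypothetical, and `splitEntry` is a naive stand-in for parsing a "key_value" entry; it is not the library's `getValueArr`, which works around the problem by escaping underscores as ---.

```java
import java.util.HashMap;
import java.util.Map;

class UnderscoreSketch {
    /** Naive parsing of a "key_value" entry: everything before the first underscore is taken as the key. */
    static String[] splitEntry(String entry) {
        return entry.split("_", 2);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary item whose key itself contains an underscore.
        String key = "a_b";
        String text = "Alpha Beta";

        // Packed into one string, the boundary between key and value is ambiguous.
        String[] parts = splitEntry(key + "_" + text);
        System.out.println(parts[0] + " -> " + parts[1]); // prints "a -> b_Alpha Beta"

        // Stored as a map entry, key and value stay separate, so no escaping is needed.
        Map<String, String> dict = new HashMap<>();
        dict.put(key, text);
        System.out.println(dict.get("a_b")); // prints "Alpha Beta"
    }
}
```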

org.jeecgframework.poi.excel.imports.base.ImportBaseService#addEntityToMap

                HashMap<String,String> dictReplace = jeecgDictService.queryDict(excel.dictTable(), excel.dicCode(), excel.dicText(),false);
                 if(dictReplace!=null && !dictReplace.isEmpty()){
                     excelEntity.setReplaceMap(dictReplace);
                 }

org.jeecgframework.poi.excel.export.base.ExportBase#multiReplaceValueByHashMap

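    private Object multiReplaceValueByHashMap(HashMap<String,String> replace, String key) {
        if(key.indexOf(",")>0){
            // The raw value contains commas, so treat it as multi-valued
            String[] radioVals = key.split(",");
            StringBuilder result = new StringBuilder();
            for(int i =0;i<radioVals.length;i++){
                String text = replace.get(radioVals[i]);
                // Skip values without a dictionary entry instead of failing on null
                if(text != null){
                    if(result.length() > 0){
                        result.append(",");
                    }
                    result.append(text);
                }
            }
            // Nothing matched: fall back to the raw value
            return result.length() == 0 ? key : result.toString();
        }else{
            return replaceValueByHashMap(replace, key);
        }
    }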
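
As a standalone illustration of multi-value translation with a map, the following sketch splits a comma-separated cell, translates each code, and joins the results. The class and method names are hypothetical, and keeping codes that have no dictionary entry is a choice made for this sketch rather than the behavior of the method above.

```java
import java.util.Map;
import java.util.StringJoiner;

class MultiValueSketch {
    /** Translates each comma-separated code; codes missing from the dictionary are kept as-is (a choice of this sketch). */
    static String translate(Map<String, String> dict, String raw) {
        StringJoiner joined = new StringJoiner(",");
        for (String code : raw.split(",")) {
            joined.add(dict.getOrDefault(code, code));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = Map.of("1", "male", "2", "female");
        System.out.println(translate(dict, "1,2")); // prints "male,female"
        System.out.println(translate(dict, "1,9")); // prints "male,9"
    }
}
```

Using `StringJoiner` avoids the trailing-comma bookkeeping that the `substring(0, length - 1)` idiom needs.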

org.jeecgframework.poi.excel.imports.CellValueServer#replaceValueHashMap

    private Object replaceValueHashMap(HashMap<String, String> replace, Object result, boolean multiReplace) {
        if (result == null) {
            return "";
        }
        if (replace == null || replace.size() <= 0) {
            return result;
        }
        String temp = String.valueOf(result);
        String backValue = "";
        if (temp.indexOf(",") > 0 && multiReplace) {
            // The raw value contains a comma, so treat it as multi-valued
            String multiReplaces[] = temp.split(",");
            for (String str : multiReplaces) {
                backValue = backValue.concat(replaceSingleValueHashMap(replace, str) + ",");
            }
            if (backValue.equals("")) {
                backValue = temp;
            } else {
                backValue = backValue.substring(0, backValue.length() - 1);
            }
        } else {
            backValue = replaceSingleValueHashMap(replace, temp);
        }
        //update-begin-author:liusq date:20210204 for: log a warning when dictionary replacement fails
        if (!replace.isEmpty() && backValue.equals(temp)) {
            LOGGER.warn("====================字典替换失败,字典值:{},要转换的导入值:{}====================", replace, temp);
        }
        //update-end-author:liusq date:20210204 for: log a warning when dictionary replacement fails
        return backValue;
    }

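Error screenshot:

Before the optimization, exporting 30,000 rows against a 20,000-row dictionary table timed out. After the optimization, the export took 7 seconds.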

testnet0 commented 1 week ago

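Judging from your optimization, the two versions of the code differ in the following ways:
1. The return type changed from String[] (key_value) to a HashMap (key:value).
2. The List to String[] conversion is no longer needed.
3. Later steps no longer have to split key_value again before replacing values.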

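From your description I also noticed a few issues worth mentioning:
1. If the data table has 20,000 rows and each row has 3 dictionary fields, the number of replacement operations should be 60,000, not 20,000 times the total number of dictionary records.
2. Following from that: without caching, a single export would hit the database with 60,000 SQL queries. Your code calls a cached function, so this risk did not surface.
3. On caching: the risk you see is that dictionary values are mutable, so caching them in redis could produce phantom reads. That can indeed happen. However, a dictionary item is very close to a constant and is rarely modified. Even if a modification does land during an export, so that part of the output uses the old dictionary and part uses the new one, running the export again solves it. I personally consider this delay acceptable for the business.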

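Taking points 2 and 3 together, I think having jeecg cache dictionaries in redis is currently the best approach, and it also protects the database from being overwhelmed. By comparison, the cost of asking the business to retry once is much easier to accept.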
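
One way to picture the trade-off in points 2 and 3 is a snapshot that lives only for the duration of one export: each dictionary is fetched once on first use, so the database sees a handful of queries and every row of that export is translated against the same data. This is a minimal sketch under stated assumptions; `ExportDictSnapshot` and its loader function are hypothetical names, not part of jeecg or AutoPOI, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ExportDictSnapshot {
    // Hypothetical loader: dicCode -> (value -> text), e.g. one database query per dictionary.
    private final Function<String, Map<String, String>> loader;
    private final Map<String, Map<String, String>> loaded = new HashMap<>();
    private int loadCount = 0;

    ExportDictSnapshot(Function<String, Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Translates one cell; the dictionary for dicCode is fetched on first use only. */
    String translate(String dicCode, String value) {
        Map<String, String> dict = loaded.get(dicCode);
        if (dict == null) {
            dict = loader.apply(dicCode);
            loadCount++;
            loaded.put(dicCode, dict);
        }
        return dict.getOrDefault(value, value);
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        ExportDictSnapshot snapshot = new ExportDictSnapshot(code -> Map.of("1", code + "-one"));
        // 20,000 rows with 3 dictionary fields each, as in the example above.
        for (int row = 0; row < 20_000; row++) {
            for (String code : new String[] {"a", "b", "c"}) {
                snapshot.translate(code, "1");
            }
        }
        System.out.println("translations: 60000, loads: " + snapshot.loadCount()); // loads: 3
    }
}
```

A snapshot like this does not replace a shared redis cache; it only shows that 60,000 translations need not mean 60,000 queries.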

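Also, after my own investigation, I believe the main cause here is the overhead of converting the List to String[]. The secondary cause is the difference between splitting strings and reading from a HashMap. Together they produce this problem.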

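Thank you very much for your reply. Let me describe the specific scenario. A dictionary like this one can be cached, and as you say it does not change often: @Dict(dicCode = "yn"). A table dictionary like the following, however, causes a large amount of string splitting: @Dict(dictTable = "domain", dicText = "domain", dicCode = "id"). For example, I have a domain table with 30,000 rows and an IP table with 20,000 rows. Exporting with domains matched to IPs incurs heavy string-operation overhead.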

EightMonth commented 1 week ago

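A follow-up: I traced the AutoPOI source, and its export logic handles dictionaries like this:
1. Load all @Excel-annotated fields of the entity.
2. Load the dicCode attribute from each @Excel annotation and store it in the export parameter object, to be used later when converting each record. In other words, the dictionary is loaded once per dictionary field, not once per record. Even with 20,000 records, if there are only three dictionary fields, only three loads happen.
3. Dictionary conversion during export is driven by @Excel, not by @Dict. The @Dict annotation has no effect in POI export; it only takes effect when a controller returns its result, and the dictionary aspect also loads only once.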

In summary, my inference is that your problem does not lie in the AutoPOI export.

jeecgboot / JeecgBoot