Open codeman008 opened 4 months ago
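The problem above is solved. The cause was a mismatched paddlepaddle version. After reinstalling the correct version the data loads, but a new error is reported: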
desc: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 132/132 [00:00<00:00, 415651.75it/s]
desc: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 402360.56it/s]
label_counter:
Counter({'resume': 44, 'email': 44, 'paper': 44})
Counter({'email': 11, 'paper': 11, 'resume': 11})
{'resume': 0, 'email': 1, 'paper': 2}
训练集 (training set):
132
验证集 (validation set):
33
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XLMRobertaTokenizer'.
The class this function is called from is 'LayoutXLMTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XLMRobertaTokenizer'.
The class this function is called from is 'LayoutXLMTokenizerFast'.
Traceback (most recent call last):
File "layoutlmv3/tet_train.py", line 311, in
Same as https://github.com/huggingface/transformers/issues/24612#issuecomment-1618929505. Your transformers version is probably the problem: its SLOW_TO_FAST_CONVERTERS mapping does not even contain 'LayoutXLMTokenizer' or 'LayoutLMv3Tokenizer'. Try a newer version. I tested 4.18.0 and 4.26.1 and both work.
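The check described above can be sketched roughly as follows. This is a minimal sketch, assuming `SLOW_TO_FAST_CONVERTERS` can be imported from `transformers.convert_slow_tokenizer` in the installed version; the helper names `missing_converters` and `load_converter_mapping` are hypothetical and belong neither to transformers nor to this project.

```python
from importlib import import_module

# Tokenizer names mentioned in this thread.
REQUIRED = ("LayoutXLMTokenizer", "LayoutLMv3Tokenizer")


def missing_converters(mapping, required=REQUIRED):
    """Return the names in `required` that have no key in `mapping`."""
    return [name for name in required if name not in mapping]


def load_converter_mapping():
    """Return the installed SLOW_TO_FAST_CONVERTERS dict, or None if unavailable."""
    try:
        # Assumption: the mapping lives in this module in the installed version.
        module = import_module("transformers.convert_slow_tokenizer")
    except Exception:
        # transformers (or one of its dependencies) is missing or broken.
        return None
    return getattr(module, "SLOW_TO_FAST_CONVERTERS", None)


if __name__ == "__main__":
    mapping = load_converter_mapping()
    if mapping is None:
        print("could not load SLOW_TO_FAST_CONVERTERS")
    else:
        print("missing converters:", missing_converters(mapping))
```

If the script lists either tokenizer name as missing, the installed transformers version is likely one of the affected ones.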
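Thanks for the answer. I reset transformers to transformers==4.33.1 and that fixed the problem, but now a different error is reported.
Have you run into it before? Also, for the project you published, the data needs no further processing, right?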
1. "The data needs no further processing, right?" --- Correct, no further processing is needed; it can be used as is.
Try replacing
from layoutlmv3.model.tokenization_layoutxlm_fast import LayoutXLMTokenizerFast as LayoutLMv3TokenizerFast
with
from layoutlmv3.model.tokenization_layoutxlm import LayoutXLMTokenizer as LayoutLMv3TokenizerFast
and see whether that helps.
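If editing the import by hand is inconvenient, the swap could be made switchable. The following is only a sketch: the module paths and class names are the ones quoted above, while `load_class`, `tokenizer_class`, and the `use_fast` flag are hypothetical names introduced for illustration.

```python
from importlib import import_module

# Module paths and class names as quoted in this thread.
FAST = ("layoutlmv3.model.tokenization_layoutxlm_fast", "LayoutXLMTokenizerFast")
SLOW = ("layoutlmv3.model.tokenization_layoutxlm", "LayoutXLMTokenizer")


def load_class(module_path, class_name):
    """Import `module_path` and return the attribute named `class_name`."""
    module = import_module(module_path)
    return getattr(module, class_name)


def tokenizer_class(use_fast=False, loader=load_class):
    """Return the fast or slow tokenizer class depending on `use_fast`."""
    module_path, class_name = FAST if use_fast else SLOW
    return loader(module_path, class_name)


# Usage inside the project (hypothetical, mirrors the alias used above):
# LayoutLMv3TokenizerFast = tokenizer_class(use_fast=False)
```

The default of `use_fast=False` matches the suggestion above to use the slow tokenizer.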
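Thanks for the answer, it runs successfully now. One more question: if I want to train on the CDLA dataset (https://github.com/buptlihang/CDLA), can I build the dataset with the code you provided?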
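Those are two different things. This project is an NLP classification task. The CDLA dataset is for layout analysis, which is a CV task, so the preprocessing, model outputs, loss, and so on are all different.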
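Hello, and first of all thank you for sharing the code. Following your instructions, I ran python layoutlmv3/tet_embedding.py and then python layoutlmv3/tet_train.py to train, and got the error below. Have you run into it before?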
C++ Traceback (most recent call last):
0 paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1 std::unique_ptr<paddle::PaddlePredictor, std::default_delete > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2 paddle::AnalysisPredictor::Init(std::shared_ptr const&, std::shared_ptr const&)
3 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr const&)
4 paddle::AnalysisPredictor::OptimizeInferenceProgram()
5 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument)
6 paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument)
7 paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete >)
8 paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph) const
9 paddle::framework::ir::SelfAttentionFusePass::ApplyImpl(paddle::framework::ir::Graph) const
10 paddle::framework::ir::GraphPatternDetector::operator()(paddle::framework::ir::Graph, std::function<void (std::map<paddle::framework::ir::PDNode, paddle::framework::ir::Node, paddle::framework::ir::GraphPatternDetector::PDNodeCompare, std::allocator<std::pair<paddle::framework::ir::PDNode const, paddle::framework::ir::Node> > > const&, paddle::framework::ir::Graph)>)
Error Message Summary:
FatalError: Illegal instruction is detected by the operating system.
[TimeInfo: Aborted at 1709625641 (unix time) try "date -d @1709625641" if you are using GNU date ]
[SignalInfo: SIGILL (@0x7f4429cefe0a) received by PID 1725796 (TID 0x7f4558065180) from PID 701431306 ]
Illegal instruction (core dumped)
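A SIGILL from a prebuilt binary often means the CPU lacks an instruction set extension the binary was compiled for, such as AVX. That is an assumption here, since nothing in this thread confirms the cause. A minimal sketch for checking the CPU flags on Linux follows; the helper names `cpu_flags` and `missing_flags` and the choice of `avx`/`avx2` as the flags to look for are illustrative.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx", "avx2")):
    """Return the entries of `wanted` that the CPU does not report."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("missing flags:", missing_flags(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```

If a flag is reported missing, a paddlepaddle build that does not require it may be worth trying.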