yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial defense, etc.;
https://pyabsa.readthedocs.io
MIT License
909 stars, 153 forks

The Chinese-trained ATEPC model uploaded to Google Drive is missing the model file #26

Closed LangDaoAI closed 3 years ago

LangDaoAI commented 3 years ago

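The Chinese-trained ATEPC model uploaded to Google Drive is missing its model file; after downloading, only the config file is there (V0.6.1-beta). I am short on GPU resources, and training on CPU fails with OOM. Could you please train it? I will then start large-scale testing right away. Thanks!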

LangDaoAI commented 3 years ago

image

LangDaoAI commented 3 years ago

The model file is missing.

yangheng95 commented 3 years ago

Got it. My network has been unreliable lately, and uploads often fail.

LangDaoAI commented 3 years ago

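OK. There is one more issue that I am patching on my side: the Chinese BERT model and its vocabulary are currently downloaded remotely. Could you make the BERT loading code point to a local directory after an offline download? I have changed it locally, but it would be best to have a unified fix on your side. The remote download is really painful.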

yangheng95 commented 3 years ago

For now, please download the model from Baidu Netdisk. My connection to Drive often fails.

LangDaoAI commented 3 years ago

image

For example, the ATEPC trainer.

LangDaoAI commented 3 years ago

I'll take a look.

yangheng95 commented 3 years ago

The `pretrained_bert_name` parameter can point to a local BERT directory.
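
As a rough sketch of what that could look like on the caller's side: the helper below (`resolve_bert_source`, a hypothetical name that is not part of PyABSA) returns the local path when the directory looks like a saved BERT checkpoint, and otherwise returns the name unchanged so that Hugging Face `from_pretrained` resolves it remotely. The file names it checks are the ones `save_pretrained` conventionally writes for BERT; treat that list as an assumption.

```python
from pathlib import Path

# Files a directory written by `save_pretrained` conventionally contains
# (assumption: a BERT checkpoint with a WordPiece vocabulary).
REQUIRED_FILES = ("config.json", "vocab.txt")


def resolve_bert_source(pretrained_bert_name: str) -> str:
    """Return a local directory if it looks like a BERT checkpoint, else the name unchanged."""
    candidate = Path(pretrained_bert_name).expanduser()
    if candidate.is_dir() and all((candidate / f).is_file() for f in REQUIRED_FILES):
        return str(candidate)
    return pretrained_bert_name


def load_bert(pretrained_bert_name: str):
    """Load tokenizer and model; `transformers` is imported lazily so the
    helper above works without it installed."""
    from transformers import BertModel, BertTokenizer

    source = resolve_bert_source(pretrained_bert_name)
    return BertTokenizer.from_pretrained(source), BertModel.from_pretrained(source)
```

With this, the same setting can hold either a hub name such as `bert-base-chinese` or a path to an offline copy.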

LangDaoAI commented 3 years ago

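Yes, I have changed it locally. Ideally the code would have a configuration option for where the BERT model cache points, so it can be switched at any time.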
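
One way such a switch might be sketched without touching library code is through the environment variables that the Hugging Face libraries read (`HF_HOME`, `TRANSFORMERS_OFFLINE`, `HF_HUB_OFFLINE`). The function name `configure_bert_cache` is hypothetical, exact behaviour depends on the installed `transformers` version, and the variables should generally be set before `transformers` is imported.

```python
import os


def configure_bert_cache(cache_dir: str, offline: bool = True) -> None:
    """Point the Hugging Face cache at `cache_dir` and toggle offline mode."""
    os.environ["HF_HOME"] = cache_dir
    for var in ("TRANSFORMERS_OFFLINE", "HF_HUB_OFFLINE"):
        if offline:
            # "1" asks the libraries to use only files already in the cache.
            os.environ[var] = "1"
        else:
            # Remove the flag so remote downloads are allowed again.
            os.environ.pop(var, None)
```

Calling `configure_bert_cache("/data/bert_cache")` at the very top of a training or inference script would then make later `from_pretrained` calls look in that directory first.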

yangheng95 commented 3 years ago

I may not have understood this requirement. If possible, please clone the latest version and submit a PR. Thanks!

LangDaoAI commented 3 years ago

OK, I'll submit a PR when I have time.

LangDaoAI commented 3 years ago

Baidu Netdisk is blocked at my company. Our egress proxy can actually reach Google instead.

yangheng95 commented 3 years ago

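OK, I'll upload it to Google Drive as soon as possible. I also just fixed a bug in automatically determining the dataset's polarity dimension, so please update by cloning the latest version.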

yangheng95 commented 3 years ago

The upload should finish within ten minutes.

yangheng95 commented 3 years ago

The upload is complete.

LangDaoAI commented 3 years ago

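OK. I have two questions about running extract_aspects_chinese.py:

1. Why does atepc\inferring\AspectExtractor.py still need the Chinese BERT model? Isn't the Chinese BERT model only needed when training the LCF model? For inference I should only need the LCF model, right?

2. Where does the Chinese BERT model name passed to BertModel.from_pretrained() in atepc\inferring\AspectExtractor.py come from? I don't see a config file in the latest code that specifies the Chinese BERT name.

Please help clarify these. Thanks!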

LangDaoAI commented 3 years ago

OK

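yangheng95 commented 3 years ago

  1. I haven't tried bypassing the BERT initialization and loading the state_dict directly. I'll try it, and if it works I'll optimize the code.
  2. It is read from the configuration saved at training time.

LangDaoAI commented 3 years ago

On item 1: I can try testing this as well.

On item 2: I misread it. You're right, it is the downloaded config file.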

LangDaoAI commented 3 years ago

Also, the config file of the downloaded model shows up as garbled text in every encoding I tried, and ordinary editors cannot display it properly. Yet it does not look like a binary file either, which is odd.

yangheng95 commented 3 years ago

It stores a pickled object.

yangheng95 commented 3 years ago

Once loaded, it is a namespace.
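
A minimal sketch of that round trip, assuming the config is an `argparse.Namespace` written with the standard `pickle` module (the thread only says "pickle object" and "namespace", so the exact type is an assumption, and `max_seq_len` is a made-up field for illustration):

```python
import pickle
from argparse import Namespace


def save_config(config: Namespace, path: str) -> None:
    """Write the config as a pickle; the file is not human-readable text."""
    with open(path, "wb") as f:
        pickle.dump(config, f)


def load_config(path: str) -> Namespace:
    """Read the config back. Only unpickle files from a source you trust,
    because unpickling can execute arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

After `opt = load_config(path)`, `vars(opt)` gives a plain dict of the stored settings, which is an easy way to inspect a file that looks garbled in a text editor.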

LangDaoAI commented 3 years ago

OK, understood.

LangDaoAI commented 3 years ago

I'll close this issue for now.

yangheng95 commented 3 years ago

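I changed how the model is saved. Loading a model no longer requires instantiating BERT first, but this only works with the newly provided model files. Old-format models still work, but BERT must be instantiated before they are loaded, because they were saved as parameters only rather than as the whole model, so the model has to be instantiated before the parameters can be loaded into it.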
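
The difference between the two formats can be illustrated with a toy example that uses only the standard library. This is an analogy, not PyABSA or PyTorch code: `build_model` is a hypothetical stand-in for instantiating the network, and in PyTorch the corresponding pieces would be `state_dict()` / `load_state_dict()` versus saving the module object itself.

```python
import pickle
from types import SimpleNamespace


def build_model(hidden_size: int) -> SimpleNamespace:
    """Stand-in for instantiating the network (the step old checkpoints require)."""
    return SimpleNamespace(hidden_size=hidden_size, weights=[0.0] * hidden_size)


trained = build_model(hidden_size=3)
trained.weights = [0.1, 0.2, 0.3]

# Old format: only the parameters are stored, so the loader must first
# rebuild the architecture and then copy the parameters into it.
params_blob = pickle.dumps({"weights": trained.weights})
restored_old = build_model(hidden_size=3)
restored_old.weights = pickle.loads(params_blob)["weights"]

# New format: the whole object is stored, so loading needs no separate
# construction step.
model_blob = pickle.dumps(trained)
restored_new = pickle.loads(model_blob)
```

Both paths end with the same object; the old format just moves the construction cost (here trivial, for BERT a full model initialization) to load time.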

LangDaoAI commented 3 years ago

So you will need to retrain the new-format models and upload them to Google Drive again, right?

yangheng95 commented 3 years ago

Yes. Compute is limited, so it will take a while. Please use the old-format models for now.

LangDaoAI commented 3 years ago

That's what I was thinking too. I'm downloading it now. It's a bit slow, but I should be able to start testing tomorrow.

LangDaoAI commented 3 years ago

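One follow-up plan: I will be producing an annotated dataset on my side. To adapt the model to it, I assume it is best to add that data and retrain. What should I watch out for?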

yangheng95 commented 3 years ago

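Nothing in particular. The sentiment labels must be contiguous and must not be less than 0, because the code initializes the model's output dimension from the contiguous sentiment labels.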
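
For a dataset whose raw labels do not already meet that constraint (for example -1/0/1), a preprocessing step along these lines could be used. This is a sketch under my own assumptions, not PyABSA code; `remap_labels` and `labels_are_valid` are hypothetical helper names.

```python
def labels_are_valid(labels) -> bool:
    """True when the distinct labels are exactly 0, 1, ..., n-1."""
    unique = sorted(set(labels))
    return unique == list(range(len(unique)))


def remap_labels(raw_labels):
    """Map arbitrary label values to contiguous ids starting at 0.

    Returns (ids, mapping, num_classes); num_classes is what an output
    layer sized from the labels would need.
    """
    mapping = {label: idx for idx, label in enumerate(sorted(set(raw_labels)))}
    ids = [mapping[label] for label in raw_labels]
    return ids, mapping, len(mapping)
```

Keeping the returned `mapping` alongside the trained model makes it possible to translate predicted ids back to the original label values.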

LangDaoAI commented 3 years ago

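What I actually wanted to ask is this: the training code feeds four Chinese datasets from different business domains to the model together. If my production dataset for a single domain is large enough, is it still worth mixing in those four datasets, or would training only on my own data keep the model better focused on that one scenario?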

yangheng95 commented 3 years ago

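If you have enough data of your own, you can ignore the provided datasets. Data from different sources can differ widely in distribution, so the provided data might end up acting as noise and hurting performance.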

LangDaoAI commented 3 years ago

Completely agree.

yangheng95 commented 3 years ago

Is your dataset anonymized? If so, would you consider sharing it so that others can learn from it? Existing data is very scarce.

LangDaoAI commented 3 years ago

One more question: you mentioned spaCy earlier. Why is it still needed? I don't quite understand.

yangheng95 commented 3 years ago

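spaCy is used to build the syntax tree of a sentence, which is then used to compute the local context. The method comes from an ACL 2020 paper; the original LCF-BERT computes the local context from the distance between tokens instead. The only models that need spaCy are LCFS-BERT and SLIDE-LCFS-BERT. BERT-BASE, BERT-SPC, LCF-BERT, and SLIDE-LCF-BERT do not need it, and neither does the LCF-ATEPC model you plan to use. For details, see the paper "Modelling Context and Syntactical Features for Aspect-based Sentiment Analysis".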
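
To make the two notions of distance concrete, here is a simplified sketch that uses only the standard library. It is not the PyABSA or paper implementation: the dependency heads are hand-written for the example rather than produced by spaCy, and the mask is a plain threshold on distance.

```python
from collections import deque


def token_distances(num_tokens, aspect_positions):
    """Linear distance from each token to the nearest aspect token."""
    return [min(abs(i - a) for a in aspect_positions) for i in range(num_tokens)]


def tree_distances(heads, aspect_positions):
    """Hop count from each token to the nearest aspect token in the dependency tree.

    heads[i] is the index of token i's head; the root points at itself,
    which is the convention spaCy uses for `token.head`.
    """
    neighbours = [set() for _ in heads]
    for child, head in enumerate(heads):
        if child != head:
            neighbours[child].add(head)
            neighbours[head].add(child)
    dist = [None] * len(heads)
    for a in aspect_positions:
        dist[a] = 0
    queue = deque(aspect_positions)
    while queue:  # breadth-first search outward from the aspect tokens
        node = queue.popleft()
        for nxt in neighbours[node]:
            if dist[nxt] is None:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def local_context_mask(distances, threshold):
    """1 for tokens within `threshold` of the aspect, 0 otherwise."""
    return [1 if d is not None and d <= threshold else 0 for d in distances]


tokens = ["The", "food", "that", "we", "ordered", "was", "great"]
aspect = [1]  # "food"
heads = [1, 5, 4, 4, 1, 5, 5]  # hand-written parse; "was" is the root

by_tokens = token_distances(len(tokens), aspect)
by_tree = tree_distances(heads, aspect)
```

In this example "great" is five tokens away from "food" but only two hops away in the tree (great → was → food), so a tree-based local context with a threshold of 2 keeps the opinion word while a token-based one with the same threshold drops it.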

LangDaoAI commented 3 years ago

Thank you very much!
