rainit2006 / Artificial-Intelligence


RASA #12

Open rainit2006 opened 5 years ago

rainit2006 commented 5 years ago

Keywords: an Intent is the usage context, i.e. the category that the user's utterance falls into. An Entity is an object or keyword in the user's input.

rainit2006 commented 5 years ago

Video: https://vimeo.com/254777331 Rasa training data: https://blog.csdn.net/u010505246/article/details/83276354

rainit2006 commented 5 years ago

rasa_core https://rasa.com/docs/core/quickstart/

The basic steps of how a Rasa Core app responds to a message: [image]

rainit2006 commented 5 years ago

Rasa NLU https://rasa.com/docs/nlu/quickstart/

  1. Prepare your NLU Training Data The data is just a list of messages that you expect to receive, annotated with the intent and entities Rasa NLU should learn to extract.
  2. Define your Machine Learning Model
  3. Train your Machine Learning NLU model

Example: choosing a Rasa NLU pipeline. To use the spacy_sklearn pipeline:

language: "en"
pipeline: "spacy_sklearn"

Config file: .yml

rainit2006 commented 5 years ago
  1. Prepare your NLU training data. The data is just a list of messages that you expect to receive, annotated with the intent and entities Rasa NLU should learn to extract. Save it as nlu.md.

  2. Define your machine learning model. Save it in a file called nlu_config.yml.

  3. Train your machine learning NLU model: python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose

■Training Data Format

Markdown Format
## intent:greet
- hey
- hello

## synonym:savings   <!-- synonyms, method 2 -->
- pink pig

## regex:zipcode
- [0-9]{5}

## lookup:currencies   <!-- lookup table list -->
- Yen
- USD
- Euro

JSON Format
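
For comparison with the Markdown above, the same kind of data can be written in the JSON format. The sketch below builds it in Python and is only an illustration: it assumes the `rasa_nlu_data` layout (`common_examples`, `regex_features`, `entity_synonyms`) from the rasa_nlu docs of that period, and the `transfer` intent, the `currency` entity and the example sentence are made up.

```python
import json


def entity(text, value, name):
    """Build one entity annotation with character offsets into text."""
    start = text.index(value)
    return {"start": start, "end": start + len(value), "value": value, "entity": name}


def build_training_data():
    # Hypothetical sentence containing one annotated entity.
    text = "send 100 USD to my savings"
    return {
        "rasa_nlu_data": {
            "common_examples": [
                {"text": "hey", "intent": "greet", "entities": []},
                {"text": "hello", "intent": "greet", "entities": []},
                {"text": text, "intent": "transfer",
                 "entities": [entity(text, "USD", "currency")]},
            ],
            "regex_features": [{"name": "zipcode", "pattern": "[0-9]{5}"}],
            "entity_synonyms": [{"value": "savings", "synonyms": ["pink pig"]}],
        }
    }


if __name__ == "__main__":
    # Print the JSON; redirect it to a file to use it as training data.
    print(json.dumps(build_training_data(), indent=2))
```

Computing `start` and `end` from the text avoids hand-counted offsets, which are an easy source of misaligned entity annotations.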

■Choosing a Rasa NLU Pipeline

With a spaCy model, use the spacy_sklearn pipeline:

language: "en"
pipeline: "spacy_sklearn"

Or use the tensorflow_embedding pipeline:

language: "en"
pipeline: "tensorflow_embedding"

With the pipeline configuration saved as config.yml and the training examples saved as nlu_data.md, you can train the model by running:

$ python -m rasa_nlu.train \
    --config config.yml \
    --data nlu_data.md \
    --path projects
rainit2006 commented 5 years ago

Entity Extraction (extracting entities, i.e. keywords)

rainit2006 commented 5 years ago

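A deeper look at Rasa NLU. Original article: https://blog.csdn.net/love_is_red/article/details/79145962

The hard part of NLU is preparing the corpus. The notes below record what I have learned so far.

  1. Every intent needs keywords, and every sentence in an intent must contain a keyword.
  2. Expand each keyword into about 20 sentences.
  3. The sentences must be diverse and spread out (apart from the keyword, avoid reusing words as far as possible).
  4. Within each intent, every word other than the keywords should be repeated as little as possible, ideally not at all.
  5. Across the whole file, every word other than the keywords should likewise be repeated as little as possible, ideally not at all.
  6. As a consequence of the previous two points, filler words such as "you", "I", "he", "ah", "yes" and question particles should be removed (slightly unnatural sentences are acceptable).
  7. Merge intents that share the same sentence pattern and differ only in their parameters; tell them apart later by validating the parameters.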

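The accuracy of intent recognition depends on two things:

  1. How often the keyword appears in the current intent
  2. How often the keyword appears in the whole file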
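
Both frequencies can be estimated from the training examples with a few lines of Python. This is a rough sketch, not how Rasa scores intents: it assumes whitespace-separated examples grouped by intent, and the function name `keyword_frequency`, the intents and the sentences are all hypothetical.

```python
def keyword_frequency(examples_by_intent, intent, keyword):
    """Return (share within the intent, share across the file) of
    examples that contain keyword as a whitespace-separated token."""
    def share(sentences):
        if not sentences:
            return 0.0
        hits = sum(1 for s in sentences if keyword in s.lower().split())
        return hits / len(sentences)

    in_intent = share(examples_by_intent.get(intent, []))
    all_sentences = [s for group in examples_by_intent.values() for s in group]
    return in_intent, share(all_sentences)


# Made-up training examples, grouped by intent.
EXAMPLES = {
    "order_pizza": ["order pizza now", "want pizza tonight", "pizza delivery please"],
    "greet": ["hey", "hello there"],
}

if __name__ == "__main__":
    print(keyword_frequency(EXAMPLES, "order_pizza", "pizza"))
```

A keyword that appears in every example of its own intent but rarely elsewhere in the file is the situation the notes above aim for.
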
rainit2006 commented 5 years ago

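Workflow: obtain a trained intent-recognition model from Rasa NLU or another natural language understanding framework. (Its main job is to work out what the user is asking, i.e. to classify the intent of the user's current sentence.)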

rainit2006 commented 5 years ago

Installing RASA

Local installation of RASA NLU: https://rasa.com/docs/nlu/installation/ https://github.com/RasaHQ/rasa_nlu

Docker installation of RASA: https://rasa.com/docs/nlu/master/docker/ https://github.com/RasaHQ/rasa_nlu

rainit2006 commented 5 years ago

Problems encountered: ■ ModuleNotFoundError: No module named 'rasa_nlu.converters' Cause: the Python code was written for an older version of Rasa and no longer works with the newer rasa library, so code matching the new version is needed. For the new-version code see https://rasa.com/docs/nlu/0.13.8/python/

■ OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory. After running python -m spacy download en, the following message appears: Creating a shortcut link for 'en' didn't work (maybe you don't have admin permissions?), but you can still load the model via its full package name: nlp = spacy.load('{name}')

Fix: restart PyCharm with administrator privileges, then run "python -m spacy download en" again.

rainit2006 commented 5 years ago

Python code

The model generated after Rasa NLU training: [image]

from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config

training_data = load_data('data/examples/rasa/demo-rasa.json')
trainer = Trainer(config.load("sample_configs/config_spacy.yml"))
trainer.train(training_data)
model_directory = trainer.persist('./projects/default/')  # Returns the directory the model is stored in

print("Done.")

https://hackernoon.com/build-simple-chatbot-with-rasa-part-1-f4c6d5bb1aea

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# Older tutorials import load_data from rasa_nlu.converters; newer
# rasa_nlu versions provide it in rasa_nlu.training_data instead.
from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer, Interpreter
from rasa_nlu import config


def train(data, config_file, model_dir):
    """Train an NLU model and persist it under model_dir/default/chat."""
    training_data = load_data(data)
    configuration = config.load(config_file)
    trainer = Trainer(configuration)
    trainer.train(training_data)
    # persist() returns the directory the model is stored in.
    return trainer.persist(model_dir, fixed_model_name='chat')


def run():
    # Load the model persisted by train() and parse a sample message.
    interpreter = Interpreter.load('./models/nlu/default/chat')
    print(interpreter.parse('I want to order pizza'))
    # print(interpreter.parse(u'What is the review for the movie Die Hard?'))


if __name__ == '__main__':
    train('./data/training_data.json', './config/config.yml', './models/nlu')
    # run()
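
`interpreter.parse` returns a plain dict, so the intent and entities can be read without any Rasa-specific helpers. The sketch below does this for a hand-written sample that mimics the result shape (`text`, `intent` with `name` and `confidence`, `entities`); the sample values, the `summarize` helper and the 0.5 threshold are assumptions for illustration, not output from a real model.

```python
def summarize(result, min_confidence=0.5):
    """Return (intent name or None, {entity name: value}) from a parse result."""
    intent = result.get("intent") or {}
    name = intent.get("name")
    # Treat low-confidence predictions as "no intent recognised".
    if name is None or intent.get("confidence", 0.0) < min_confidence:
        name = None
    entities = {e["entity"]: e["value"] for e in result.get("entities", [])}
    return name, entities


# Hand-written stand-in for the dict returned by interpreter.parse(...).
SAMPLE = {
    "text": "I want to order pizza",
    "intent": {"name": "order_food", "confidence": 0.87},
    "entities": [{"start": 16, "end": 21, "value": "pizza", "entity": "food"}],
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```
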
rainit2006 commented 5 years ago

RASA GUI npm install -g rasa-nlu-trainer https://github.com/RasaHQ/rasa-nlu-trainer

rainit2006 commented 5 years ago

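Problem encountered: running make train-nlu in starter-pack-rasa-stack fails with: Illegal instruction (core dumped)

Searching online showed that tensorflow causes the problem. Evidence: start python and run >>> import tensorflow as tf, which prints Illegal instruction (core dumped)

Cause: this Rasa version depends on tensorflow 1.10.0, which requires a CPU that supports AVX instructions. Older CPUs do not support AVX instructions, hence the error Illegal instruction (core dumped). The only option is to use a lower tensorflow version by running: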

pip uninstall tensorflow
pip install tensorflow==1.5

But this then produces another error: rasa-core 0.12.3 has requirement tensorflow==1.10.0, but you'll have tensorflow 1.5.0 which is incompatible.

How to verify AVX support: the following command lists all the CPU features. $ more /proc/cpuinfo | grep flags For details see: https://tech.amikelive.com/node-887/how-to-resolve-error-illegal-instruction-core-dumped-when-running-import-tensorflow-in-a-python-program/
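
The same check can be done from Python before importing tensorflow. This is a minimal sketch that assumes a Linux x86 machine where /proc/cpuinfo has a `flags` line; the helper names `cpu_flags` and `has_avx` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    # Exact token match, so "avx2" alone does not count as "avx".
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX supported:", has_avx(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```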

When tensorflow is downgraded to 1.5, the rasa version has to be lowered as well: If the latest rasa stack doesn't work for you, consider using rasa_core==0.8.*, rasa_nlu==0.11.* and tensorflow==1.5.*

Problem encountered: installing tensorflow 1.5 fails with: Cannot uninstall 'html5lib'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Fix: add the --ignore-installed option: pip install tensorflow==1.5 --ignore-installed

rainit2006 commented 5 years ago

https://blog.csdn.net/u010505246/article/details/82997100

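Typical pipeline: [image]

1. Initialization components: there are currently only two, nlp_spacy and nlp_mitie, corresponding to SpaCy and MITIE.

MITIE-based components such as tokenizer_mitie, intent_featurizer_mitie, ner_mitie and intent_classifier_mitie all depend on the object provided by nlp_mitie.

SpaCy-based components such as tokenizer_spacy, intent_featurizer_spacy and ner_spacy all depend on the object provided by nlp_spacy.

2. Tokenizer components: among the Rasa tokenizers, tokenizer_jieba supports Chinese word segmentation, tokenizer_mitie can support it after modification, and tokenizer_spacy does not support it yet but will in the future (worth following up).

3. Feature-extraction components: named entity recognition and intent classification both need features supplied by upstream components. Common features are word vectors, bag-of-words, n-grams and regular expressions. Several feature-extraction components can be used at the same time; at the implementation level their features are merged.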

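4. NER components. ner_crf: uses a CRF model for NER. The CRF model depends only on the tokens themselves; to use POS features in the feature functions, the nlp_spacy component must supply a spacy_doc object that provides the POS information.

ner_mitie: uses the language model provided by the MITIE model and needs only tokens to perform NER.

ner_spacy: uses the NER built into the SpaCy model. The model has to be trained inside the SpaCy framework; SpaCy currently does not let users train their own models, and the official SpaCy models support only a few common entity types (see the official documentation for details). (According to one expert, spaCy already supports custom entities; Chinese spaCy models: https://github.com/howl-anderson/Chinese_models_for_SpaCy)

ner_duckling: Duckling is an NER library from Facebook, written in Haskell and based on rules and models. Its Chinese support is incomplete and covers only a few of its many entity types. Rasa can call Duckling in two ways: through a wrapper around the duckling package, or over HTTP. These two access methods correspond to the ner_duckling and ner_duckling_http components respectively.

ner_synonyms: strictly speaking, ner_synonyms is not an entity-extraction component but rather a normalization component. It maps the various synonyms to a standard term, for example the entity "KFC" to "肯德基", which makes later business logic easier.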
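
The normalization step can be pictured as a dictionary lookup applied to already extracted entities. The sketch below illustrates the idea only and is not Rasa's implementation; the `SYNONYMS` table, the `restaurant` entity name and the case-insensitive lookup are assumptions of this example.

```python
# Hypothetical synonym table: surface form (lowercased) -> standard term.
SYNONYMS = {
    "kfc": "肯德基",
    "kentucky fried chicken": "肯德基",
}


def normalize_entities(entities, synonyms=SYNONYMS):
    """Return copies of the entities with their values mapped to standard terms."""
    normalized = []
    for ent in entities:
        value = ent["value"]
        # Unknown values pass through unchanged.
        canonical = synonyms.get(value.lower(), value)
        normalized.append(dict(ent, value=canonical))
    return normalized


if __name__ == "__main__":
    extracted = [{"entity": "restaurant", "value": "KFC"}]
    print(normalize_entities(extracted))
```

Because every variant ends up as the same standard value, downstream code only has to handle one spelling per entity.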

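Properties of pipeline components: the order of components matters; for example, a tokenizer component must come before an NER component.

Components are replaceable, for example the tokenizer.

Some components are mutually exclusive; for example, there can be only one tokenizer.

Some components can be used together; for example, rule-based and embedding-based text featurizers can be used at the same time.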
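
These constraints can be expressed as a small check over a list of component names. This is a hypothetical validator written for these notes, not part of Rasa: it encodes only the rules stated above (one tokenizer, a tokenizer before any NER component, an initialization component before the components that depend on it) and uses the component names mentioned in this thread.

```python
TOKENIZERS = {"tokenizer_jieba", "tokenizer_mitie", "tokenizer_spacy"}
NER_COMPONENTS = {"ner_crf", "ner_mitie", "ner_spacy"}
# Component-name suffix -> initialization component it depends on.
REQUIRES = {"_spacy": "nlp_spacy", "_mitie": "nlp_mitie"}


def check_pipeline(pipeline):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    tokenizers = [c for c in pipeline if c in TOKENIZERS]
    if len(tokenizers) > 1:
        problems.append("more than one tokenizer: " + ", ".join(tokenizers))
    for i, name in enumerate(pipeline):
        earlier = pipeline[:i]
        if name in NER_COMPONENTS and not any(c in TOKENIZERS for c in earlier):
            problems.append(name + " has no tokenizer before it")
        for suffix, init in REQUIRES.items():
            if name != init and name.endswith(suffix) and init not in earlier:
                problems.append(name + " needs " + init + " earlier in the pipeline")
    return problems


if __name__ == "__main__":
    print(check_pipeline(["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf"]))
    print(check_pipeline(["ner_crf", "tokenizer_jieba", "tokenizer_spacy"]))
```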