rainit2006 opened 5 years ago
rasa_core https://rasa.com/docs/core/quickstart/
the basic steps of how a Rasa Core app responds to a message:
Rasa NLU https://rasa.com/docs/nlu/quickstart/
Example: choosing a Rasa NLU pipeline. To use the spacy_sklearn pipeline:
language: "en"
pipeline: "spacy_sklearn"
The config file is a .yml file.
Prepare your NLU training data. The data is just a list of messages that you expect to receive, annotated with the intent and entities Rasa NLU should learn to extract. Save it as nlu.md.
Define your machine learning model and save it in a file called nlu_config.yml.
Train your NLU model:
python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose
■Training Data Format
Markdown Format
## intent:greet
- hey
- hello
## synonym:savings <!-- synonyms, method 2 -->
- pink pig
## regex:zipcode
- [0-9]{5}
## lookup:currencies <!-- lookup table list -->
- Yen
- USD
- Euro
JSON Format
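A minimal sketch of the JSON layout (the rasa_nlu_data/common_examples structure used by older rasa_nlu versions; the example texts and the output filename are my own, and the exact schema should be checked against your version's docs):

```python
import json

# Sketch of the rasa_nlu JSON training-data layout (older rasa_nlu versions).
training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "hey", "intent": "greet", "entities": []},
            {
                "text": "show me restaurants in Tokyo",
                "intent": "restaurant_search",
                "entities": [
                    # start/end are character offsets of the entity value in text
                    {"start": 23, "end": 28, "value": "Tokyo", "entity": "location"},
                ],
            },
        ]
    }
}

with open("demo_training_data.json", "w") as f:
    json.dump(training_data, f, indent=2)
```

The same file can then be passed to the trainer via --data, just like the Markdown format.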
■Choosing a Rasa NLU Pipeline. To use a spaCy model, use the spacy_sklearn pipeline:
language: "en"
pipeline: "spacy_sklearn"
use the tensorflow_embedding pipeline:
language: "en"
pipeline: "tensorflow_embedding"
With the pipeline configuration saved as config.yml and the training examples saved under nlu_data/, you can train the model by running:
$ python -m rasa_nlu.train \
--config config.yml \
--data nlu_data/ \
--path projects
Entity Extraction (entity/keyword extraction)
Extracting places, dates, people, organisations: spaCy has excellent pre-trained named-entity recognisers for a few different languages.
Dates, amounts of money, durations, distances, ordinals: the duckling library does a great job of turning expressions like "next Thursday at 8pm" into structured values.
Regular expressions (regex): you can use regular expressions to help the CRF model learn to recognize entities. In the training data format you can provide a list of regular expressions, each of which provides ner_crf with an extra binary feature saying whether the regex was found (1) or not (0).
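The binary regex feature described above can be sketched like this (a simplified illustration of the idea, not ner_crf's actual implementation; the pattern names are taken from the training-data example above):

```python
import re

# Patterns as they might appear in the training data (the greet pattern is illustrative).
patterns = {"zipcode": r"[0-9]{5}", "greet": r"hey[^\s]*"}

def regex_features(token):
    """Return one binary feature per pattern: 1 if the whole token matches, else 0."""
    return {name: int(re.fullmatch(p, token) is not None)
            for name, p in patterns.items()}

print(regex_features("90210"))  # the zipcode pattern fires
print(regex_features("hello"))  # no pattern fires
```

The CRF sees these 0/1 values alongside its other token features, so a match nudges the model toward the corresponding entity label without hard-coding the decision.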
A deeper look at Rasa NLU. Source: https://blog.csdn.net/love_is_red/article/details/79145962
The hard part of NLU is preparing the corpus. Lessons learned, recorded one by one:
- Every intent should have keywords, and every sentence in the intent should contain a keyword.
- Each keyword should be expanded into roughly 20 sentences.
- The sentences should be as divergent as possible (i.e., apart from the keywords, avoid repeating words).
- Within each intent, apart from the keywords, word repetition should be as low as possible, ideally zero.
- Across the whole file, apart from the keywords, word repetition should likewise be as low as possible, ideally zero.
- A consequence of the two rules above is that filler words ("you", "me", "he", "ah", "yes", question particles, etc.) all get removed (slightly unnatural wording is acceptable).
- Merge intents that share the same sentence pattern but differ only in parameters, and tell them apart later by validating the parameters.
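The repetition rules above can be checked mechanically. A rough sketch (the function, threshold, and example sentences are my own, not part of Rasa):

```python
from collections import Counter

def repeated_words(examples, keywords):
    """List non-keyword words that appear in more than one training example."""
    seen = Counter()
    for sentence in examples:
        # Count each word once per sentence so within-sentence repeats don't inflate it.
        for word in set(sentence.lower().split()):
            if word not in keywords:
                seen[word] += 1
    return [w for w, n in seen.items() if n > 1]

examples = [
    "book a flight to tokyo",
    "book me a ticket to osaka",
]
# "book", "a", "to" repeat across examples; per the guidelines they should be trimmed.
print(repeated_words(examples, keywords={"flight", "ticket"}))
```

Running such a check over each intent file makes the "low repetition outside keywords" rule auditable instead of a matter of eyeballing.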
The accuracy of intent recognition depends on two aspects.
Concrete workflow: obtain a trained intent-recognition model from Rasa NLU or another natural-language-understanding framework. (This is mainly used to identify what the user is asking, i.e., to classify the intent of the user's current utterance.)
Source: https://blog.csdn.net/love_is_red/article/details/79145979
Installing RASA
Local installation of RASA NLU: https://rasa.com/docs/nlu/installation/ https://github.com/RasaHQ/rasa_nlu
Docker installation of RASA: https://rasa.com/docs/nlu/master/docker/ https://github.com/RasaHQ/rasa_nlu
Problems encountered:
■ ModuleNotFoundError: No module named 'rasa_nlu.converters'
Cause: the Python code was written for an old version of rasa and no longer works with the new rasa library, so you need code matching the new version. For up-to-date sample code see https://rasa.com/docs/nlu/0.13.8/python/
■ OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Running python -m spacy download en then prints: Creating a shortcut link for 'en' didn't work (maybe you don't have admin permissions?), but you can still load the model via its full package name: nlp = spacy.load('{name}')
Fix: restart PyCharm with administrator privileges, then run "python -m spacy download en" again.
Python code
Training a Rasa NLU model and persisting it:
from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config
training_data = load_data('data/examples/rasa/demo-rasa.json')
trainer = Trainer(config.load("sample_configs/config_spacy.yml"))
trainer.train(training_data)
model_directory = trainer.persist('./projects/default/') # Returns the directory the model is stored in
print("Done.")
https://hackernoon.com/build-simple-chatbot-with-rasa-part-1-f4c6d5bb1aea
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
# from rasa_nlu.converters import load_data
from rasa_nlu.training_data import load_data
from rasa_nlu.config import RasaNLUModelConfig
#from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer, Metadata, Interpreter
from rasa_nlu import config
def train(data, config_file, model_dir):
training_data = load_data(data)
configuration = config.load(config_file)
trainer = Trainer(configuration)
trainer.train(training_data)
model_directory = trainer.persist(model_dir, fixed_model_name='chat')
def run():
interpreter = Interpreter.load('./models/nlu/default/chat')
print(interpreter.parse('I want to order pizza'))
#print(interpreter.parse(u'What is the reivew for the movie Die Hard?'))
if __name__ == '__main__':
train('./data/training_data.json', './config/config.yml', './models/nlu')
#run()
RASA GUI
npm install -g rasa-nlu-trainer
https://github.com/RasaHQ/rasa-nlu-trainer
Problem encountered: running make train-nlu in starter-pack-rasa-stack fails with: Illegal instruction (core dumped)
Searching online shows tensorflow is the cause. Evidence: start python, then run >>> import tensorflow as tf, which fails with Illegal instruction (core dumped).
Cause: this rasa release depends on tensorflow 1.10.0, which requires a CPU with AVX instructions. Older CPUs that lack AVX therefore crash with Illegal instruction (core dumped). The only option is an older tensorflow; run:
pip uninstall tensorflow
pip install tensorflow==1.5
But then pip complains: rasa-core 0.12.3 has requirement tensorflow==1.10.0, but you'll have tensorflow 1.5.0 which is incompatible.
How to verify AVX support: use the following command to list all the CPU features.
$ more /proc/cpuinfo | grep flags
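The same check can be done from Python by parsing the flags line; a portable sketch (the function takes the text as an argument so it can be tested without a Linux /proc filesystem):

```python
def has_avx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Flags are space-separated after the colon; match "avx" as a whole token.
            if "avx" in line.split(":", 1)[1].split():
                return True
    return False

# On Linux you would feed it the real file:
# has_avx(open("/proc/cpuinfo").read())
sample = "flags\t\t: fpu vme sse sse2 avx avx2"
print(has_avx(sample))
```

Splitting into tokens matters: a plain substring search would also match "avx2"-only or unrelated flags.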
For details see: https://tech.amikelive.com/node-887/how-to-resolve-error-illegal-instruction-core-dumped-when-running-import-tensorflow-in-a-python-program/
When tensorflow drops to 1.5, the rasa version must drop too: if the latest rasa stack doesn't work for you, consider using rasa_core==0.8.*, rasa_nlu==0.11.* and tensorflow==1.5.*
Problem encountered: installing tensorflow 1.5 fails with: Cannot uninstall 'html5lib'. It is a distutils installed project and thus we cannot accurately determine which files belong to it, which would lead to only a partial uninstall.
Workaround: add the --ignore-installed flag: pip install tensorflow==1.5 --ignore-installed
https://blog.csdn.net/u010505246/article/details/82997100
Typical pipelines
1. Initialization components. Currently there are only two: nlp_spacy and nlp_mitie, corresponding to SpaCy and MITIE respectively.
MITIE-based components such as tokenizer_mitie, intent_featurizer_mitie, ner_mitie and intent_classifier_mitie all depend on the object provided by nlp_mitie.
SpaCy-based components such as tokenizer_spacy, intent_featurizer_spacy and ner_spacy all depend on the object provided by nlp_spacy.
2. Tokenizer components. Among Rasa's tokenizers, tokenizer_jieba supports Chinese word segmentation, tokenizer_mitie can support Chinese after some modification, and tokenizer_spacy does not yet support Chinese but is expected to (worth tracking).
3. Featurizer components. Both named-entity recognition and intent classification need features from upstream components. Common features: word vectors, bag-of-words, n-grams, regular expressions. Several featurizers can be used at the same time; at the implementation level their features are merged.
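Merging features from several featurizers can be pictured as concatenating per-message feature vectors; a toy sketch (not Rasa's internals — the vocabulary and example text are my own):

```python
import re

def bow_features(tokens, vocab):
    """Bag-of-words: one count per vocabulary word."""
    return [tokens.count(w) for w in vocab]

def regex_feature(text, pattern):
    """Binary feature: 1 if the pattern is found anywhere in the text."""
    return [int(re.search(pattern, text) is not None)]

text = "my zipcode is 90210"
tokens = text.split()
vocab = ["zipcode", "is", "pizza"]

# Each featurizer contributes a slice; downstream NER/intent models see the concatenation.
features = bow_features(tokens, vocab) + regex_feature(text, r"[0-9]{5}")
print(features)  # [1, 1, 0, 1]
```

Because the slices are simply concatenated, adding another featurizer to the pipeline widens the vector without changing the downstream model's interface.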
4. NER components. ner_crf uses a CRF model for NER. The CRF model depends only on the tokens themselves; if you want to use POS features in the feature functions, the nlp_spacy component must supply the spacy_doc object to provide the POS information.
ner_mitie: uses the language model provided by MITIE; NER needs only the tokens.
ner_spacy: uses the NER built into SpaCy. The model has to be trained within the SpaCy framework; at the time, SpaCy did not support training your own model, and the official SpaCy models only cover a few common entity types — see the official docs for details. (Update: SpaCy now supports custom entities; Chinese models for SpaCy: https://github.com/howl-anderson/Chinese_models_for_SpaCy)
ner_duckling: Duckling is an NER library from Facebook written in Haskell, based on rules and models. Its Chinese support is incomplete, covering only a few of the many entity types. Rasa can call Duckling in two ways: wrapped as a library via the duckling package, or over HTTP; these correspond to the ner_duckling and ner_duckling_http components respectively.
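The HTTP variant talks to a running Duckling server's /parse endpoint. A sketch of the request (the localhost URL is an assumed default and a live server is required, so only the form payload is built and the actual POST is left commented out):

```python
def duckling_request(text, locale="en_US"):
    """Build the form payload a Duckling HTTP client POSTs to /parse.

    The endpoint URL below is an assumption (Duckling's common default port).
    """
    return {"locale": locale, "text": text}

payload = duckling_request("tomorrow at 8pm")
print(payload)

# With a Duckling server running locally (not executed here):
# import requests
# resp = requests.post("http://localhost:8000/parse", data=payload)
# resp.json() would contain the structured time value for "tomorrow at 8pm".
```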
Pipeline component characteristics:
- The order of components matters; e.g., a tokenizer component must come before the NER component.
- Components are replaceable, e.g., the tokenizer.
- Some components are mutually exclusive; e.g., there can be only one tokenizer.
- Some components can be used together; e.g., rule-based and embedding-based text featurizers can be combined.
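These ordering and exclusivity rules can be sketched as a simple validation over a pipeline's component list (the category mapping and rule set are my own simplification, not how Rasa validates configs):

```python
# Map component names to rough categories (illustrative subset).
CATEGORY = {
    "nlp_spacy": "init", "tokenizer_spacy": "tokenizer",
    "tokenizer_jieba": "tokenizer",
    "intent_featurizer_spacy": "featurizer", "ner_crf": "ner",
}

def validate(pipeline):
    """Check two rules: at most one tokenizer, and a tokenizer before any NER component."""
    cats = [CATEGORY[c] for c in pipeline]
    if cats.count("tokenizer") > 1:
        return "more than one tokenizer"
    if "ner" in cats:
        if "tokenizer" not in cats or cats.index("tokenizer") > cats.index("ner"):
            return "NER component needs a tokenizer before it"
    return "ok"

print(validate(["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf"]))  # ok
print(validate(["nlp_spacy", "ner_crf"]))  # rejected: no tokenizer before NER
```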
Keywords: Intent — the classification of what the user said (the usage context). Entity — an object or keyword in the user's input.