rainit2006 opened 5 years ago
rasa_core https://rasa.com/docs/core/quickstart/
the basic steps of how a Rasa Core app responds to a message:
Rasa NLU https://rasa.com/docs/nlu/quickstart/
Example: choosing a Rasa NLU pipeline. To use the spacy_sklearn pipeline:
language: "en"
pipeline: "spacy_sklearn"
The config file is a .yml file.
Prepare your NLU training data. The data is just a list of messages that you expect to receive, annotated with the intent and entities Rasa NLU should learn to extract. Save it as nlu.md.
Define your machine learning model and save it in a file called nlu_config.yml.
Train your NLU model:
python -m rasa_nlu.train -c nlu_config.yml --data nlu.md -o models --fixed_model_name nlu --project current --verbose
■Training Data Format
Markdown Format
## intent:greet
- hey
- hello
## synonym:savings <!-- synonyms, method 2 -->
- pink pig
## regex:zipcode
- [0-9]{5}
## lookup:currencies <!-- lookup table list -->
- Yen
- USD
- Euro
JSON Format
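A minimal sketch of the JSON layout (the rasa_nlu_data/common_examples structure used by older rasa_nlu versions; the example texts and the output filename are my own, and the exact schema should be checked against your version's docs):

```python
import json

# Sketch of the rasa_nlu JSON training-data layout (older rasa_nlu versions).
training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "hey", "intent": "greet", "entities": []},
            {
                "text": "show me restaurants in Tokyo",
                "intent": "restaurant_search",
                "entities": [
                    # start/end are character offsets of the entity value in text
                    {"start": 23, "end": 28, "value": "Tokyo", "entity": "location"},
                ],
            },
        ]
    }
}

with open("demo_training_data.json", "w") as f:
    json.dump(training_data, f, indent=2)
```

The same file can then be passed to the trainer via --data, just like the Markdown format.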
■Choosing a Rasa NLU Pipeline. To use a spaCy model, use the spacy_sklearn pipeline:
language: "en"
pipeline: "spacy_sklearn"
use the tensorflow_embedding pipeline:
language: "en"
pipeline: "tensorflow_embedding"
With the pipeline configuration saved as config.yml and the training examples saved under nlu_data/, you can train the model by running:
$ python -m rasa_nlu.train \
--config config.yml \
--data nlu_data/ \
--path projects
Entity Extraction (entity/keyword extraction)
Extracting places, dates, people, organisations: spaCy has excellent pre-trained named-entity recognisers for a few different languages.
Dates, amounts of money, durations, distances, ordinals: the duckling library does a great job of turning expressions like "next Thursday at 8pm" into structured values.
Regular expressions (regex): you can use regular expressions to help the CRF model learn to recognize entities. In the training data format you can provide a list of regular expressions, each of which provides ner_crf with an extra binary feature saying whether the regex was found (1) or not (0).
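The binary regex feature described above can be sketched like this (a simplified illustration of the idea, not ner_crf's actual implementation; the pattern names are taken from the training-data example above):

```python
import re

# Patterns as they might appear in the training data (the greet pattern is illustrative).
patterns = {"zipcode": r"[0-9]{5}", "greet": r"hey[^\s]*"}

def regex_features(token):
    """Return one binary feature per pattern: 1 if the whole token matches, else 0."""
    return {name: int(re.fullmatch(p, token) is not None)
            for name, p in patterns.items()}

print(regex_features("90210"))  # the zipcode pattern fires
print(regex_features("hello"))  # no pattern fires
```

The CRF sees these 0/1 values alongside its other token features, so a match nudges the model toward the corresponding entity label without hard-coding the decision.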
A deeper look at Rasa NLU. Source: https://blog.csdn.net/love_is_red/article/details/79145962
The hard part of NLU is preparing the corpus. Lessons learned, recorded one by one:
- Every intent should have keywords, and every sentence in the intent should contain a keyword.
- Each keyword should be expanded into roughly 20 sentences.
- The sentences should be as divergent as possible (i.e., apart from the keywords, avoid repeating words).
- Within each intent, apart from the keywords, word repetition should be as low as possible, ideally zero.
- Across the whole file, apart from the keywords, word repetition should likewise be as low as possible, ideally zero.
- A consequence of the two rules above is that filler words ("you", "me", "he", "ah", "yes", question particles, etc.) all get removed (slightly unnatural wording is acceptable).
- Merge intents that share the same sentence pattern but differ only in parameters, and tell them apart later by validating the parameters.
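The repetition rules above can be checked mechanically. A rough sketch (the function, threshold, and example sentences are my own, not part of Rasa):

```python
from collections import Counter

def repeated_words(examples, keywords):
    """List non-keyword words that appear in more than one training example."""
    seen = Counter()
    for sentence in examples:
        # Count each word once per sentence so within-sentence repeats don't inflate it.
        for word in set(sentence.lower().split()):
            if word not in keywords:
                seen[word] += 1
    return [w for w, n in seen.items() if n > 1]

examples = [
    "book a flight to tokyo",
    "book me a ticket to osaka",
]
# "book", "a", "to" repeat across examples; per the guidelines they should be trimmed.
print(repeated_words(examples, keywords={"flight", "ticket"}))
```

Running such a check over each intent file makes the "low repetition outside keywords" rule auditable instead of a matter of eyeballing.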
The accuracy of intent recognition depends on two aspects.
Concrete workflow: obtain a trained intent-recognition model from Rasa NLU or another natural-language-understanding framework. (This is mainly used to identify what the user is asking, i.e., to classify the intent of the user's current utterance.)
Source: https://blog.csdn.net/love_is_red/article/details/79145979
Installing RASA
Local installation of RASA NLU: https://rasa.com/docs/nlu/installation/ https://github.com/RasaHQ/rasa_nlu
Docker installation of RASA: https://rasa.com/docs/nlu/master/docker/ https://github.com/RasaHQ/rasa_nlu
Problems encountered:
■ ModuleNotFoundError: No module named 'rasa_nlu.converters'
Cause: the Python code was written for an old version of rasa and no longer works with the new rasa library, so you need code matching the new version. For up-to-date sample code see https://rasa.com/docs/nlu/0.13.8/python/
■ OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Running python -m spacy download en then prints: Creating a shortcut link for 'en' didn't work (maybe you don't have admin permissions?), but you can still load the model via its full package name: nlp = spacy.load('{name}')
Fix: restart PyCharm with administrator privileges, then run "python -m spacy download en" again.
Python code
Training a Rasa NLU model and persisting it:
from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config
training_data = load_data('data/examples/rasa/demo-rasa.json')
trainer = Trainer(config.load("sample_configs/config_spacy.yml"))
trainer.train(training_data)
model_directory = trainer.persist('./projects/default/') # Returns the directory the model is stored in
print("Done.")
https://hackernoon.com/build-simple-chatbot-with-rasa-part-1-f4c6d5bb1aea
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
# from rasa_nlu.converters import load_data
from rasa_nlu.training_data import load_data
from rasa_nlu.config import RasaNLUModelConfig
#from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer, Metadata, Interpreter
from rasa_nlu import config
def train(data, config_file, model_dir):
training_data = load_data(data)
configuration = config.load(config_file)
trainer = Trainer(configuration)
trainer.train(training_data)
model_directory = trainer.persist(model_dir, fixed_model_name='chat')
def run():
interpreter = Interpreter.load('./models/nlu/default/chat')
print(interpreter.parse('I want to order pizza'))
#print(interpreter.parse(u'What is the reivew for the movie Die Hard?'))
if __name__ == '__main__':
train('./data/training_data.json', './config/config.yml', './models/nlu')
#run()
RASA GUI
npm install -g rasa-nlu-trainer
https://github.com/RasaHQ/rasa-nlu-trainer
Problem encountered: running make train-nlu in starter-pack-rasa-stack fails with: Illegal instruction (core dumped)
Searching online shows tensorflow is the cause. Evidence: start python, then run >>> import tensorflow as tf, which fails with Illegal instruction (core dumped).
Cause: this rasa release depends on tensorflow 1.10.0, which requires a CPU with AVX instructions. Older CPUs that lack AVX therefore crash with Illegal instruction (core dumped). The only option is an older tensorflow; run:
pip uninstall tensorflow
pip install tensorflow==1.5
But then pip complains: rasa-core 0.12.3 has requirement tensorflow==1.10.0, but you'll have tensorflow 1.5.0 which is incompatible.
How to verify AVX support: use the following command to list all the CPU features.
$ more /proc/cpuinfo | grep flags
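The same check can be done from Python by parsing the flags line; a portable sketch (the function takes the text as an argument so it can be tested without a Linux /proc filesystem):

```python
def has_avx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Flags are space-separated after the colon; match "avx" as a whole token.
            if "avx" in line.split(":", 1)[1].split():
                return True
    return False

# On Linux you would feed it the real file:
# has_avx(open("/proc/cpuinfo").read())
sample = "flags\t\t: fpu vme sse sse2 avx avx2"
print(has_avx(sample))
```

Splitting into tokens matters: a plain substring search would also match "avx2"-only or unrelated flags.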
For details see: https://tech.amikelive.com/node-887/how-to-resolve-error-illegal-instruction-core-dumped-when-running-import-tensorflow-in-a-python-program/
When tensorflow drops to 1.5, the rasa version must drop too: if the latest rasa stack doesn't work for you, consider using rasa_core==0.8.*, rasa_nlu==0.11.* and tensorflow==1.5.*
Problem encountered: installing tensorflow 1.5 fails with: Cannot uninstall 'html5lib'. It is a distutils installed project and thus we cannot accurately determine which files belong to it, which would lead to only a partial uninstall.
Workaround: add the --ignore-installed flag: pip install tensorflow==1.5 --ignore-installed
https://blog.csdn.net/u010505246/article/details/82997100
Typical pipelines
1. Initialization components. Currently there are only two: nlp_spacy and nlp_mitie, corresponding to SpaCy and MITIE respectively.
MITIE-based components such as tokenizer_mitie, intent_featurizer_mitie, ner_mitie and intent_classifier_mitie all depend on the object provided by nlp_mitie.
SpaCy-based components such as tokenizer_spacy, intent_featurizer_spacy and ner_spacy all depend on the object provided by nlp_spacy.
2. Tokenizer components. Among Rasa's tokenizers, tokenizer_jieba supports Chinese word segmentation, tokenizer_mitie can support Chinese after some modification, and tokenizer_spacy does not yet support Chinese but is expected to (worth tracking).
3. Featurizer components. Both named-entity recognition and intent classification need features from upstream components. Common features: word vectors, bag-of-words, n-grams, regular expressions. Several featurizers can be used at the same time; at the implementation level their features are merged.
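Merging features from several featurizers can be pictured as concatenating per-message feature vectors; a toy sketch (not Rasa's internals — the vocabulary and example text are my own):

```python
import re

def bow_features(tokens, vocab):
    """Bag-of-words: one count per vocabulary word."""
    return [tokens.count(w) for w in vocab]

def regex_feature(text, pattern):
    """Binary feature: 1 if the pattern is found anywhere in the text."""
    return [int(re.search(pattern, text) is not None)]

text = "my zipcode is 90210"
tokens = text.split()
vocab = ["zipcode", "is", "pizza"]

# Each featurizer contributes a slice; downstream NER/intent models see the concatenation.
features = bow_features(tokens, vocab) + regex_feature(text, r"[0-9]{5}")
print(features)  # [1, 1, 0, 1]
```

Because the slices are simply concatenated, adding another featurizer to the pipeline widens the vector without changing the downstream model's interface.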
4. NER components. ner_crf uses a CRF model for NER. The CRF model depends only on the tokens themselves; if you want to use POS features in the feature functions, the nlp_spacy component must supply the spacy_doc object to provide the POS information.
ner_mitie: uses the language model provided by MITIE; NER needs only the tokens.
ner_spacy: uses the NER built into SpaCy. The model has to be trained within the SpaCy framework; at the time, SpaCy did not support training your own model, and the official SpaCy models only cover a few common entity types — see the official docs for details. (Update: SpaCy now supports custom entities; Chinese models for SpaCy: https://github.com/howl-anderson/Chinese_models_for_SpaCy)
ner_duckling: Duckling is an NER library from Facebook written in Haskell, based on rules and models. Its Chinese support is incomplete, covering only a few of the many entity types. Rasa can call Duckling in two ways: wrapped as a library via the duckling package, or over HTTP; these correspond to the ner_duckling and ner_duckling_http components respectively.
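The HTTP variant talks to a running Duckling server's /parse endpoint. A sketch of the request (the localhost URL is an assumed default and a live server is required, so only the form payload is built and the actual POST is left commented out):

```python
def duckling_request(text, locale="en_US"):
    """Build the form payload a Duckling HTTP client POSTs to /parse.

    The endpoint URL below is an assumption (Duckling's common default port).
    """
    return {"locale": locale, "text": text}

payload = duckling_request("tomorrow at 8pm")
print(payload)

# With a Duckling server running locally (not executed here):
# import requests
# resp = requests.post("http://localhost:8000/parse", data=payload)
# resp.json() would contain the structured time value for "tomorrow at 8pm".
```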
Pipeline component characteristics:
- The order of components matters; e.g., a tokenizer component must come before the NER component.
- Components are replaceable, e.g., the tokenizer.
- Some components are mutually exclusive; e.g., there can be only one tokenizer.
- Some components can be used together; e.g., rule-based and embedding-based text featurizers can be combined.
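These ordering and exclusivity rules can be sketched as a simple validation over a pipeline's component list (the category mapping and rule set are my own simplification, not how Rasa validates configs):

```python
# Map component names to rough categories (illustrative subset).
CATEGORY = {
    "nlp_spacy": "init", "tokenizer_spacy": "tokenizer",
    "tokenizer_jieba": "tokenizer",
    "intent_featurizer_spacy": "featurizer", "ner_crf": "ner",
}

def validate(pipeline):
    """Check two rules: at most one tokenizer, and a tokenizer before any NER component."""
    cats = [CATEGORY[c] for c in pipeline]
    if cats.count("tokenizer") > 1:
        return "more than one tokenizer"
    if "ner" in cats:
        if "tokenizer" not in cats or cats.index("tokenizer") > cats.index("ner"):
            return "NER component needs a tokenizer before it"
    return "ok"

print(validate(["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf"]))  # ok
print(validate(["nlp_spacy", "ner_crf"]))  # rejected: no tokenizer before NER
```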
Keywords: Intent — the classification of what the user said (the usage context). Entity — an object or keyword in the user's input.