
Hakim - An Arabic Healthcare Conversational Agent

Demo Pictures

hakim-demo-1 hakim-demo-2

Approach Research

In our research, we found many approaches to building a conversational agent. All of them fall into two main categories:

End-to-end model architectures

End-to-end models are one-component systems: the input (the user utterance, in our case) is fed directly into the model, and its output (the agent's response) is returned directly to the user.

Our problem calls for a sequence-to-sequence model, and a popular backbone for building such models is a pre-trained Transformer such as BERT.

A diagram for e2e models

Problems with this approach:

Modules-based systems

Modules-based systems are composed of multiple components, each responsible for a certain task, organized together to form the whole system.

A diagram for modules-based models

There are many ways to build a modules-based conversational agent, and our solution of choice was a Task-oriented Dialogue System.

Task-oriented Dialogue System

Task-oriented dialogue systems are agents that operate in a dialogue-driven environment (for example, chat applications) with the goal of accomplishing a user task, such as providing a diagnosis given the user's symptoms.

Task-oriented dialogue system architecture as proposed in the original paper

NOTE: The components of this system are only explained intuitively below, without technical details. That's because the technology we used to implement this system, Rasa, doesn't fully follow the architecture described in the diagram, so some components are not relevant to us (like the user simulator). We still introduce the system here because it lays out an intuitive foundation for understanding the different components of our bot; their technical details are explained in the implementation section.

Dialogue System

Natural Language Understanding (NLU)

This component is responsible for transforming the user input into structured information that the computer can use and reason about, called the Semantic Frame. It has two main jobs:

- Intent classification: determining what the user wants from the utterance (for example, greeting the bot, reporting their age, or reporting symptoms).
- Entity extraction: pulling structured values, such as the reported age or sex, out of the utterance.

Example:
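A sketch of what a semantic frame might look like for the utterance "عمري 21 عام" ("I'm 21 years old"), using the intent and entity names our bot defines later in domain.yml. The layout is simplified for illustration and is not Rasa's exact output format:

intent: age_report # what the user is trying to do
confidence: 0.97 # illustrative confidence score
entities: # structured values extracted from the utterance
  - entity: age
    value: 21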

Dialogue Manager (DM)

The semantic frame produced by the NLU component is propagated to the DM, which uses this information to predict the next action. It is composed of two sub-components:

- Dialogue State Tracker: keeps track of the state of the conversation (what has been said and what information has been collected so far) across turns.
- Dialogue Policy: decides, given the current dialogue state, which action the agent should take next.

User Simulator

This component is necessary only during training time to learn the dialogue policy mentioned above.

It encapsulates a certain user goal, for example, knowing the diagnosis for a given set of symptoms, and it interacts with the dialogue system component to teach the policy to predict optimal actions.

Implementation

Our technology of choice for implementing the described Task-oriented system is Rasa.

Rasa is an open-source machine learning framework for building conversational agents. It provides rich APIs which can be used to build various task-oriented dialogue systems.

Rasa bot architecture

Agent

The interface of the bot. It exposes APIs to train a model, load it, and receive and send messages through its RESTful API endpoints. It wraps the NLU and DM components and uses them to do the actual processing of user messages.

NLU Pipeline

A series of steps responsible for the training and prediction of intent classification and entity extraction. These steps are defined in the config.yml file.

The intents and entities are defined in the domain.yml file, which represents everything the agent knows (intents, entities, slots, actions, and responses). The related excerpt from the file:

intents:
  - affirm # e.g: Yes, indeed.
  - age_report # e.g: I'm 21 years old.
  - deny # e.g: No, not really.
  - dont_know # e.g: I'm not sure.
  - goodbye # e.g: Bye. see you.
  - greet # e.g: Hi.
  - nlu_fallback # Any message that doesn't fall into one of the other intents.
  - observations_report # e.g: I have a very bad stomachache.
  - restart # e.g: I would like to restart this conversation.
  - sex_report # e.g: I'm a male.
  - symptoms_inquiry # e.g: What are the symptoms of COVID-19.
entities:
  - sex # Synonyms like (رجل, ذكر, انثى, ابي) are mapped to either male or female.
  - age
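
The domain also defines slots that hold the information extracted from these entities over the course of the conversation. The exact slot definitions aren't reproduced here; a minimal sketch, assuming Rasa 2.x-style slots for the reported age and sex, might look like this:

slots:
  age:
    type: float # filled from the "age" entity
  sex:
    type: categorical # filled from the "sex" entity after synonym mapping
    values:
      - male
      - female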

The training examples for the intents and entities are defined in the nlu.yml file. A sample from the file:

nlu:
- intent: deny # example below: "No"
  examples: |
    - لا
- intent: affirm # example below: "Correct"
  examples: |
    - صحيح
- intent: dont_know # example below: "I don't know"
  examples: |
    - لا اعرف
- intent: age_report # example below: "I'm [15](age) years old"
  examples: |
    - عمري [15](age) عام
- intent: sex_report # example below: "I'm a [female]"
  examples: |
    - انا [انثى]{"entity": "sex", "value": "female"}
- intent: observations_report # example below: "I feel pain in the nerves of my hand"
  examples: |
    - أشعر بألم في اعصاب يدي
- intent: symptoms_inquiry # example below: "What are the symptoms of diabetes?"
  examples: |
    - ما هي اعراض مرض السكري؟
- intent: greet # example below: "Hello"
  examples: |
    - مرحبا
- intent: goodbye # example below: "Goodbye"
  examples: |
    - وداعا
- intent: restart # example below: "Restart"
  examples: |
    - اعادة البدء
- intent: nlu_fallback # example below: "How old are you?"
  examples: |
    - كم عمرك؟

Entity tagging is done by Rasa using the BILOU tagging scheme:

BILOU tagging scheme

Where:

- B (Beginning): the first token of a multi-token entity
- I (Inside): a token in the middle of a multi-token entity
- L (Last): the final token of a multi-token entity
- O (Outside): a token that is not part of any entity
- U (Unit): a single-token entity

The training examples were either scraped from medical forums like WebTeb or written manually by us.

Dialogue Policies

Rasa has multiple rule-based and machine-learning policies that can be used to decide what action to take next given a user utterance. The desired policies are configured inside the config.yml file.

At every conversation turn (initiated by a user utterance), each of the configured policies predicts the next action the agent should take, along with a confidence level, and the agent takes the action with the highest confidence. If two or more policies predict actions with the same confidence, the agent takes the action of the policy with the highest predefined priority; and if two or more policies of the same priority predict actions with the same confidence, the agent picks one of their actions at random.
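
For completeness: in Rasa 2.x a policy's default priority can be overridden with the priority parameter in config.yml. We keep the defaults, so the values below are purely illustrative:

policies:
  - name: MemoizationPolicy
    priority: 3 # a higher number wins confidence ties
  - name: TEDPolicy
    priority: 1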

The predicted action can be either:

- a response: a canned message defined under responses in domain.yml (listed below), which is sent back to the user as-is, or
- a custom action: arbitrary code executed on a separate actions server (described in the next section).

responses:
  utter_confirm_restart: # "Are you sure you want to restart from the beginning?"
    - text: هل انت متأكد انك تريد الاعادة من البداية؟
  utter_greet_back: # "Hello, I'm Hakim, your robot doctor 👨‍⚕️. You can start by telling me about any symptoms you're feeling, and I'll try to help you from there."
    - text: مرحبا, انا حكيم, طبيبك الالي 👨‍⚕️. يمكنك ان تبدأ بان تخبرني بأية اعراض تشعر بها, وسوف احاول مساعدتك من هناك
  utter_goodbye: # "Goodbye, and I hope my knowledge was of help to you."
    - text: وداعا واتمنى ان اكون قد افدتك بمعرفتي
  utter_observations_too_long: # "Please make sure the message is no longer than 2048 characters."
    - text: الرجاء التأكد من ان طول الرسالة لا يتجاوز 2048 حرف
  utter_specify_sex: # "What is your sex?"
    - text: ما هو جنسك؟
  utter_specify_age: # "What is your age?"
    - text: ما هو عمرك؟
  utter_no_observations: # "No symptoms were identified in the message; please check the message content."
    - text: لم يتم تحديد اي اعراض في الرسالة, الرجاء التأكد من محتوى الرسالة
  utter_pediatrics_not_supported: # "Unfortunately, I don't support pediatrics for ages under 13."
    - text: للأسف انا لا ادعم طب الاطفال تحت سن 13 عاما
  utter_age_too_high: # "Your age is above the supported limit of 130; please check the entered age."
    - text: عمرك اكبر من العمر المحدد وهو 130 الرجاء التأكد من العمر المدخل
  utter_symptoms_inquiry_out_of_scope: # "I currently don't support questions about disease symptoms."
    - text: حاليا انا لا ادعم الاسئله عن اعراض الامراض
  utter_fallback_message: # "Sorry, but I don't understand what you're trying to ask about."
    - text: اعتذر لكنني لا افهم ما تحاول السؤال عنه
  utter_default: # "This question is outside my knowledge; are you sure it is phrased correctly?"
    - text: هذا السؤال خارج نطاق معرفتي, هل انت متأكد من صياغة السؤال

Each response's use case follows from its name and from the English translation in the comments above.

Training data format for the dialogue policies:
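
The repository's actual stories and rules aren't reproduced here; below is a minimal sketch of Rasa's training data format for the policies, using the intents and responses defined above. The story name and the action_provide_diagnosis custom action are hypothetical placeholders:

stories: # example conversation paths, used by TEDPolicy and MemoizationPolicy
  - story: report symptoms, then age and sex # hypothetical story name
    steps:
      - intent: greet
      - action: utter_greet_back
      - intent: observations_report
      - action: utter_specify_sex
      - intent: sex_report
      - action: utter_specify_age
      - intent: age_report
      - action: action_provide_diagnosis # hypothetical custom action, served by the actions server

rules: # fixed behaviors, used by RulePolicy
  - rule: send a fallback message when the user's message is not understood
    steps:
      - intent: nlu_fallback
      - action: utter_fallback_message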

Actions Server

When the agent predicts that the next action is a custom action, that action is invoked by calling a RESTful API endpoint on the actions server; the call follows a predefined Rasa standard for passing the conversation context to the custom action and communicating its output back to the agent.

Illustration showing the nature of the relationship between the bot server and the custom actions server

The actions server endpoint that the agent communicates with is defined inside the endpoints.yml file.

action_endpoint:
  url: "http://localhost:5055/webhook" # When the actions server is hosted locally

Channel Connector

A channel connector is the means through which the agent receives user messages.

The agent can be integrated with our own website, Facebook Messenger, Slack, Telegram, and many other channels. We chose Facebook Messenger as our channel connector.
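
Connecting the bot to Facebook Messenger is done by filling in the Facebook app's credentials in Rasa's credentials.yml file. A minimal sketch with placeholder values:

facebook:
  verify: "hakim-bot" # verify token chosen when registering the webhook with Facebook (placeholder)
  secret: "<facebook-app-secret>" # placeholder
  page-access-token: "<facebook-page-access-token>" # placeholder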

Tracker Store

This is where the bot's conversations are stored. Rasa provides out-of-the-box integrations with different store types such as SQL, Redis, and MongoDB, but for this phase we used the default in-memory store, which keeps the conversations in the server's memory.
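
Because we use the default, no tracker store configuration is needed for now. If we later want conversations to survive server restarts, switching to, for example, an SQL tracker store would be a matter of configuring it in endpoints.yml. A minimal sketch with placeholder connection details:

tracker_store:
  type: SQL
  dialect: "postgresql" # any SQLAlchemy-supported dialect
  url: "localhost"
  db: "hakim" # placeholder database name
  username: "db-user" # placeholder
  password: "db-password" # placeholder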

NLU Pipeline and Dialogue Policies: Configuration and Technical Details

NLU Pipeline

The configuration of our NLU Pipeline inside the config.yml file:

pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: asafaya/bert-base-arabic
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

A diagram of the NLU pipeline

Dialogue Policies

The configuration of our dialogue policies inside the config.yml file:

policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100

Illustration explaining the nature of the input and output of the TED model

Conclusion

We learned a lot about building a chatbot from this project, and we can confidently say that we have gone from zero to hero in the task of building a conversational agent. We learned about many NLP concepts and about all the phases of building an AI project, from the initial idea all the way to deployment. It's true that we didn't get the most optimal results for this task, but it was definitely a successful experience, and if we ever decide to pursue Hakim further, we now know where to start and which problems most need addressing.