Replace text analysis with text recognition

vvasco commented 4 years ago

Currently, speech interaction consists of (1) converting speech to text and (2) analyzing the transcripts (see #192). Step 2 uses Google Cloud services to retrieve the sentence's structure in terms of root, verbs and nouns. This structure is then analyzed to interpret the question by looking for dependencies between root, verbs and nouns. If a dependency is found, we look for specific keywords (such as "veloce", "bastone" etc.). The system currently works for a few keywords (speed, aid, repetition, feedback), but as the number of keywords grows it may become difficult to cover all possible dependencies, which makes the approach hard to extend.

An alternative might be to replace the text analysis (2) with a text recognizer, which is fed Italian sentences directly and classifies them into the categories we are interested in. This would require creating a dataset with examples covering our desired categories and using it to train / fine-tune an existing model. Such a system would then depend on the specific use case. Google services make it possible to create a custom machine learning model that classifies text into domain-specific categories, with a price depending on the training hours, the size of the dataset and the number of models deployed. Additional resources that might be useful:

cc @pattacini @vtikha

vvasco commented 4 years ago

In the analysis provided by Google, each word is given a tag, which can be VERB, NOUN, DET, ADP, ADV, ADJ, PRON and so on. Google also provides the lemma of each word.

In the analysis we do on top of this, we check whether any word is tagged NOUN, ADV, ADJ or ADP and, if so, whether its lemma matches one of the following words:

- for speed: "veloce", "andatura", "velocita", "velocemente", "piano"
- for aid: "bastone", "muro", "sedia"
- for repetition: "ripetizione", "volta"
- for feedback: "cosa", "come", "bene"
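
The check described above can be sketched as a plain lookup. This is a minimal illustration, not the code in the repository: the names `KEYWORDS`, `CANDIDATE_TAGS` and `category_of` are hypothetical, and the lemma and tag are assumed to have already been extracted from the Google analysis.

```python
# Keyword lists copied from the comment above; all names are illustrative.
KEYWORDS = {
    "speed": {"veloce", "andatura", "velocita", "velocemente", "piano"},
    "aid": {"bastone", "muro", "sedia"},
    "repetition": {"ripetizione", "volta"},
    "feedback": {"cosa", "come", "bene"},
}

# Only words carrying one of these tags are compared against the keywords.
CANDIDATE_TAGS = {"NOUN", "ADV", "ADJ", "ADP"}


def category_of(lemma, tag):
    """Return the category whose keyword list contains `lemma`, or None."""
    if tag not in CANDIDATE_TAGS:
        return None
    for category, words in KEYWORDS.items():
        if lemma.lower() in words:
            return category
    return None
```

For instance, `category_of("bastone", "NOUN")` gives `"aid"`, while `category_of("deambulatore", "NOUN")` gives `None` because that lemma is not in any list.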

We chose these words based on example questions that came to mind. Below are examples that work and don't work with the current system, for each keyword category.

`speed`:

| Working | Not working |
| --- | --- |
| Sto andando (/mi sto muovendo) troppo piano (/veloce /velocemente) | Sto andando (/mi sto muovendo) troppo *lentamente* |
| A che velocità devo andare (/muovermi) | A che **velocità** devo fare l'**esercizio** (/il test) |
| Quanto devo andare veloce | |
| Che andatura devo avere (/mantenere) | |

`aid`:

| Working | Not working |
| --- | --- |
| Posso (/E' consentito) usare un (/il mio) bastone (/sedia) | Posso usare il *deambulatore* |

`repetition`:

| Working | Not working |
| --- | --- |
| Quante volte devo ripetere (/fare) | Quante volte devo ripetere l'esercizio (/il test) |
| Quante ripetizioni devo fare | |

`feedback`:

| Working | Not working |
| --- | --- |
| Sto facendo (/andando) bene | Sto facendo (/andando) *male* |
| Come sto andando (/facendo) | Come sto facendo l'esercizio (/il test) |

From these examples, we can see that the system fails in two cases:

- when the extracted lemma does not match any of the listed words. I highlighted these words in italic in the tables: for example "deambulatore" for aid, "lentamente" for speed, or "male" for feedback.
- when two words share the same tag. I highlighted these words in bold: for example, "A che velocità devo fare l'esercizio" does not work because "velocità" and "esercizio" are both tagged as NOUN.
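
A rough sketch of how both cases could be handled, assuming the sentence is available as a list of (lemma, tag) pairs. This is not the current implementation: the function names are hypothetical, the three extra lemmas are just the ones named above, the accent stripping is an added assumption (so that "velocità" matches the listed "velocita"), and the tags in the usage example are guesses rather than actual Google output.

```python
import unicodedata

# Source keyword lists, extended with the lemmas named in the failure
# analysis ("lentamente", "deambulatore", "male"). The extension is a suggestion.
KEYWORDS = {
    "speed": {"veloce", "andatura", "velocita", "velocemente", "piano", "lentamente"},
    "aid": {"bastone", "muro", "sedia", "deambulatore"},
    "repetition": {"ripetizione", "volta"},
    "feedback": {"cosa", "come", "bene", "male"},
}
CANDIDATE_TAGS = {"NOUN", "ADV", "ADJ", "ADP"}


def normalize(lemma):
    """Lowercase and strip accents, so that "velocità" becomes "velocita"."""
    decomposed = unicodedata.normalize("NFD", lemma.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(tokens):
    """Scan every candidate word and return the first matching category.

    Candidate words that are not keywords (e.g. "esercizio") are skipped
    rather than ending the search, so a second NOUN does not hide the first.
    """
    for lemma, tag in tokens:
        if tag not in CANDIDATE_TAGS:
            continue
        key = normalize(lemma)
        for category, words in KEYWORDS.items():
            if key in words:
                return category
    return None
```

With hypothetical tags for "A che velocità devo fare l'esercizio", `classify([("a", "ADP"), ("che", "DET"), ("velocità", "NOUN"), ("dovere", "AUX"), ("fare", "VERB"), ("il", "DET"), ("esercizio", "NOUN")])` returns `"speed"`. The lexicon still has to be extended by hand for every new word, which is the limitation that motivates a classifier.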

For now, we could make the current system more robust to these two failure cases while we look for other solutions to classify sentences directly.
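
To give an idea of what classifying sentences directly could look like, here is a toy sketch: a tiny multinomial Naive Bayes classifier over words, trained on a handful of the working examples listed above. It is only a stand-in to illustrate the idea, not the Google custom model and not a proposal for the final technique. The dataset is far too small to be meaningful, and all names (`TRAINING`, `train`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# A few labeled sentences taken from the examples in this thread.
TRAINING = [
    ("sto andando troppo piano", "speed"),
    ("a che velocità devo andare", "speed"),
    ("quanto devo andare veloce", "speed"),
    ("che andatura devo avere", "speed"),
    ("posso usare un bastone", "aid"),
    ("è consentito usare il mio bastone", "aid"),
    ("quante volte devo ripetere", "repetition"),
    ("quante ripetizioni devo fare", "repetition"),
    ("sto facendo bene", "feedback"),
    ("come sto andando", "feedback"),
]


def tokenize(text):
    """Lowercase and split into runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def train(examples):
    """Count words per label and sentences per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability (add-one smoothing).

    Words never seen in training are ignored; if no word is known, the
    result falls back to the most frequent label.
    """
    word_counts, label_counts, vocab = model
    tokens = [t for t in tokenize(text) if t in vocab]
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING)
```

On this toy data, `predict("Posso usare il deambulatore", MODEL)` returns `"aid"` and `predict("Quante volte devo ripetere l'esercizio", MODEL)` returns `"repetition"`, since the decision rests on the whole sentence rather than on a single keyword. A real solution would need a much larger dataset and a proper pretrained model.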
@vtikha @pattacini what do you think?

pattacini commented 4 years ago

I think it's a good plan 👍

vvasco commented 4 years ago

This might also be an interesting project for a thesis.

robotology / assistive-rehab