Uses the DBSCAN algorithm to cluster similar questions.
The script takes an optional epsilon argument (defaults to 0.05). Epsilon is the maximum distance between two points for them to count as neighbors in the same cluster.
Stores FAQ entries in a 'faq' collection in the database for further processing. Each question and response is stored even though many are similar.
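As a rough illustration of the clustering step, here is a minimal pure-Python sketch of DBSCAN with the 0.05 default epsilon. It is not the script in this PR: the cosine distance metric, the `min_samples` parameter, and the names `parse_epsilon`, `cosine_distance`, and `dbscan` are all assumptions made for the example, and it expects each question to already be represented as a non-zero numeric vector.

```python
import math

DEFAULT_EPSILON = 0.05  # default value stated in this description


def parse_epsilon(argv):
    """Read the optional epsilon argument (hypothetical CLI shape)."""
    return float(argv[0]) if argv else DEFAULT_EPSILON


def cosine_distance(a, b):
    """Assumed metric: 1 - cosine similarity. Vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps=DEFAULT_EPSILON, min_samples=2):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Every point within eps of point i, including i itself.
        return [
            j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps
        ]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels
```

With toy two-dimensional vectors standing in for question embeddings, `dbscan([[1, 0], [0.999, 0.01], [0, 1], [0.01, 0.999], [-1, -1]])` groups the first two points, groups the next two, and marks the last as noise. A smaller epsilon produces tighter clusters and more noise points.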
Example FAQ entry document:
Notes
The intention is to run this as a cron job, perhaps daily or every five days. Because each run stores its timestamp, we'll be able to look at FAQ trends over time.
For verified answers, the next step is to script a transfer from the faq collection to a YAML document that a human can fill out, per the spec.
Jira: https://jira.mongodb.org/browse/EAI-177