I have some ideas on this, specifically N-gram analysis of the data with cross-referencing. Think keyword spotting on steroids for the free-form text questions, and "people who chose X for Q1 were most likely to choose Y for Q2". Only relations that are statistically significant would be shown, to keep the noise level low.
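One way to test a "people who chose X for Q1 were most likely to choose Y for Q2" relation is to build a 2x2 contingency table for each answer pair (chose X or not, against chose Y or not). The sketch below assumes responses arrive as dicts mapping question ids to the chosen answer; that shape and the function names are hypothetical. It uses Pearson's chi-square test with one degree of freedom, whose p-value is `erfc(sqrt(chi2 / 2))`, plus a Bonferroni correction across all pairs tested. Both are just one reasonable reading of "statistically significant", not a method settled in this thread. Pairs with an expected cell count below 5 are skipped because the chi-square approximation is unreliable there; Fisher's exact test is the usual fallback.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    # Pearson chi-square for the table [[a, b], [c, d]]:
    # a = X and Y, b = X and not Y, c = not X and Y, d = neither.
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def significant_pairs(responses: List[Dict[str, str]], q1: str, q2: str,
                      alpha: float = 0.05, min_expected: float = 5.0
                      ) -> List[Tuple[str, str, float, float]]:
    """Hypothetical helper: return (x, y, P(y | x), p-value) for answer pairs
    that co-occur more often than chance, most significant first."""
    rows = [(r[q1], r[q2]) for r in responses if q1 in r and q2 in r]
    n = len(rows)
    x_counts = Counter(x for x, _ in rows)
    y_counts = Counter(y for _, y in rows)
    joint = Counter(rows)
    pairs = list(product(sorted(x_counts), sorted(y_counts)))
    # Bonferroni: testing many pairs inflates false positives, so tighten alpha.
    threshold = alpha / max(len(pairs), 1)
    results = []
    for x, y in pairs:
        a = joint[(x, y)]
        b = x_counts[x] - a
        c = y_counts[y] - a
        d = n - a - b - c
        # Smallest expected cell count under independence.
        expected = min(row_total * col_total
                       for row_total in (a + b, c + d)
                       for col_total in (a + c, b + d)) / n
        if expected < min_expected:
            continue
        # Upper-tail p-value of a chi-square statistic with 1 degree of freedom.
        p = math.erfc(math.sqrt(chi2_2x2(a, b, c, d) / 2))
        p_y_given_x = a / (a + b)
        # Keep only positive associations ("more likely", not "less likely").
        if p < threshold and p_y_given_x > (a + c) / n:
            results.append((x, y, p_y_given_x, p))
    return sorted(results, key=lambda t: t[3])
```

Lowering `alpha` or raising `min_expected` trades recall for a quieter report, which fits the goal of keeping the noise level low.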
[http://nltk.org/index.php/Main_Page NLTK] provides a good base, so we do not need to reinvent the hard stuff. Use the [http://code.google.com/p/django-command-extensions/ Jobs extension] for batch training and other heavy processing.
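The keyword-spotting half of the idea comes down to counting recurring word sequences across the free-form answers. NLTK provides tokenizers, stop-word lists and an `nltk.util.ngrams` helper for this; the sketch below sticks to the Python standard library so the counting logic is visible. `top_phrases`, the small `STOPWORDS` set and the sample answers are hypothetical, and counting each phrase once per answer is an assumption rather than something decided here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stop-word list for illustration; NLTK ships a much larger one.
STOPWORDS = {"a", "an", "and", "at", "but", "is", "of", "the", "to", "too", "was"}


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    # Slide a window of length n over the tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_phrases(answers: Iterable[str], n: int = 2, min_count: int = 2,
                k: int = 10) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for answer in answers:
        # A set makes each phrase count once per answer (document frequency),
        # so one verbose respondent cannot inflate a phrase on their own.
        grams = set(ngrams(tokenize(answer), n))
        # Drop phrases that start or end with a stop word ("the wait", "too long").
        counts.update(g for g in grams
                      if g[0] not in STOPWORDS and g[-1] not in STOPWORDS)
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(" ".join(g), c) for g, c in ranked if c >= min_count][:k]


if __name__ == "__main__":
    answers = [
        "The wait time was too long",
        "Wait time is too long at lunch",
        "Friendly staff, but the wait time was long",
    ]
    print(top_phrases(answers))  # → [('wait time', 3)]
```

Counting per answer rather than per token keeps the ranking closer to "how many respondents mentioned this", which is usually what a survey summary wants.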
Original comment by doug.nap...@gmail.com on 8 May 2008 at 8:13
Original issue reported on code.google.com by yann.ma...@gmail.com on 2 May 2008 at 3:02