Currently one important component of the JoBimText pipeline is conducted in a non-distributed fashion, namely the word sense induction. This means transfer of files from the HDFS and back. Also this limits scalability of the method. Your goal is to implement the component in a distributed way.
Motivation
Currently one important component of the JoBimText pipeline is conducted in a non-distributed fashion, namely the word sense induction. This means transfer of files from the HDFS and back. Also this limits scalability of the method. Your goal is to implement the component in a distributed way.
Implementation