neoVincent / Document_Similarity

Large-scale document similarity analysis on Spark

Convert document vector directly from Spark DataFrame #1

Open · neoVincent opened 4 years ago

neoVincent commented 4 years ago

In the part that computes the similarity across all documents:

    # For each document ID d[0], getVec() fetches its vector from the database
    cosines = docdf.rdd.map(lambda d: (core.cosine(vec.tolist(), getVec(d[0]).tolist()), d[0]))
    # Sort the (similarity, doc_id) pairs descending into one partition and collect on the driver
    topK = cosines.sortByKey(ascending=False, numPartitions=1).collect()
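
As an aside, sorting the whole RDD into a single partition just to collect every pair is expensive when only the best k matches matter. A sketch of a cheaper alternative using RDD.top(), where k = 10 is an assumed value not taken from this repo:

    k = 10  # assumed; set to the number of neighbors actually needed
    # top() compares the (cosine, doc_id) tuples element-wise, so the
    # pairs with the largest cosine scores come back first
    topK = cosines.top(k)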

The bigger cost, though, is that we have to call getVec() for every document to retrieve its vector from the database, which adds a lot of execution time.

  • Using getVec() is a workaround for an error when parsing blob data into an ndarray directly from the Spark DataFrame.
  • When the vector is converted directly from the Spark DataFrame, it contains NaN values.
  • The database and Spark may use different encodings; see the decoding sketch below.
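
If the NaN values do come from an encoding mismatch, decoding the blob with an explicit dtype and byte order might remove the need for the database round trip. A minimal sketch, assuming the row exposes the raw blob as d[1] and that the writer stored contiguous little-endian float64 values (both are assumptions, not confirmed here):

    import numpy as np

    def decode_vec(blob):
        # Assumed layout: little-endian float64; a wrong dtype or byte
        # order here is one way garbage/NaN values can appear
        return np.frombuffer(blob, dtype='<f8')

    cosines = docdf.rdd.map(
        lambda d: (core.cosine(vec.tolist(), decode_vec(d[1]).tolist()), d[0]))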
neoVincent commented 4 years ago

Another workaround: store the vector as a string, as sketched below.
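
A minimal sketch of that idea, assuming the vector round-trips through a comma-separated string and is read back as d[1] (the column position is an assumption):

    import numpy as np

    def vec_to_str(v):
        # Serialize the ndarray with repr() so no float precision is lost
        return ','.join(map(repr, v.tolist()))

    def str_to_vec(s):
        # Parse the comma-separated string back into a float64 ndarray
        return np.array([float(x) for x in s.split(',')])

    cosines = docdf.rdd.map(
        lambda d: (core.cosine(vec.tolist(), str_to_vec(d[1]).tolist()), d[0]))

Strings avoid any binary-encoding differences between the database driver and Spark, at the cost of larger storage and per-row parsing.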