yahoo / lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
Apache License 2.0

Missing lopq library in sample `spark-submit` call #17

Open michaelmior opened 6 years ago

michaelmior commented 6 years ago

The `lopq` library is not currently provided to the example `spark-submit` call in the documentation. I found the easiest solution was to run `python setup.py bdist_egg` in the `python` subdirectory and then pass the generated egg to `spark-submit` via the `--py-files` parameter. It would probably be helpful if this were added to the documentation.
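
For illustration, a minimal sketch of that workaround. It assumes the commands are run from the repository root and that the build produces an egg under `python/dist/`. The file name `lopq.egg` and the `run` dry-run wrapper are hypothetical. The script arguments are the ones from the documented example.

```shell
# Sketch only: the egg file name below is an assumption; use whatever
# file actually appears in python/dist/ after the build.

# 1. Build the egg once from the python subdirectory:
#      (cd python && python setup.py bdist_egg)

# 2. Point at the generated egg (hypothetical file name).
LOPQ_EGG="python/dist/lopq.egg"

# Dry-run wrapper: prints the command instead of executing it.
# Change the body to "$@" to actually submit the job.
run() { echo "$@"; }

# 3. Ship the egg to the driver and executors with --py-files.
run spark-submit \
    --py-files "$LOPQ_EGG" \
    train_model.py \
    --data /hdfs/path/to/data \
    --V 16 \
    --M 8 \
    --model_pkl /hdfs/output/path/model.pkl \
    --model_proto /hdfs/output/path/model.lopq
```

Any Spark configuration options would go alongside `--py-files`, before the script name.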

pumpikano commented 6 years ago

There is a disclaimer of sorts about that at the top of the README. My concern is that the package can be provided to the runtime in a variety of ways, none of which affects how the LOPQ scripts are used, so illustrating a single one in every example command seems distracting.

Perhaps something like this is better though?

spark-submit \
    ... # spark configuration
    train_model.py \
    --data /hdfs/path/to/data \
    --V 16 \
    --M 8 \
    --model_pkl /hdfs/output/path/model.pkl \
    --model_proto /hdfs/output/path/model.lopq

michaelmior commented 6 years ago

Whoops. I guess I missed that section in the docs. That extra line might be helpful (perhaps also include "see above").