Adapter to run Python top2vec topic model in scicloj.ml
You need to set up a Clojure REPL with:
I provide a Dockerfile that performs the above installation correctly. Using it, a working REPL running in Docker can be started as follows:
docker run -ti -v $HOME/.m2:/home/user/.m2 -v "$(pwd):/app" -p 12345:12345 -w /app scicloj.ml.top2vec python3 -c "import cljbridge;cljbridge.init_clojure_repl(port=12345,bind='0.0.0.0')"
Then the following code trains the top2vec model on some texts.
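;; clojure.pprint and scicloj.metamorph.ml are required explicitly
;; because the code below calls them by their fully qualified names.
(require '[clojure.test :refer :all]
         '[clojure.pprint]
         '[scicloj.metamorph.ml]
         '[scicloj.ml.top2vec :refer :all]
         '[camel-snake-kebab.core :as csk]
         '[tablecloth.api :as tc])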
(def raw-data
  (tc/dataset "https://github.com/scicloj/scicloj.ml.smile/blob/main/test/data/reviews.csv.gz?raw=true"
              {:key-fn csk/->kebab-case-keyword
               :file-type :csv
               :gzipped? true}))

(def data
  (-> raw-data
      (tc/shuffle {:seed 123})
      (tc/head 10000)
      (tc/select-columns :text)
      tc/drop-missing))

(def train-result-learn
  (scicloj.metamorph.ml/train data {:speed :learn
                                    :model-type :top2vec
                                    :min_count 1
                                    :documents-column :text}))
(clojure.pprint/pprint (update-in train-result-learn [:model-data] dissoc :model-as-bytes))
(def top2vec-model-py (scicloj.metamorph.ml/thaw-model train-result-learn))
The resulting top2vec-model-py is the Python object of the trained model. It can be used from Clojure via libpython-clj calls to its API: https://top2vec.readthedocs.io/en/latest/api.html
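As a minimal sketch of such calls, the snippet below asks the model for its number of topics and for the topic words. It assumes that the project uses libpython-clj2 (the libpython-clj2.python namespace) and that top2vec-model-py is bound as above. get_num_topics and get_topics are methods from the linked top2vec API; the var names are only illustrative.

```clojure
;; Assumption: libpython-clj2 is on the classpath (libpython-clj2.python namespace).
(require '[libpython-clj2.python :as py])

;; Number of topics found by the trained model.
(def num-topics
  (py/call-attr top2vec-model-py "get_num_topics"))

;; get_topics returns a Python tuple: (topic_words, word_scores, topic_nums).
(def topics
  (py/call-attr top2vec-model-py "get_topics" num-topics))

;; First tuple element: the words of each topic.
(def topic-words
  (py/get-item topics 0))
```

Any other method of the Python object can be called in the same way through py/call-attr.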
For a few cases I provide wrappers for the Python API. A wordcloud of a topic (the first one in this case) can be obtained as an SVG string by:
(wc->svg top2vec-model-py (first (get-all-word-scores top2vec-model-py)) 100 100)
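Since the result is a plain string, it can be written to disk and opened in a browser. A small sketch, where the file name topic-0.svg is an arbitrary choice and not something the library prescribes:

```clojure
;; SVG string for the wordcloud of the first topic, as in the call above.
(def first-topic-svg
  (wc->svg top2vec-model-py
           (first (get-all-word-scores top2vec-model-py))
           100 100))

;; Write the SVG to a file (hypothetical file name) in the working directory.
(spit "topic-0.svg" first-topic-svg)
```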
Copyright © 2021 Carsten Behring
EPLv1.0 is just the default for projects generated by clj-new: you are not required to open source this project, nor are you required to use EPLv1.0! Feel free to remove or change the LICENSE file and remove or update this section of the README.md file!
Distributed under the Eclipse Public License version 1.0.