Closed: slycoder closed this issue 7 years ago
Also, I think it would not be too hard to implement parallel LDA fitting using the AD-LDA scheme. I haven't investigated deeply, but it seems like it should be straightforward. I'd be happy to contribute.
@slycoder also, I discovered your rtm package, which is supposed to implement sparseLDA (described by Mimno et al.). What is the status of this work?
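For readers who don't know AD-LDA (approximate distributed LDA): each worker runs Gibbs sweeps over its own shard of documents against a private copy of the topic-word counts, and after each sweep the per-worker changes are summed back into the global counts. Below is a rough sketch of that merge step only, assuming plain integer count matrices. `merge_topic_word_counts` is a made-up helper for illustration, not something from lda or text2vec.

```r
# Hypothetical helper illustrating the AD-LDA synchronization step.
# `global`: K x V topic-word count matrix that every worker started from.
# `locals`: list of K x V matrices, one per worker, after a local sweep.
merge_topic_word_counts <- function(global, locals) {
  # Each worker's contribution is the change it made to its private copy.
  deltas <- lapply(locals, function(local) local - global)
  # New global counts = old counts + sum of all per-worker changes.
  global + Reduce(`+`, deltas)
}

# Toy example: 2 topics, 3 words, 2 workers.
global <- matrix(c(2L, 1L, 0L, 3L, 1L, 1L), nrow = 2)

# Worker A moved one token of word 1 from topic 1 to topic 2.
worker_a <- global
worker_a[1, 1] <- 1L
worker_a[2, 1] <- 2L

# Worker B moved one token of word 3 from topic 2 to topic 1.
worker_b <- global
worker_b[1, 3] <- 2L
worker_b[2, 3] <- 0L

merged <- merge_topic_word_counts(global, list(worker_a, worker_b))
print(merged)
```

The total token count is unchanged by the merge, since each worker only moves tokens between topics. A real implementation would also have to shard the documents and keep the per-topic totals in sync, which this sketch leaves out.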
Ok, cool, I've found the issue and committed to master. At least in the little benchmark I put together, the time went from around 4.2s to 1.7s. Not too shabby!
I'll be pushing this out to CRAN as soon as I get a chance.
As for rtm, I haven't put much effort into it unfortunately and probably won't get the time to do so any time soon =(.
Thanks for the investigation!
I assumed it would be inlined and optimized but I had too much faith in clang perhaps.
Still slower...
# library(devtools)
# install_github("slycoder/R-lda-deprecated@1f872be29e09e513621aa13f9608ff4f864d598e")
# install_github("dselivanov/text2vec@22c08f4094e5a760aa8f6a975d9e315c597b40de")
library(text2vec)
library(lda)
data("movie_review")
it = itoken(movie_review$review, tolower, word_tokenizer)
v = create_vocabulary(it) %>%
  prune_vocabulary(term_count_min = 20)
dtm = create_dtm(it, vocab_vectorizer(v), type = "lda_c")
K = 100
alpha = 1/K
eta = 1/K
n_iter = 10
lda = LDA$new(K, v)
lda$verbose = TRUE
set.seed(1)
system.time({
  lda$fit(dtm, n_iter = n_iter, check_convergence_every_n = 0)
})
#user system elapsed
#4.577 0.013 4.594
set.seed(1)
system.time({
  m = lda.collapsed.gibbs.sampler(documents = dtm, K = K, vocab = v$vocab$terms,
                                  num.iterations = n_iter, alpha = alpha, eta = eta,
                                  compute.log.likelihood = FALSE, trace = 2L)
})
#user system elapsed
#8.494 0.037 8.550
Weird. I just tried your code and got:
user system elapsed 3.527 0.012 3.518
user system elapsed 3.760 0.017 3.739
Hm, let me double check...
Really strange - got
# user system elapsed
# 5.358 0.011 5.369
and
# user system elapsed
# 10.103 0.046 10.185
Maybe the answer is in the compile flags (so maybe the compiler can optimize better in text2vec)? I have pretty aggressive optimizations in ~/.R/Makevars:
CXX1XFLAGS += -march=native -ffast-math -Ofast -mtune=native
CXXFLAGS += -march=native -ffast-math -Ofast -mtune=native
CFLAGS += -march=native -ffast-math -Ofast -mtune=native
Interesting. Yeah, when I add those flags things get much slower. Lemme try to figure out which flag is the culprit.
Ha, maybe most of the flags are the culprit? Here's what I got trying flags on their own.
Nothing: 1.748
-Ofast: 2.906
-march=native: 3.52
-mtune=native: 1.798
-ffast-math: 2.920
(This is on a slightly simpler test that I just concocted)
I'm curious: when you get a chance, could you test with different flags to see if I'm going crazy? =)
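To make the per-flag numbers above easier to compare, here is a small sketch that turns them into slowdown factors relative to the build with no extra flags. The timings are copied from the list above; the vector names are just labels chosen for this example.

```r
# Timings reported above, keyed by the single flag that was added.
elapsed <- c(
  "none"          = 1.748,
  "-Ofast"        = 2.906,
  "-march=native" = 3.52,
  "-mtune=native" = 1.798,
  "-ffast-math"   = 2.920
)

# Slowdown factor of each flag relative to the no-extra-flags build.
slowdown <- round(elapsed / elapsed[["none"]], 2)
print(slowdown)
```

On these numbers, -mtune=native is roughly neutral (about 1.03x), while -Ofast, -ffast-math and -march=native come out at about 1.66x, 1.67x and 2.01x respectively.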
I can confirm: anything except the default -O2 (I even tried -O1) slows things down from 4.4 sec to 7.5+ sec. More options = more runtime :-D. Weird!
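Since single system.time() runs in this thread differ quite a bit between machines and flag sets, it may help to repeat each measurement and look at the spread. Here is a minimal sketch in base R. `time_repeated` is a hypothetical helper, and the workload is only a stand-in for the actual sampler call.

```r
# Hypothetical helper: run `f` several times and summarise elapsed time.
time_repeated <- function(f, times = 5L) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# Stand-in workload; replace the body with the lda$fit(...) or
# lda.collapsed.gibbs.sampler(...) call from the benchmark script.
res <- time_repeated(function() sum(sqrt(seq_len(1e6))), times = 3L)
print(res)
```

Comparing medians instead of single runs should make it clearer whether a flag really changes the runtime or whether the difference is just noise.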
@slycoder I switched from Apple clang++/clang to gcc-6/g++-6. This solved all the strange problems!
FYI, just made this change which seems a tiny bit faster for the default and a lot faster with -march=native:
https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520
With gcc-6 I didn't notice any difference from the previous commit. On my system, the best results on the example above are with gcc-6 and CFLAGS += -march=native -mtune=native -mavx -ffast-math -O3 (actually, -ffast-math -O3 is enough).
CFLAGS += -march=native -mtune=native -mavx -ffast-math -O3 options:

gcc-6:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 7.2 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 3.5 sec
clang:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 11.479 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 4.5 sec
clang-3.8:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 11.327 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 4.4 sec

CFLAGS = -mtune=core2 -O2:

gcc-6:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 8.4 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 4.6 sec
clang:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 11.1 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 4.4 sec
clang-3.8:
  https://github.com/slycoder/R-lda-deprecated/commit/63df45ac9b1cd5b9f4b3bce9eb07f45bc8e96a65 - 11.1 sec
  https://github.com/slycoder/R-lda-deprecated/commit/75172ee06ed66ae2a1b2614a28aa067197cc1520 - 4.4 sec
This is apparently significantly faster:
https://github.com/dselivanov/text2vec/blob/0.4/src/LDA_gibbs.cpp
Should figure out which of the removed things had such an impact.