lucianolorenti opened 4 years ago
I tried your version, but something is wrong.
I think there are two issues. The first is that I forgot to add the __init__.py file.
The second is this error:
This file requires compiler and library support for the ISO C++ 2011 standard.
I was using gcc 9.2.0, which I suppose uses C++11 by default. I have now added the __init__.py file and the explicit argument -std=c++11.
Let me know if it doesn't work for you.
It works: successfully installed btm-0.1.0. Thanks for your solution.
In case anyone is interested, I've made a Python extension out of this code. It is more or less the same code, except that it is wrapped with Boost.Python, and it avoids all the intermediate files. You can use it like this:
import btm

number_of_topics = 2
alpha = 50/2
beta = 0.0005
n_iters = 50000
btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, 3, True)
btm_model.fit(["sentence 1", "sentence 2", "sentence 2"])
pz = btm_model.get_pz()
pw_z = btm_model.get_pw_z()
vocabulary = btm_model.vocabulary()
b = btm_model.predict(["Another sentence"], "sum_b")
When I run the example code above, I get output like this:
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
1 of 50001
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
ERR: index=3, size=3
2 of 50001
Is this the expected result?
No, it is not. Somehow the code is accessing pvec at position 3 when it has only 3 elements. I'm going to try on another PC to see if I get the same error.
I've tried on another Arch Linux machine and it worked. I'm going to try on Ubuntu.
I tried on Debian 10. The Boost.Python version there was old, so I had to recompile Boost.Python to make it work, but apart from that I had no other problems. I don't know what is happening in your case.
Hi!
I tried, but the code is not working. It says:
C:\train\B-Python>pip install .
Processing c:\train\b-python
Building wheels for collected packages: btm
Running setup.py bdist_wheel for btm ... error
Complete output from command C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\07390\AppData\Local\Temp\pip-wheel-tgike6d6 --python-tag cp37:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\include -IC:\Users\07390\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl: Command line warning D9002: ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
Failed building wheel for btm
Running setup.py clean for btm
Failed to build btm
Installing collected packages: btm
Found existing installation: btm 1.0.15
Uninstalling btm-1.0.15:
Successfully uninstalled btm-1.0.15
Running setup.py install for btm ... error
Complete output from command C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\07390\AppData\Local\Temp\pip-record-bydnsmqf\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MT -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\include -IC:\Users\07390\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl: Command line warning D9002: ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
Rolling back uninstall of btm
Command "C:\Users\07390\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\07390\AppData\Local\Temp\pip-record-bydnsmqf\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\07390\AppData\Local\Temp\pip-req-build-odmu1lp8\
Not sure what is wrong.
From what I can see, the compiler can't find the Boost.Python NumPy headers:
...model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
Do you have Boost installed correctly? And did you add the header path to the include directories?
Hi!
I installed Boost, but I don't know how to add the header path to the include directories.
So I tried installing Boost with Anaconda, but it still doesn't work:
(d2l) C:\train\B-Python>pip install .
Processing c:\train\b-python
Building wheels for collected packages: btm
Building wheel for btm (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; __file__='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\07390\AppData\Local\Temp\pip-wheel-4iss4_6i' --python-tag cp37
cwd: C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\
Complete output (18 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\envs\d2l\include -IC:\Users\07390\Anaconda3\envs\d2l\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl: Command line warning D9002: ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
ERROR: Failed building wheel for btm
Running setup.py clean for btm
Failed to build btm
Installing collected packages: btm
Found existing installation: btm 1.0.15
Uninstalling btm-1.0.15:
Successfully uninstalled btm-1.0.15
Running setup.py install for btm ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; __file__='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\07390\AppData\Local\Temp\pip-record-hlpib9u3\install-record.txt' --single-version-externally-managed --compile
cwd: C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\
Complete output (18 lines):
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\btm
copying btm\__init__.py -> build\lib.win-amd64-3.7\btm
running build_ext
building 'btm_cpp' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\btm
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -IC:\Users\07390\Anaconda3\envs\d2l\include -IC:\Users\07390\Anaconda3\envs\d2l\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpbtm/model.cpp /Fobuild\temp.win-amd64-3.7\Release\btm/model.obj -std=c++11
cl: Command line warning D9002: ignoring unknown option '-std=c++11'
model.cpp
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\doc.h(24): warning C4267: 'return': conversion from 'size_t' to 'int', possible loss of data
C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\btm\model.h(10): fatal error C1083: Cannot open include file: 'boost/python/numpy.hpp': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
Rolling back uninstall of btm
Moving to c:\users\07390\anaconda3\envs\d2l\lib\site-packages\btm-1.0.15.dist-info\ from c:\users\07390\anaconda3\envs\d2l\lib\site-packages\~tm-1.0.15.dist-info
Moving to c:\users\07390\anaconda3\envs\d2l\lib\site-packages\btm\ from c:\users\07390\anaconda3\envs\d2l\lib\site-packages\~tm
ERROR: Command errored out with exit status 1: 'C:\Users\07390\Anaconda3\envs\d2l\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"'; __file__='"'"'C:\Users\07390\AppData\Local\Temp\pip-req-build-fq1ue1v3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\07390\AppData\Local\Temp\pip-record-hlpib9u3\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.
I have also added the Boost directory to the system PATH variable.
The include paths are the directories where the compiler looks for header files (the .h/.hpp files). They are unrelated to the system PATH, which lists the directories where the operating system looks for executables. I will try to add a configuration file to specify these paths and make compilation easier.
In the meantime, you can edit setup.py and add them yourself:
btm_cpp = Extension('btm_cpp',
                    define_macros = [('MAJOR_VERSION', '1'),
                                     ('MINOR_VERSION', '0')],
                    libraries = ['boost_python3', 'boost_numpy3'],
                    language='c++11',
+                   include_dirs=[ THE_PATH_WHERE_THE_BOOST_HEADERS_ARE_LOCATED ],
+                   library_dirs=[ THE_PATH_WHERE_THE_BOOST_LIBRARIES_ARE_LOCATED ],
                    extra_compile_args=extra_compile_args,
                    sources = ['btm/model.cpp','btm/infer.cpp'])
THE_PATH_WHERE_THE_BOOST_HEADERS_ARE_LOCATED should end with include, e.g.
r'C:\something\something\include' (a raw string, so that Python does not treat the backslashes as escape sequences).
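Before editing setup.py, it can help to confirm that a candidate directory really contains the header the compiler reported missing (boost/python/numpy.hpp). The following is a minimal preflight sketch under assumptions: the BOOST_INCLUDE_DIR environment variable is hypothetical (nothing in this repository reads it), and the conda locations (%CONDA_PREFIX%\Library\include on Windows, $CONDA_PREFIX/include elsewhere) are where conda packages usually place headers.

```python
import os
from pathlib import Path

# The header btm/model.h includes on line 10, per the build log.
BOOST_NUMPY_HEADER = Path("boost") / "python" / "numpy.hpp"


def candidate_include_dirs(env=os.environ):
    """Hypothetical lookup order: an explicit BOOST_INCLUDE_DIR variable,
    then the include folders of the active conda environment."""
    dirs = [env.get("BOOST_INCLUDE_DIR", "")]
    prefix = env.get("CONDA_PREFIX", "")
    if prefix:
        dirs.append(os.path.join(prefix, "Library", "include"))  # conda on Windows
        dirs.append(os.path.join(prefix, "include"))             # conda on Linux/macOS
    return dirs


def find_boost_include_dir(candidates):
    """Return the first candidate that contains boost/python/numpy.hpp, or None."""
    for candidate in candidates:
        # Skip empty entries (unset variables) and check for the exact header file.
        if candidate and (Path(candidate) / BOOST_NUMPY_HEADER).is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    include_dir = find_boost_include_dir(candidate_include_dirs())
    print(include_dir or "boost/python/numpy.hpp not found; set BOOST_INCLUDE_DIR")
```

If the script prints a directory, that path is a reasonable value for include_dirs; if not, the Boost installation most likely lacks the Boost.Python NumPy headers.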
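THE_PATH_WHERE_THE_BOOST_LIBRARIES_ARE_LOCATED is the folder containing the compiled Boost libraries, e.g.
r'C:\something\something\lib'. With MSVC, the linker needs the .lib files; the matching .dll files must also be findable when the module is imported.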
Depending on how Boost was installed, you will probably need to change the library names ['boost_numpy3', 'boost_python3'].
These names refer to library files. For example, for boost_numpy3 the last step of the compilation (the linker) will search for libboost_numpy3.so on Linux, or boost_numpy3.lib with MSVC on Windows. Perhaps on your machine the file is called boost_numpy.lib (or libboost_numpy.so), in which case you should change the entry in setup.py to 'boost_numpy'.
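To see which names the linker would actually accept, one option is to list the library folder and derive them from the file names. This is a minimal sketch, assuming the usual naming rules (libNAME.so, libNAME.a or libNAME.dylib for `-lNAME` on Unix; NAME.lib with MSVC); the helper name linker_names is hypothetical, and MinGW-style libNAME.dll.a files and versioned-only .so files are deliberately not handled.

```python
import re
from pathlib import Path

# Unix: "-lNAME" makes the linker look for libNAME.so / libNAME.a (libNAME.dylib on macOS).
_UNIX_LIB = re.compile(r"^lib(?P<name>[^.]+)\.(so|a|dylib)$")
# MSVC: setuptools hands NAME.lib to the linker, so the name is the file stem.
_MSVC_LIB = re.compile(r"^(?P<name>[^.]+)\.lib$")


def linker_names(library_dir, component):
    """List names usable in Extension(libraries=[...]) for one Boost component
    (e.g. "numpy" or "python"), based on the files present in library_dir."""
    names = set()
    for path in Path(library_dir).iterdir():
        for pattern in (_UNIX_LIB, _MSVC_LIB):
            match = pattern.match(path.name)
            if match and "boost_" + component in match.group("name"):
                names.add(match.group("name"))
    return sorted(names)
```

For example, on a Windows Boost build with the versioned naming layout, linker_names(r'C:\something\something\lib', 'numpy') might return a name such as boost_numpy37-vc142-mt-x64-1_72, which would then replace 'boost_numpy3' in libraries.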
Hi!
Thanks. I have a question:
In your setup:
btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, 3, True)
What does the 3 mean here? Shouldn't all the parameters already be fixed?
Hi! It is a parameter that does nothing :S. It corresponds to save_step in the original code, but in my fork nothing is saved during intermediate iterations.
Hi!
Thanks for the prompt reply. I hope you are safe!
I am trying this on a large data set. One question: is it possible to display a progress bar like tqdm? So far I am not seeing any progress indicator at all. Since training a large model takes a long time, I think this would be useful.
That's odd. The progress bar is the same as in the original code, and I can see it.
I just pushed a few commits that remove the save_step parameter and add a boolean show_progressbar parameter to make the progress bar optional; previously the progress bar was always shown.
It is now also possible to do this:
btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, background_topic, show_progressbar)
btm_model.initialize(["sentence 1", "sentence 2", "sentence 2"])
for j in range(500):
    btm_model.fit_step()
This runs the fit steps from Python: each call to fit_step performs a single pass of the algorithm.
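Since fit_step exposes one pass at a time, a Python-side progress bar such as tqdm (asked about earlier in the thread) can wrap the loop. A minimal sketch, assuming only that the model object has a fit_step() method as described above; tqdm is treated as optional, and the helper name fit_with_progress is hypothetical.

```python
def fit_with_progress(model, n_steps, use_tqdm=True):
    """Call model.fit_step() n_steps times, wrapped in a tqdm progress bar
    when tqdm is installed; otherwise the loop simply runs silently."""
    steps = range(n_steps)
    if use_tqdm:
        try:
            from tqdm import tqdm  # optional third-party dependency
        except ImportError:
            pass  # no tqdm available: fall back to a plain loop
        else:
            steps = tqdm(steps, total=n_steps, desc="BTM fit_step")
    for _ in steps:
        model.fit_step()  # one pass of the algorithm per call
    return model


# Usage with the fork's API as described above (not executed here):
#   btm_model = btm.Model(number_of_topics, alpha, beta, n_iters, background_topic, False)
#   btm_model.initialize(["sentence 1", "sentence 2", "sentence 2"])
#   fit_with_progress(btm_model, 500)
```

Passing False as the last argument assumes it is show_progressbar, as in the call above, so the built-in bar does not print alongside the tqdm one.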
It is weird. Here is a public ipynb file:
https://colab.research.google.com/drive/1Rr2WsY7MRy3Pin8Eak9HNa6rddBLSn07
I tried your commands, but I get:
NameError                                 Traceback (most recent call last)
in ()
----> 1 get_ipython().run_cell_magic('time', '', '\nnumber_of_topics = 2\nalpha = 50/2\nbeta = 0.0005\nn_iters = 50000\nbtm_model = btm.Model(number_of_topics, alpha, beta, n_iters, background_topic, show_progressbar)\nbtm_model.initialize(["sentence 1", "sentence 2", "sentence 2"])\nfor j in range(500):\n btm_model.fit_step()')
in time(self, line, cell, local_ns)
/usr/local/lib/python3.6/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1191 else:
   1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
   1194 end = clock2()
   1195 out = None
in ()
NameError: name 'background_topic' is not defined
I am training on Google Colab, not Windows, so presumably the issue comes from Google Colab.
You did not define the background_topic variable. Please follow the README carefully.
I ran it in Google Colab and it works.
Thanks! I figured out how to use it now. The second method works for me.
Quick question: is it possible to speed up training using a GPU/TPU? I know it uses Gibbs sampling under the hood. I'm just wondering whether we can speed up training, since Colab offers GPU/TPU support.
@Logos23333 I've encountered the same problem, and I found that it is caused by the following line of code:
this->w2id[w] = this->w2id.size();
on line 118 of model.cpp. For example, when this->w2id is empty, i.e., its size is 0, the code above assigns 1 to this->w2id[w]. That is, the resulting word ids are one greater than expected, which causes the index-out-of-bounds error. However, since I am not very familiar with C++, I am not sure why this happens. The line can be changed to the following to avoid the error:
int new_id = this->w2id.size();
this->w2id[w] = new_id;
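A likely reason for the behaviour: before C++17, the order in which the two operands of `this->w2id[w] = this->w2id.size()` are evaluated is unspecified, so a compiler may call operator[] first; that inserts w, and size() then returns 1 for the first word. C++17 sequences the right operand of an assignment before the left one, so results can differ between compilers and -std settings, which might explain why some machines in this thread were unaffected. The Python sketch below mirrors the intended logic of the fix under a hypothetical assumption of whitespace tokenization: with the example sentences the vocabulary has 3 words, so correct ids are 0, 1, 2, while off-by-one ids would be 1, 2, 3, consistent with the "ERR: index=3, size=3" messages. The function name assign_word_ids is illustrative, not part of the fork.

```python
def assign_word_ids(docs):
    """Give each distinct whitespace-separated word a consecutive id 0..V-1."""
    w2id = {}
    for doc in docs:
        for w in doc.split():
            if w not in w2id:
                # Read the current size *before* inserting, as in the C++ fix,
                # so the first word gets id 0 and every id stays < len(w2id).
                new_id = len(w2id)
                w2id[w] = new_id
    return w2id


if __name__ == "__main__":
    ids = assign_word_ids(["sentence 1", "sentence 2", "sentence 2"])
    print(ids)  # {'sentence': 0, '1': 1, '2': 2}
```

In Python the one-line form `w2id[w] = len(w2id)` would also be safe, because the right-hand side is evaluated before the subscript assignment; the two-step form is kept only to match the C++ change.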