linwoodc3 / gdeltPyR

Python-based framework to retrieve Global Database of Events, Language, and Tone (GDELT) version 1.0 and version 2.0 data.
https://linwoodc3.github.io/gdeltPyR/
GNU General Public License v3.0
203 stars 53 forks

BUG: Event search not working on Windows 32-bit machine #45

Closed saint7007 closed 7 years ago

saint7007 commented 7 years ago

import gdelt
import requests.packages.urllib3

requests.packages.urllib3.disable_warnings()
import platform
print(platform.architecture())
import gdelt

gd = gdelt.gdelt(version=2)

results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
print(results)


output

D:\SUSHANT\pyt\python.exe C:/Users/sushant.s/PycharmProjects/testAGAIN/GDELT.py
('32bit', 'WindowsPE')
('32bit', 'WindowsPE')
('32bit', 'WindowsPE')
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\SUSHANT\pyt\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sushant.s\PycharmProjects\testAGAIN\GDELT.py", line 11, in <module>
    results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
  File "D:\SUSHANT\pyt\lib\site-packages\gdelt\base.py", line 568, in Search
    pool = Pool(processes=cpu_count())
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 168, in __init__
    self._repopulate_pool()
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "D:\SUSHANT\pyt\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\SUSHANT\pyt\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\SUSHANT\pyt\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sushant.s\PycharmProjects\testAGAIN\GDELT.py", line 11, in <module>
    results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
  File "D:\SUSHANT\pyt\lib\site-packages\gdelt\base.py", line 568, in Search
    pool = Pool(processes=cpu_count())
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 168, in __init__
    self._repopulate_pool()
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "D:\SUSHANT\pyt\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\SUSHANT\pyt\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

('32bit', 'WindowsPE') ('32bit', 'WindowsPE')
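
Note: the RuntimeError in this output is the one Python's multiprocessing module raises when a script starts worker processes at import time on a platform that launches workers by spawning a fresh interpreter, which is the default on Windows. The traceback shows gd.Search creating a Pool, and the message itself points to the `if __name__ == '__main__':` idiom. A minimal sketch of that idiom follows, using only the standard library. `fetch_day` and `main` are hypothetical names standing in for the real work, and wrapping the `gd.Search(...)` call in the same guard is an inference from the traceback, not a fix confirmed in this thread.

```python
import multiprocessing as mp


def fetch_day(day):
    # Hypothetical stand-in for one unit of parallel work, such as
    # downloading a single day's file; here it only tags the input.
    return "fetched " + day


def main():
    days = ["2016 10 19", "2016 10 20", "2016 10 21", "2016 10 22"]
    # Creating the pool inside a function means it is never started as a
    # side effect of importing this file.
    with mp.Pool(processes=2) as pool:
        return pool.map(fetch_day, days)


if __name__ == "__main__":
    # On Windows each worker re-imports this file. The guard keeps the
    # workers from calling main() themselves, which is the situation the
    # RuntimeError message describes.
    print(main())
```

With the guard in place, the module-level code that a worker re-runs on import only defines functions, so no new pool is started during a worker's bootstrapping phase.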

linwoodc3 commented 7 years ago

@saint7007 , can you try to install using regular pip for me? I think the whl file is old:

pip install gdelt

Remove your previous install of gdelt.

linwoodc3 commented 7 years ago

@saint7007

I finally found a Windows 32-bit machine to test this on. I installed and ran the query with no problems.

screen shot 2017-08-23 at 7 50 52 am

How did you install gdeltPyR? Did you try pip? What version of pandas do you have? I think that may be your issue. I was able to use a Windows PE 32-bit machine and return 400,000+ rows of data.

I've also confirmed gdeltPyR works on a Windows 64-bit machine. So, I'm unable to reproduce your problem and may have to close this issue.

screen shot 2017-08-23 at 7 48 07 am

linwoodc3 commented 7 years ago

@saint7007 ; I'm taking the silence as "it's working". Plan to close this tomorrow.

saint7007 commented 7 years ago

Updated the pandas version:

D:\SUSHANT\pyt\Scripts>pip install C:\Users\sushant.s\Downloads\pandas-0.20.3-cp36-cp36m-win32.whl
Processing c:\users\sushant.s\downloads\pandas-0.20.3-cp36-cp36m-win32.whl
Requirement already satisfied: pytz>=2011k in d:\sushant\pyt\lib\site-packages (from pandas==0.20.3)
Requirement already satisfied: python-dateutil>=2 in d:\sushant\pyt\lib\site-packages (from pandas==0.20.3)
Requirement already satisfied: numpy>=1.7.0 in d:\sushant\pyt\lib\site-packages (from pandas==0.20.3)
Requirement already satisfied: six>=1.5 in d:\sushant\pyt\lib\site-packages (from python-dateutil>=2->pandas==0.20.3)
Installing collected packages: pandas
Found existing installation: pandas 0.20.1
Uninstalling pandas-0.20.1:
Successfully uninstalled pandas-0.20.1
Successfully installed pandas-0.20.3

D:\SUSHANT\pyt\Scripts>


The output is still the same:

D:\SUSHANT\pyt\python.exe C:/Users/sushant.s/PycharmProjects/testAGAIN/GDELT.py
('32bit', 'WindowsPE')
('32bit', 'WindowsPE')
('32bit', 'WindowsPE')
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\SUSHANT\pyt\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sushant.s\PycharmProjects\testAGAIN\GDELT.py", line 11, in <module>
    results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
  File "D:\SUSHANT\pyt\lib\site-packages\gdelt\base.py", line 568, in Search
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\SUSHANT\pyt\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\SUSHANT\pyt\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\sushant.s\PycharmProjects\testAGAIN\GDELT.py", line 11, in <module>
    results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
  File "D:\SUSHANT\pyt\lib\site-packages\gdelt\base.py", line 568, in Search
    pool = Pool(processes=cpu_count())
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 119, in Pool
    pool = Pool(processes=cpu_count())
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 168, in __init__
    context=self.get_context())
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 168, in __init__
    self._repopulate_pool()
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "D:\SUSHANT\pyt\lib\multiprocessing\process.py", line 105, in start
    self._repopulate_pool()
  File "D:\SUSHANT\pyt\lib\multiprocessing\pool.py", line 233, in _repopulate_pool
    w.start()
  File "D:\SUSHANT\pyt\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\SUSHANT\pyt\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    self._popen = self._Popen(self)
  File "D:\SUSHANT\pyt\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\SUSHANT\pyt\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "D:\SUSHANT\pyt\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

('32bit', 'WindowsPE')


Kindly note that the following does work:

# pull single day, gkg table

results= gd1.Search('2016 Nov 01',table='gkg')

linwoodc3 commented 7 years ago

@saint7007, I think part of the problem is your PyCharm setup, which can sometimes require you to set your project's Python interpreter. To isolate the problem, we need to keep this issue constrained to gdelt alone. Since you have a custom setup, it's impossible for me to know what is happening on your machine. Please follow the steps below exactly (don't use PyCharm) and let me know your results, so that we are on the same page and I can help you better. Here are the steps to make sure gdelt works on your machine:

Here is a picture of my results: screen shot 2017-08-25 at 8 15 42 am

This returned nearly 900,000 rows of data for me. Remember, the more cores/processors you have, the faster the data pulls, and the more RAM you have, the more days you can pull. If you are on a machine with low RAM (4 GB), you likely don't have enough memory to pull more than 2 days. GDELT consumes a LOT of space. If you only have 1 or 2 processors, the pulls take a long time (44 seconds).
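
One way to act on the memory limit described above is to split a long date range into short pieces and pull them one at a time. The following is a minimal sketch, assuming date strings in the 'YYYY MM DD' form used earlier in this thread. `day_chunks` is a hypothetical helper written for illustration and is not part of gdeltPyR.

```python
from datetime import datetime, timedelta


def day_chunks(start, end, days_per_chunk=2, fmt="%Y %m %d"):
    """Yield [first_day, last_day] string pairs covering start..end inclusive."""
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    step = timedelta(days=days_per_chunk)
    while current <= last:
        # Clamp the final chunk so it never runs past the requested end date.
        chunk_end = min(current + step - timedelta(days=1), last)
        yield [current.strftime(fmt), chunk_end.strftime(fmt)]
        current += step


chunks = list(day_chunks("2016 10 19", "2016 10 22"))
print(chunks)
# [['2016 10 19', '2016 10 20'], ['2016 10 21', '2016 10 22']]
```

Each pair has the same two-element list shape that was passed to gd.Search earlier in the thread, so the pairs could be pulled in turn and saved to disk before the next pull. When the range has an odd number of days, the last pair has the same start and end date; whether gd.Search accepts that form is not confirmed here, so a single date string may be needed for that case.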

linwoodc3 commented 7 years ago

I'm assuming you either got this fixed or moved on @saint7007 . Closing this. The comment above shows that gdeltPyR works on a Windows 32-bit machine. I think your errors are coming from your setup. Will reopen if you're still having issues after trying the instructions above.

saint7007 commented 7 years ago

I could not install using pip (proxy restriction), so I didn't try that. I got it working on Ubuntu 16.04. Thanks for the plugin.