Dataset and Code for 2021 IEEE International Conference on Big Data Paper - Scraping Unstructured Data to Explore the Relationship between Rainfall Anomalies and Vector-Borne Disease Outbreaks
The scraper fails when run with the search term malaria:
python scrape_promed.py malaria
I tested Dengue and Zika, and both of those seem to work. Output and traceback from the malaria run:
Fetching results page 0
Fetching results page 1
...
Fetching results page 14
Finished parsing post #1
Finished parsing post #2
...
Finished parsing post #210
Finished parsing post #211
Traceback (most recent call last):
  File "scrape_promed.py", line 80, in <module>
    get_posts(search_term)
  File "scrape_promed.py", line 75, in get_posts
    for _ in executor.map(get_post, post_ids.items()):
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 425, in result
    return self.__get_result()
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "scrape_promed.py", line 69, in get_post
    [r['postinfo'][x] for x in r['postinfo'].keys() if x in COLUMNS]]
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 1626, in _setitem_with_indexer
    self._setitem_with_indexer_missing(indexer, value)
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 1860, in _setitem_with_indexer_missing
    self.obj._mgr = self.obj.append(value)._mgr
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 7751, in append
    sort=sort,
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\concat.py", line 287, in concat
    return op.get_result()
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\reshape\concat.py", line 503, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy,
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\concat.py", line 84, in concatenate_block_managers
    return BlockManager(blocks, axes)
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\managers.py", line 149, in __init__
    self._verify_integrity()
  File "C:\Users\Thomas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\managers.py", line 329, in _verify_integrity
    raise construction_error(tot_items, block.shape[1:], self.axes)
ValueError: Shape of passed values is (202, 21), indices imply (201, 21)
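Hmm, I just ran the same command and it works for me. It might be a multithreading issue: the traceback shows get_post adding a row to the DataFrame from inside a worker thread, and a one-row mismatch (202 vs. 201) would be consistent with two threads appending at the same time. Try setting max_workers=1 on line 74 and see if that helps.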
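If max_workers=1 does make the error go away, one possible longer-term fix is to stop writing to a shared DataFrame from the worker threads. This is only a rough sketch and not the actual code in scrape_promed.py: fetch_post, collect_rows, the two-entry COLUMNS list, and the fake post data are all hypothetical stand-ins. The idea is that each worker returns its row as a plain dict, and only the main thread assembles the results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the scraper's COLUMNS list.
COLUMNS = ["id", "title"]


def fetch_post(item):
    """Hypothetical worker: build one row and return it.

    The real scraper would download and parse the post here. The key point
    is that the worker does not touch any shared object.
    """
    _, post_id = item
    postinfo = {"id": post_id, "title": f"post {post_id}", "extra": "ignored"}
    # Keep only the wanted columns, in a fixed order.
    return {col: postinfo.get(col) for col in COLUMNS}


def collect_rows(post_ids, max_workers=8):
    """Run the workers concurrently and gather rows in the main thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so no locking is needed.
        return list(executor.map(fetch_post, post_ids.items()))


if __name__ == "__main__":
    ids = {i: 1000 + i for i in range(250)}
    rows = collect_rows(ids)
    print(len(rows), rows[0])
```

The collected rows could then be turned into a DataFrame in a single step, for example with pd.DataFrame(rows, columns=COLUMNS), so the frame is never resized while other threads are running.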