whchien / funda-scraper

FundaScaper scrapes data from Funda, the Dutch housing website. You can find listings from house-buying or rental market, and historical data. 🏡
GNU General Public License v3.0
104 stars 48 forks source link

Default example from documentation returns #23

Closed thijs-hakkenberg closed 6 months ago

thijs-hakkenberg commented 11 months ago

The getting started documentation code:

from funda_scraper import FundaScraper

scraper = FundaScraper(area="amsterdam", want_to="rent", find_past=False, page_start=1, n_pages=3)
df = scraper.run(raw_data=False, save=True, filepath="test.csv", min_price=500, max_price=2000)
df.head()

returns: Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 120, in spawn_main exitcode = _main(fd, parent_sentinel) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 129, in _main prepare(preparation_data) File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 240, in prepare _fixup_main_from_path(data['init_main_from_path']) File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 291, in _fixup_main_from_path main_content = runpy.run_path(main_path, ^^^^^^^^^^^^^^^^^^^^^^^^^ File "", line 291, in run_path File "", line 98, in _run_module_code File "", line 88, in _run_code File "/Users/username/PycharmProjects/HouseWatcher/main.py", line 9, in df = scraper.run(raw_data=True, save=True, filepath="test.csv") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/funda_scraper/scrape.py", line 280, in run self.scrape_pages() File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/funda_scraper/scrape.py", line 245, in scrape_pages content = process_map(self.scrape_one_link, self.links, max_workers=pools) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map return _executor_map(ProcessPoolExecutor, fn, *iterables, *tqdm_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map return list(tqdm_class(ex.map(fn, iterables, chunksize=chunksize), *kwargs)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 811, in map results = super().map(partial(_process_chunk, fn), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 608, in map fs = [self.submit(fn, args) for args in zip(iterables)] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 608, in fs = [self.submit(fn, args) for args in zip(*iterables)] ^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 782, in submit self._adjust_process_count() File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 741, in _adjust_process_count self._spawn_process() File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 759, in _spawn_process p.start() File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 121, in start self._popen = self._Popen(self) ^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 288, in _Popen return Popen(process_obj) ^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in init super().init(process_obj) File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_fork.py", line 19, in init self._launch(process_obj) File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch prep_data = spawn.get_preparation_data(process_obj._name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 158, in get_preparation_data _check_not_importing_main() File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 138, in _check_not_importing_main raise RuntimeError(''' RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

0%| | 0/45 [00:03<?, ?it/s] Traceback (most recent call last): File "/Users/username/PycharmProjects/HouseWatcher/main.py", line 9, in df = scraper.run(raw_data=True, save=True, filepath="test.csv") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/funda_scraper/scrape.py", line 280, in run self.scrape_pages() File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/funda_scraper/scrape.py", line 245, in scrape_pages content = process_map(self.scrape_one_link, self.links, max_workers=pools) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map return _executor_map(ProcessPoolExecutor, fn, *iterables, *tqdm_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map return list(tqdm_class(ex.map(fn, iterables, chunksize=chunksize), **kwargs)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/username/PycharmProjects/HouseWatcher/venv/lib/python3.11/site-packages/tqdm/std.py", line 1182, in iter for obj in iterable: File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 597, in _chain_from_iterable_of_lists for element in iterable: File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator yield _result_or_cancel(fs.pop()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel return fut.result(timeout) ^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

blackfireagent commented 8 months ago

If you are using windows, please add "if name == 'main':" before your script.

whchien commented 6 months ago

hi @thijs-hakkenberg @blackfireagent thanks for the feedbacks. I've added a note to the README.