urbaninformaticsandresiliencelab / gmaps_scraper


I always get this error : Please specify a valid state with --state. See --help for more info. Possible states: #1

Open Gwojda opened 3 years ago

Gwojda commented 3 years ago

Hi, I always get this error when trying to use your tool: Please specify a valid state with --state. See --help for more info. Possible states: ....

even when I try your example: python3 -m gmaps_scraper --type places_nearby --city Boston --state Massachusetts or with a file: python3 -m gmaps_scraper --type places_nearby --city Boston --state tl_2016_69_place.cpg

Thanks!

ercas commented 3 years ago

Hi, I forgot to include a section of the setup where users should create a directory called tiger-2016 and download 2016 TIGER/Line shapefiles from the U.S. census website into that directory. See here:

https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/548147b08108764b6895cfba1f5521e7dc3239bd/gmaps_scraper/__main__.py#L114
https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/master/gmaps_scraper/__main__.py#L329

So, if you are scraping Massachusetts, create a directory tiger-2016/Massachusetts and put the .shp, .prj, .dbf, etc. files in that folder. Thanks for pointing this out! I will add it to the README in the future.
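
For reference, a minimal sketch of setting up that layout (the downloads/ source directory and the tl_2016_25_place filenames are illustrative, not part of the repository; 25 is the Massachusetts FIPS code used in TIGER/Line file names):

import pathlib
import shutil

# Create the per-state directory the scraper expects.
state_dir = pathlib.Path("tiger-2016/Massachusetts")
state_dir.mkdir(parents=True, exist_ok=True)

# Copy every part of the place shapefile (.shp, .prj, .dbf, ...) into it.
for part in pathlib.Path("downloads").glob("tl_2016_25_place.*"):
    shutil.copy(part, state_dir)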

Please let me know if this solves your issue.

ercas commented 3 years ago

Hi, apparently I did add this to the setup: https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper#setup-1

The relevant script you need to run is here: https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/master/util/scrape-tiger.sh

Gwojda commented 3 years ago

Thanks for your reply.

I already ran scrape-tiger.sh, which created two directories: tiger-2016 and tiger-2016-src. After that, I ran python3 -m gmaps_scraper --type places_nearby --city Boston --state Massachusetts in the root directory. It didn't work, failing with FileNotFoundError: [Errno 2] No such file or directory: 'tiger-2016/'. So I created a symlink, ln -s util/tiger-2016 tiger-2016, but the error Please specify a valid state with --state. See --help for more info. Possible states: is still there.

Gwojda commented 3 years ago

I tried working around it with ➜ a git:(master) python3 -m gmaps_scraper --type places_nearby --city 'Lewis' --state '', which gives:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/private/tmp/a/gmaps_scraper/__main__.py", line 381, in <module>
    main()
  File "/private/tmp/a/gmaps_scraper/__main__.py", line 377, in main
    scrape_subdivisions(options)
  File "/private/tmp/a/gmaps_scraper/__main__.py", line 144, in scrape_subdivisions
    new_scraper = scrapers.PlacesNearbyScraper(
  File "/private/tmp/a/gmaps_scraper/scrapers.py", line 742, in __init__
    Scraper.__init__(self, *args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'gmaps'

Looks like this is not the only problem ^^

ercas commented 3 years ago

Hi, for some reason I am not able to reproduce the --state issue; it seems to be working on my end. I just cloned the repository and tried:

$ cd util/
$ ls
create_json_parallel_redis.py  scrape-tiger.sh  tiger-2016-src
process_pickles.py             tiger-2016
$ python3 -m gmaps_scraper --type places_nearby --city Boston --state Massachusetts
RedisDuplicateChecker class unavailable; could not import  redis module.
MongoWriter class unavailable; could not import pymongo
PostgresWriter class unavailable; could not import psycopg2
Could not find credentials.py. One has been created for you.
Please provide an API key by adding it to credentials.py or by using the --api-key option.
Traceback (most recent call last):
  File "/home/leaf/miniconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/leaf/miniconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/leaf/miniconda3/lib/python3.7/site-packages/gmaps_scraper/__main__.py", line 381, in <module>
    main()
  File "/home/leaf/miniconda3/lib/python3.7/site-packages/gmaps_scraper/__main__.py", line 370, in main
    raise ValueError("No valid API key string given")
ValueError: No valid API key string given
$ 

As for the other issue: we switched from invoking directly via the command line to invoking via script, and it looks like I never updated __main__.py to keep up. Looking at it just now, __main__.py is invoking an old version of PlacesNearbyScraper, PlacesRadarScraper, etc. Sorry about that!

I will try to fix this in the future, but for now I suggest using one of the example scripts, e.g. https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/master/examples/continuous/main.py, until I can get to it.

The relevant code is here:

https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/master/examples/continuous/main.py#L40-L48

The third "argument" to scrape_subdivisions() passes min_latitude, max_latitude, min_longitude, and max_longitude; more info here: https://github.com/urbaninformaticsandresiliencelab/gmaps_scraper/blob/master/gmaps_scraper/scrapers.py#L467-L531. You can remove writer = "mongo" to have it write to pickle files instead.
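
A rough sketch of that pattern (not the verified API: the googlemaps.Client usage is inferred from the gmaps argument in the traceback above, and the exact scrape_subdivisions() call should be copied from the linked example):

import googlemaps
from gmaps_scraper import scrapers

# The traceback earlier in this thread shows that Scraper.__init__
# requires a positional gmaps argument, presumably a googlemaps.Client.
gmaps = googlemaps.Client(key="YOUR_API_KEY")  # replace with a real key

# Omitting writer="mongo" makes the scraper write pickle files instead.
scraper = scrapers.PlacesNearbyScraper(gmaps)

# The scrape is then driven by scrape_subdivisions(), passing the
# (min_latitude, max_latitude, min_longitude, max_longitude) bounding
# box as shown in examples/continuous/main.py.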

Gwojda commented 3 years ago

Hi, for the first issue: you have to run your command from the repository root, not from util/ (otherwise it can't find credentials.py, etc.).

OK, I can run it with your script, thanks! I didn't let it run all the way through because it's not free, but what kind of output does it give? I would like to scrape all places in a particular zone, for example get all McDonald's in a country. Can your tool do this? Thanks!

ercas commented 3 years ago

Ah I see, thank you for pointing that out! I will be sure to update that once I fix the other issue.

The scraper dumps raw output in the format shown here: https://developers.google.com/places/web-service/search#nearby-search-and-text-search-responses
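
In that format, each response carries a results list whose entries include fields like name, geometry, and place_id. A hedged sketch of pulling place IDs out of a dumped response (the file path and the assumption that each pickle holds one raw response dict are illustrative, not the tool's documented layout):

import pickle

# Hypothetical path; the actual output location depends on your setup.
with open("pickle-output/example.pickle", "rb") as f:
    response = pickle.load(f)

# Per Google's documented response format, each result has a place_id.
for result in response.get("results", []):
    print(result["place_id"], result.get("name"))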

I would also recommend looking at HERE.com, as it sounds like it would fit your use case too and provides 200k free requests per month. That is what the lab switched to for later projects, I believe (I am no longer part of it, so I'm not sure about current projects).

Gwojda commented 3 years ago

Thanks! I will need the place_id to find more data, so I don't know if HERE.com can do this. I'll take a look!

Gwojda commented 3 years ago

The radius becomes very, very small (900 m). Why does it need to do that?

Gwojda commented 3 years ago

Changing MIN_RADIUS_METERS in scrapers.py doesn't change anything :/

ercas commented 3 years ago

Hi, the best way to adjust the minimum radius is to pass the min_radius argument when instantiating the scraper. We included this because there are rare cases where an extremely large number of points sit essentially on top of each other, so the scraper would recurse indefinitely.
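
A minimal sketch of that suggestion (the 2000 m value and the client setup are illustrative):

import googlemaps
from gmaps_scraper import scrapers

gmaps = googlemaps.Client(key="YOUR_API_KEY")

# Passing min_radius at instantiation, rather than editing
# MIN_RADIUS_METERS in scrapers.py, stops the subdivision recursion
# from shrinking below 2000 metres.
scraper = scrapers.PlacesNearbyScraper(gmaps, min_radius=2000)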