sybrenstuvel / flickrapi

Python Flickr API implementation
https://stuvel.eu/flickrapi

Only a subset of images can be scraped #112

Closed: hsp closed 6 years ago

hsp commented 6 years ago

I don't have much experience with the Flickr API. I have written a Python script (inserted below). Authorization works fine, but a download, even for a very small geographical area, returns fewer images than are visible when the same area is viewed directly on the Flickr homepage: the API returns 12 images, whereas 20 are visible via direct access. See the images below.

What have I done wrong?

[Image: scrapedviaapi_12pics] Fig. 1: Points/images downloaded by the API

[Image: scarapedviaflickr_20pics] Fig. 2: Points/images as seen on the homepage

Script:

    # flickr: an authenticated flickrapi.FlickrAPI instance (authorization code not shown)
    c = 0
    cc = 0
    ccc = 0
    delim = ";"
    LLX = 11.993
    LLY = 55.686
    URX = 12.02
    URY = 55.69
    bnd = str(LLX) + ',' + str(LLY) + ',' + str(URX) + ',' + str(URY)
    outPath = u"C:\Temp\\".replace("\\", "/")
    outFileName = "flickrOut.txt"
    print(outPath + outFileName)
    outHdl = open(outPath + outFileName, "w", encoding="utf-8")
    outHdl.write("Title" + delim + "Long" + delim + "Lat" + delim + "URL" + "\n")
    photos = flickr.walk(tag_mode='all', bbox=bnd, accuracy=16, extras="geo,url_c,url_m,date_taken,owner_name")
    for photo in photos:
        if photo.get('title') == None:
            title = "NoTitle"
        else:
            title = repr(photo.get('title'))
        try:
            outHdl.write(title + delim + photo.get('longitude') + delim + photo.get('latitude') + delim + photo.get("url_m") + "\n")
        except:
            ccc += 1
            try:
                print(title)
            except:
                print("Title not found!!")
            print(photo.get('longitude'))
            print(photo.get('latitude'))
            print(photo.get("url_m"))

        print(photo.get('title'), photo.get('longitude'), photo.get('latitude'), photo.get("url_m"))

        c += 1
        if photo.get('longitude') != 0:
            cc += 1
        if c % 1000 == 0:
            print(str(c) + " Photos found in BBox: " + bnd)

    print("Done. " + str(c) + " Photos found in BBox: " + bnd + ". " + str(cc) + " with XY locations. " + str(ccc) + " images could not be identified.")
    outHdl.close()

sybrenstuvel commented 6 years ago

Please format your issue correctly, so that the code indentation is not lost.

hsp commented 6 years ago

Sorry, I didn't see your response. I'll stay more alert from now on :-) I cannot figure out how to include formatted texts (incl. indents), so please find enclosed a zip with the original code: OpenTheFlickrApi_HSP_ver3.zip. Thanks /Hans

sybrenstuvel commented 6 years ago

> I cannot figure out how to include formatted texts

You can find that via the "Styling with Markdown is supported" link underneath the comment box.

Which version of Python are you using? Your code looks Python 2-ish (with the u"" strings). I really recommend dropping Python 2 and using Python 3, unless you have a very good reason not to.

> u"C:\Data\Dropbox\projects\\NationalparkPlanningDenmark\\flickr\\".replace("\\", "/")

Don't mix single and double backslashes. Just use raw strings (r"") and single backslashes.
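A minimal sketch of that advice, assuming Python 3. The `C:\Temp` folder and the `flickrOut.txt` file name are taken from the script in the first post; using `pathlib` for the join is an addition of this sketch, not something the script did. Note that a raw string cannot end in a single backslash, so the trailing separator is left off:

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are taken literally, so no doubling is needed.
# A raw string cannot end with a single backslash, hence no trailing separator.
out_dir = PureWindowsPath(r"C:\Temp")

# Join path components with "/" instead of concatenating strings.
out_file = out_dir / "flickrOut.txt"

print(out_file)             # backslash-separated Windows form
print(out_file.as_posix())  # forward-slash form, like the original .replace() call produced
```

`PureWindowsPath` only manipulates the path text, so the sketch behaves the same on any operating system; on Windows itself, `pathlib.Path` would be the usual choice.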

> outHdl.write("Title" + delim + "Long" + delim + "Lat" + delim + "URL" + "\n")

Just use the csv module to write CSV files. It's less of a hassle and less error-prone.
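As a rough illustration of what that could look like, here is a sketch using `csv.writer` with the same `;` delimiter and column names as the script. The helper name `write_photos` and the sample row are made up for this example, and an in-memory buffer stands in for the real output file:

```python
import csv
import io


def write_photos(rows, handle, delimiter=";"):
    """Write a header line plus one line per photo; csv takes care of quoting."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Title", "Long", "Lat", "URL"])
    writer.writerows(rows)


# Made-up sample row; the title contains the delimiter on purpose.
sample = [("Harbour; evening", "12.0", "55.69", "https://example.com/a.jpg")]

# Stand-in for open(path, "w", encoding="utf-8", newline="").
buffer = io.StringIO()
write_photos(sample, buffer)
print(buffer.getvalue())
```

A title containing `;` gets quoted automatically here, whereas plain string concatenation would silently produce a line with an extra column.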

> except:

Never, ever use a bare except clause. Read "No exception type(s) specified" for more info.
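A sketch of the narrower alternative, under the assumption that the failure the script wants to survive is a missing attribute (`.get()` returning `None`, which then cannot be joined with strings). The helper name `format_row` is hypothetical, and plain dicts stand in for the photo elements:

```python
def format_row(photo, delim=";"):
    """Return one delimited line for a photo, or None if a field is missing."""
    title = photo.get("title") or "NoTitle"
    fields = [title, photo.get("longitude"), photo.get("latitude"), photo.get("url_m")]
    try:
        return delim.join(fields)
    except TypeError:
        # str.join() raises TypeError when one of the items is None,
        # i.e. when the photo lacks that attribute.
        return None


# Made-up sample data: one complete photo, one without a medium-size URL.
complete = {"title": "Cathedral", "longitude": "12.08", "latitude": "55.64",
            "url_m": "https://example.com/c.jpg"}
no_url = {"title": "Fjord", "longitude": "12.07", "latitude": "55.65"}

print(format_row(complete))
print(format_row(no_url))
```

Catching only `TypeError` means that unrelated problems, such as a typo in a variable name or a Ctrl+C, still surface instead of being counted as an "unidentified image".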

The code is quite messy, so it's hard to figure out what's happening. I would suggest you perform the same search query via the API web interface and compare the parameters and the results with your own code's parameters and results.
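One possible way to make such a comparison concrete is to collect the photo ids from each source and diff them. This is only a sketch: the function name `compare_ids` is hypothetical and the ids below are made up, not real Flickr results:

```python
def compare_ids(script_ids, reference_ids):
    """Report which photo ids appear in only one of two result sets."""
    script_ids, reference_ids = set(script_ids), set(reference_ids)
    return {
        "only_in_script": sorted(script_ids - reference_ids),
        "only_in_reference": sorted(reference_ids - script_ids),
        "in_both": sorted(script_ids & reference_ids),
    }


# Made-up ids standing in for the two result lists.
result = compare_ids(["101", "102", "103"], ["102", "103", "104", "105"])
for label, ids in result.items():
    print(label, ids)
```

If both lists hold the same ids, the script and the web interface agree, and any remaining difference has to come from somewhere else.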

hsp commented 6 years ago

Ok. Thanks for your comments. Sorry for the lack of quality in my coding. It wasn't really meant for the public domain :-)

I have made a controlled case now:

1) I ran my app with the bbox set to 11.9958,55.6752,12.0208,55.6912.
2) I did the same using the API web interface (with no setting other than the bbox).

Both resulted in identifying 14 pictures (so it seems, at least, that my humble code does the job).

3) Then I opened the native Flickr web page (https://www.flickr.com/map), zoomed in to the same area and hit the 'Search the map' button, which resulted in 27 geotagged images... ;-)

Have a look at the two images attached. The one with the red frame (Roskilde_MyFlickrScraper.JPG) displays the 14 images found by my app (and probably also by the API interface). The one without (Roskilde_FlickrsWeb.JPG) is what appeared on flickr.com.

Obviously several images appear (on flickr.com) inside the area of interest (indicated by the red square).
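To check this numerically rather than by eye, a point-in-box test could be used. The sketch below assumes the bbox string is ordered as min longitude, min latitude, max longitude, max latitude (matching the LLX, LLY, URX, URY order of the script); the function names and the two test points are made up for illustration:

```python
def parse_bbox(bbox):
    """Split 'min_lon,min_lat,max_lon,max_lat' into four floats."""
    min_lon, min_lat, max_lon, max_lat = (float(value) for value in bbox.split(","))
    return min_lon, min_lat, max_lon, max_lat


def in_bbox(lon, lat, bbox):
    """Return True if the point (lon, lat) lies inside the box, edges included."""
    min_lon, min_lat, max_lon, max_lat = parse_bbox(bbox)
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


BBOX = "11.9958,55.6752,12.0208,55.6912"  # the box from the controlled case

print(in_bbox(12.01, 55.68, BBOX))  # made-up point inside the frame
print(in_bbox(12.05, 55.68, BBOX))  # made-up point east of the frame
```

Coordinates read from a photo with `photo.get('longitude')` would need a `float()` conversion first, since they presumably arrive as strings.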

[Image: roskilde_myflickrscraper]

[Image: roskilde_flickrsweb]

Cheers Hans

sybrenstuvel commented 6 years ago

> Sorry for the lack of quality in my coding. It wasn't really meant for the public domain :-)

If you want someone else to help you out for free, it helps to make life easy and pleasant for that someone ;-)

> Both resulted in identifying 14 pictures (so it seems, at least, that my humble code does the job)

In that case the mystery is on Flickr's side, and I won't be able to help you further. You'll have to figure out how to get more results (if they allow that at all via their API).