b-meson opened this issue 8 years ago
Possible link: https://gist.github.com/ralphbean/9966896
Worth exploring; there is a Python pip module: https://github.com/alexis-mignon/python-flickr-api/wiki/Tutorial
Currently there seems to be only one way to do this in bulk (without a dedicated application or the API). Open all the pages for a group or individual and scroll all the way down so the JS renders everything. Then, in the web inspector, expand all of the HTML and use a combination of awk / grep / sort / cut / uniq to grab the relative path of each picture: something like /photos/photoid. Next, combine that into flickr.com/photos/photoid, open it, and find the src-id for the full-resolution photo in that page's HTML. I have been trying a combination of this plus curl and haven't had much success. It's likely we need a programmatic way of doing this (like an API) or Python's robobrowser to get around some of these limitations.
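For reference, the awk / grep / sort / uniq step can be done in a few lines of Python once the rendered HTML has been saved from the inspector. This is a sketch, not the method anyone here has settled on: it assumes photo-page links appear as href="/photos/<owner>/<photo_id>/...", and the regex and function names are hypothetical. Unlike sort | uniq, it keeps links in first-seen order.

```python
import re

# Assumed link shape: href="/photos/<owner>/<numeric photo id>/..." (hypothetical;
# adjust the pattern if the saved markup looks different).
PHOTO_LINK = re.compile(r'href="(/photos/[^/"]+/\d+)/?')


def photo_paths(html: str) -> list[str]:
    """Return unique /photos/<owner>/<id> paths in first-seen order."""
    seen: dict[str, None] = {}
    for match in PHOTO_LINK.finditer(html):
        seen.setdefault(match.group(1), None)  # dict keys de-duplicate like uniq
    return list(seen)


def photo_page_urls(html: str) -> list[str]:
    """Turn the relative paths into absolute photo-page URLs to open or fetch."""
    return ["https://www.flickr.com" + path for path in photo_paths(html)]


if __name__ == "__main__":
    sample = (
        '<a href="/photos/someuser/12345/">a</a>'
        '<a href="/photos/someuser/12345/">duplicate</a>'
        '<a href="/photos/other/67890/in/pool-x">b</a>'
    )
    for url in photo_page_urls(sample):
        print(url)
```

Each resulting page still has to be fetched and parsed for the full-resolution image source, which is where the curl attempts ran into trouble.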
Selenium might be the answer, as much as I hate it, if we can't actually get the API working.
I will try a bit this week. Do you want to take a crack at it as well, @r4v5 and @JoshuaOpolko?
I noticed that the credentials I posted in the Slack channel are the authentication and secret keys, but we might actually be missing the API key (I believe that is separate), which might be why I was failing hard. I'm also more hopeful about Selenium WebDriver than I was a few days ago.
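On the missing key: Flickr's REST endpoint expects an api_key query parameter on every call, and reading a public group pool with flickr.groups.pools.getPhotos should not need OAuth signing at all. The sketch below only builds the request URL so it can be checked or pasted into curl. The key and group id values are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode

REST_ENDPOINT = "https://www.flickr.com/services/rest/"


def pool_page_url(api_key: str, group_id: str, page: int = 1, per_page: int = 500) -> str:
    """Build a flickr.groups.pools.getPhotos request for one page of a group pool."""
    params = {
        "method": "flickr.groups.pools.getPhotos",
        "api_key": api_key,        # the API key itself, not the secret
        "group_id": group_id,      # e.g. an id of the form "12345678@N00" (placeholder)
        "page": page,
        "per_page": per_page,      # Flickr caps this at 500 per page
        "format": "json",
        "nojsoncallback": 1,       # plain JSON instead of a JSONP wrapper
    }
    return REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(pool_page_url("YOUR_API_KEY", "12345678@N00", page=2))
```

If this URL returns an "invalid API key" error while the same value works elsewhere, the credential being used is probably the secret rather than the key.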
I've created a script to obtain photos and details via the Flickr API and used it to mine 2 CPD Flickr groups so far. It retrieves the highest-resolution image for each entry, as well as the title, description and similar metadata available through the API. Retrieving an entire group's photos and details (approximately 2,000 to 5,000 photos) generally takes a couple of hours because of rate limits. Next I'll be looking at analyzing the titles and descriptions to see whether names or probable names can be obtained automatically in at least some cases. The output files have been added to GitHub under cpd_pictures. More groups will be added as needed.
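One way to do the "highest resolution plus metadata" step is sketched below. It assumes the JSON shapes returned by flickr.photos.getSizes and flickr.photos.getInfo with format=json&nojsoncallback=1. It is an illustration only and may not match how the script in cpd_pictures does it; the helper names and the output field names are hypothetical.

```python
def largest_size(sizes_response: dict) -> dict:
    """Pick the size entry with the most pixels from a flickr.photos.getSizes response."""
    sizes = sizes_response["sizes"]["size"]
    # width/height may arrive as strings or numbers, so coerce before comparing.
    return max(sizes, key=lambda s: int(s["width"]) * int(s["height"]))


def photo_record(info_response: dict, sizes_response: dict) -> dict:
    """Combine getInfo metadata with the largest image URL into one flat record."""
    photo = info_response["photo"]
    return {
        "id": photo["id"],
        # In the JSON format, title and description text sit under "_content".
        "title": photo.get("title", {}).get("_content", ""),
        "description": photo.get("description", {}).get("_content", ""),
        "image_url": largest_size(sizes_response)["source"],
    }


if __name__ == "__main__":
    sizes = {"sizes": {"size": [
        {"label": "Small", "width": "240", "height": "160", "source": "small.jpg"},
        {"label": "Original", "width": 4000, "height": 3000, "source": "original.jpg"},
    ]}}
    info = {"photo": {"id": "1", "title": {"_content": "March"}, "description": {"_content": ""}}}
    print(photo_record(info, sizes))
```

Comparing by pixel count rather than by label avoids depending on which size labels (Original, Large 2048, and so on) a given photo happens to offer.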
I just want to add that the timing hasn't been tuned much yet, so it may be possible to run the collection somewhat more quickly.
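As a rough sanity check on tuning, a fixed-interval throttle plus a runtime estimate is sketched below. Two assumptions are involved: a per-key limit of 3,600 queries per hour, which is the figure commonly cited for Flickr, and two calls per photo (getSizes and getInfo) plus one call per 500-photo page. Under those assumptions, a 5,000-photo group comes out to about 10,010 calls, or roughly 2.8 hours. The class and function names are hypothetical; the clock and sleep hooks exist only so the throttle can be tested without waiting.

```python
import time


class Throttle:
    """Space calls evenly so that at most calls_per_hour are issued per hour."""

    def __init__(self, calls_per_hour: int = 3600, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / calls_per_hour
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = None

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = max(self.clock(), self._next_allowed)
        self._next_allowed = now + self.interval


def estimated_hours(photos: int, calls_per_photo: int = 2, per_page: int = 500,
                    calls_per_hour: int = 3600) -> float:
    """Estimate wall-clock hours for one group under the assumptions above."""
    pages = -(-photos // per_page)  # ceiling division: pool listing calls
    return (photos * calls_per_photo + pages) / calls_per_hour


if __name__ == "__main__":
    for n in (2000, 5000):
        print(n, "photos ->", round(estimated_hours(n), 2), "hours")
```

Calling throttle.wait() before every request keeps the run under the limit. The main lever for going faster is cutting calls per photo, for example by requesting extras such as description and url_o on the pool listing so that fewer per-photo calls are needed.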
There are a lot of high-quality photos on Flickr with clearly visible names and badge numbers. Some groups worth an initial scrape: