GoogleCodeExporter opened this issue 9 years ago
Could you provide a simple example of a URL that you are trying to scrape (and the parts that you are interested in)?
Original comment by vbarat...@gmail.com
on 30 May 2008 at 3:33
Some Picasaweb examples:
* http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO - an album slideshow (would like to be able to detect that)
* http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO/photo#s5205865116380613858 - an album slideshow starting at a particular photo
* http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO/photo#5205865124970548466 - a direct link to a single photo
* http://picasaweb.google.com/lh/idredir?uname=beussery&target=PHOTO&id=5205865116380613858 - resolves to a direct link to photo 5205865116380613858, the same photo the slideshow example above starts at
For most of these, it is simple to parse the URL and figure out the username, albumname, and possibly photoid.
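A minimal Python sketch of that simple case, assuming only the path layouts shown in the examples above. The function name parse_picasa_path is hypothetical, and treating the leading "s" on the fragment as a marker to strip (rather than part of the photo id) is a guess based on the slideshow example; the "lh" redirect form is not handled here.

```python
from urllib.parse import urlsplit


def parse_picasa_path(url):
    """Extract username, albumname and photoid from a path-style Picasaweb URL.

    Hypothetical helper; assumes the layouts seen in the examples:
      /<username>/<albumname>
      /<username>/<albumname>/photo#s<photoid>   (slideshow starting at a photo)
      /<username>/<albumname>/photo#<photoid>    (single photo)
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    result = {"username": None, "albumname": None, "photoid": None}
    if len(segments) >= 1:
        result["username"] = segments[0]
    if len(segments) >= 2:
        result["albumname"] = segments[1]
    fragment = parts.fragment
    # Assumption: an "s" prefix is a presentation marker, not part of the id.
    if fragment.startswith("s"):
        fragment = fragment[1:]
    if fragment.isdigit():
        result["photoid"] = fragment
    return result
```

For the slideshow example this yields username "beussery", albumname "GoogleIOAfterHoursAtGoogleIO" and photoid "5205865116380613858".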
For others, it would be nice to have a catalog of the different kinds of URLs a browser might use to reference an object, such as the last "lh" redirect example. It would also be nice to know which URLs actually reference an object, and which ones that happen to be on picasaweb.google.com don't refer to a user/album/photo at all.
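One way such a catalog could start out is as a classifier that maps a URL to a kind, with a catch-all for URLs that don't name an object. This is a sketch only: the kind labels, the RESERVED_SEGMENTS set (which currently holds just "lh", the one non-user prefix seen above) and the function name are all hypothetical, and the authkey variants are not handled.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical: first path segments known to be site pages, not usernames.
# Only "lh" is taken from the examples; more would be added as found.
RESERVED_SEGMENTS = {"lh"}


def classify_picasa_url(url):
    """Return a (hypothetical) kind label for a Picasaweb URL."""
    parts = urlsplit(url)
    if parts.hostname != "picasaweb.google.com":
        return "not-picasaweb"
    segments = [s for s in parts.path.split("/") if s]
    if segments == ["lh", "idredir"]:
        query = parse_qs(parts.query)
        if query.get("target") == ["PHOTO"] and "id" in query:
            return "photo-redirect"
        return "not-an-object"
    if not segments or segments[0] in RESERVED_SEGMENTS:
        return "not-an-object"
    if len(segments) == 1:
        return "user"  # assumed: a bare username is that user's gallery
    if len(segments) == 2:
        return "album"
    if len(segments) == 3 and segments[2] == "photo":
        # Assumed from the examples: "#s<id>" starts a slideshow at a photo.
        if parts.fragment.startswith("s"):
            return "album-slideshow-at-photo"
        return "photo"
    return "not-an-object"
```

Anything the classifier does not recognize falls through to "not-an-object", so the catalog can grow one rule at a time as new URL shapes turn up.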
I have also seen the following two forms of URLs refer to the same object:
* http://picasaweb.google.com/username/albumname?authkey=[authkey]/photo#[photoid]
* http://picasaweb.google.com/username/albumname/photo?authkey=[authkey]#[photoid]
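A sketch of how both forms could be normalized to the same result. In the first form, a standard URL parser sees "[authkey]/photo" as the authkey value, so the code splits that value at the first "/" and moves the remainder back onto the path. This assumes an authkey never contains "/"; the function name and the sample key "Gv1sRgABC" are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def normalize_authkey_url(url):
    """Return path, photoid and authkey for either authkey URL layout.

    Hypothetical helper. Assumes an authkey never contains '/', so anything
    after the first '/' in the authkey value is a misplaced path segment.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    authkey = None
    raw = parse_qs(parts.query).get("authkey")
    if raw:
        authkey, _, trailing_path = raw[0].partition("/")
        # Re-attach e.g. "photo" from "...?authkey=KEY/photo" to the path.
        segments += [s for s in trailing_path.split("/") if s]
    return {
        "path": "/".join(segments),
        "photoid": parts.fragment or None,
        "authkey": authkey,
    }
```

Both "http://picasaweb.google.com/username/albumname?authkey=Gv1sRgABC/photo#123" and "http://picasaweb.google.com/username/albumname/photo?authkey=Gv1sRgABC#123" come out as path "username/albumname/photo", photoid "123", authkey "Gv1sRgABC".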
So, given a URL, such a utility might yield:
* username, if applicable
* albumname, if applicable
* photoid, if applicable
* authkey, if applicable
* flags, such as "slideshow" mode
* anything else interesting to that service
* as a convenience, perhaps the appropriate feed URL for that object
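The list above maps naturally onto a record type with optional fields. A sketch of one possible shape; the class and field names are hypothetical and simply mirror the bullet points.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ParsedUrl:
    """Hypothetical output record for the URL-scraping utility."""

    service: str                                        # e.g. "picasaweb" or "youtube"
    username: Optional[str] = None                      # if applicable
    albumname: Optional[str] = None                     # if applicable
    photoid: Optional[str] = None                       # if applicable
    authkey: Optional[str] = None                       # if applicable
    flags: Set[str] = field(default_factory=set)        # e.g. {"slideshow"}
    extra: Dict[str, str] = field(default_factory=dict) # anything else service-specific
    feed_url: Optional[str] = None                      # convenience: feed URL, if known
```

Using default_factory gives each record its own flags set and extra dict, so adding a flag to one parsed URL never leaks into another.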
Some YouTube examples that point to the same video:
* http://www.youtube.com/watch?v=G7-rJ4uoS5I - these often have "&feature=user" appended as a query parameter, but I don't see any effect it has on the video shown.
* http://www.youtube.com/v/G7-rJ4uoS5I - some kind of direct link to a version of the video that fills the whole browser window (redirects to http://www.youtube.com/swf/l.swf?video_id=G7-rJ4uoS5I&rel=1&eurl=&iurl=http%3A//i.ytimg.com/vi/G7-rJ4uoS5I/default.jpg&t=OEgsToPDskJjqguwYZGPHCj9izQanoub)
* perhaps some magic flags for auto-play?
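A sketch that pulls the video id out of the three YouTube forms listed above, ignoring extra parameters such as feature=user. The function name is hypothetical, and only the URL shapes in the examples are recognized.

```python
from urllib.parse import urlsplit, parse_qs


def youtube_video_id(url):
    """Return the video id from the YouTube URL forms above, else None.

    Hypothetical helper. Handles:
      /watch?v=<id>          (other params such as feature=user are ignored)
      /v/<id>
      /swf/l.swf?video_id=<id>
    """
    parts = urlsplit(url)
    if parts.hostname not in ("youtube.com", "www.youtube.com"):
        return None
    query = parse_qs(parts.query)
    segments = [s for s in parts.path.split("/") if s]
    if segments == ["watch"]:
        return query.get("v", [None])[0]
    if len(segments) == 2 and segments[0] == "v":
        return segments[1]
    if segments[:1] == ["swf"]:
        return query.get("video_id", [None])[0]
    return None
```

All three example URLs reduce to "G7-rJ4uoS5I", which is what makes them recognizable as the same video.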
Once all these parameters have been canonically extracted from some "wild" URL, one might then feed them back into an API such as the one described at http://code.google.com/p/gdata-java-client/issues/detail?id=11 to construct the feed URL and reliably obtain more of that object's metadata.
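A sketch of that last step: a table of feed URL templates filled in from the extracted parameters. The templates below are assumptions modelled loosely on the GData feeds of the time (including passing authkey as a query parameter); they are illustrative only and should be checked against the actual API documentation. The function name and (service, kind) keys are hypothetical.

```python
from urllib.parse import quote, urlencode

# ASSUMED feed URL templates; illustrative, not confirmed endpoints.
FEED_TEMPLATES = {
    ("picasaweb", "album"):
        "http://picasaweb.google.com/data/feed/api/user/{username}/album/{albumname}",
    ("picasaweb", "photo"):
        "http://picasaweb.google.com/data/feed/api/user/{username}/album/{albumname}/photoid/{photoid}",
    ("youtube", "video"):
        "http://gdata.youtube.com/feeds/api/videos/{videoid}",
}


def build_feed_url(service, kind, params):
    """Fill an assumed feed template from string parameters, or return None.

    Raises KeyError if a placeholder the template needs is missing.
    """
    template = FEED_TEMPLATES.get((service, kind))
    if template is None:
        return None
    # Percent-encode each value so it is safe inside a single path segment.
    path_params = {k: quote(v, safe="") for k, v in params.items() if k != "authkey"}
    url = template.format(**path_params)
    # Assumption: an authkey, if present, is passed along as a query parameter.
    if params.get("authkey"):
        url += "?" + urlencode({"authkey": params["authkey"]})
    return url
```

Keeping the templates in a table means that if a pattern turns out to be wrong, only the table entry changes, not the parsing code.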
These parameters might also be used to determine whether two "wild" URLs actually refer to the same object, perhaps presented in different ways (e.g., slideshow vs. gallery view of a Picasaweb album).
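A sketch of that comparison for Picasaweb URLs: reduce each URL to a key that keeps only identity (user plus album, or user plus photo id) and drops presentation details such as the "s" slideshow prefix and the authkey, then compare keys. These identity rules are hypothetical, inferred only from the examples in this thread, and the function names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs


def canonical_key(url):
    """Reduce a Picasaweb URL to a hypothetical identity key, or None.

    Assumptions: a photo is identified by (username, photoid), an album by
    (username, albumname); the "s" fragment prefix and authkey are
    presentation/access details, not identity.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    segments = [s for s in parts.path.split("/") if s]
    if segments == ["lh", "idredir"]:
        if query.get("target") == ["PHOTO"] and "uname" in query and "id" in query:
            return ("photo", query["uname"][0], query["id"][0])
        return None
    if len(segments) < 2 or segments[0] == "lh":
        return None  # not recognized as a user/album/photo URL
    fragment = parts.fragment
    photoid = fragment[1:] if fragment.startswith("s") else fragment
    if photoid.isdigit():
        return ("photo", segments[0], photoid)
    return ("album", segments[0], segments[1])


def same_object(url_a, url_b):
    """True if both URLs reduce to the same (non-None) identity key."""
    key = canonical_key(url_a)
    return key is not None and key == canonical_key(url_b)
```

With these rules, the slideshow-at-photo URL and the "lh/idredir" URL compare equal, since both carry photo id 5205865116380613858 for user beussery. Whether "slideshow starting at a photo" should count as the same object as the photo itself is a design choice; a stricter key could include the album name as well.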
Original comment by rob%xoop...@gtempaccount.com
on 30 May 2008 at 4:01
Internal Tracking ID: 1697283
Original comment by yanivin...@gmail.com
on 6 Mar 2009 at 11:55
Original issue reported on code.google.com by rob%xoop...@gtempaccount.com
on 30 May 2008 at 3:16