google / gdata-java-client

Automatically exported from code.google.com/p/gdata-java-client
Apache License 2.0

Create utilities to construct canonical feed URLs (or parameters) from end-user-browser URLs #53

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Given an arbitrary Picasaweb or Youtube URL, there is no way to reliably
construct a canonical feed URL that can be used with the gdata API to
retrieve metadata about that object. Currently, a developer must "scrape"
the URL and infer various parameters from the URL.

What version of the product are you using? On what operating system?

 gdata-1.16.4

Please provide any additional information below.

 See
http://groups.google.com/group/google-help-dataapi/browse_frm/thread/21e67a23c7c2b704

Original issue reported on code.google.com by rob%xoop...@gtempaccount.com on 30 May 2008 at 3:16

GoogleCodeExporter commented 9 years ago
Could you provide a simple example of a URL that you are trying to scrape,
and the parts of it that you are interested in?

Original comment by vbarat...@gmail.com on 30 May 2008 at 3:33

GoogleCodeExporter commented 9 years ago
Some Picasaweb examples:

 * http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO - an album
slideshow - would like to know that

 *
http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO/photo#s5205865116380613858
- an album slideshow starting at a particular photo

 *
http://picasaweb.google.com/beussery/GoogleIOAfterHoursAtGoogleIO/photo#5205865124970548466
- a direct link to a single photo

 *
http://picasaweb.google.com/lh/idredir?uname=beussery&target=PHOTO&id=5205865116380613858
- resolves to another direct link to the same photo

For most of these it is simple to scrape the URL and figure out the username,
albumname, and possibly photoid.

For others, it would be nice to have a catalog of the different kinds of URLs
that a browser might use to reference an object, such as the last "lh"
redirect example. It would also be nice to know which kinds of URLs reference
an actual object, and which ones happen to be on picasaweb.google.com but
don't refer to a user/album/photo at all.

I have also seen the following two forms of URLs refer to the same object:

 * http://picasaweb.google.com/username/albumname?authkey=[authkey]/photo#[photoid]
 * http://picasaweb.google.com/username/albumname/photo?authkey=[authkey]#[photoid]

So given a URL, the output of such a utility might yield:

 * username, if applicable
 * albumname, if applicable
 * photoid, if applicable
 * authkey, if applicable
 * flags, such as "slideshow" mode
 * anything else interesting to that service
 * as a convenience, perhaps the appropriate feed URL for that object
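
To make the request concrete, here is a minimal sketch of what such a utility
might look like for the Picasaweb examples above. Nothing in it is part of the
gdata client library: PicasaUrlParts and PicasaUrlParser are hypothetical
names, and the rules it applies (a leading "s" in the fragment means
slideshow, "lh" is a reserved first path segment, a stray "/photo" may trail
the authkey value) are only inferred from the sample URLs in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical result holder; a field is null when it does not apply. */
final class PicasaUrlParts {
    String username;
    String albumName;
    String photoId;
    String authKey;
    boolean slideshow;
}

/** Hypothetical parser built only from the URL shapes listed in this issue. */
final class PicasaUrlParser {
    private static final String HOST = "picasaweb.google.com";

    /** Returns the scraped parts, or null if the URL is not recognized. */
    static PicasaUrlParts parse(String url) {
        // Split by hand: the "authkey=KEY/photo#ID" form is awkward for strict parsers.
        String rest = url;
        String fragment = null;
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        String query = null;
        int q = rest.indexOf('?');
        if (q >= 0) {
            query = rest.substring(q + 1);
            rest = rest.substring(0, q);
        }
        int schemeEnd = rest.indexOf("://");
        if (schemeEnd < 0) return null;
        String hostAndPath = rest.substring(schemeEnd + 3);
        int slash = hostAndPath.indexOf('/');
        String host = slash < 0 ? hostAndPath : hostAndPath.substring(0, slash);
        String path = slash < 0 ? "" : hostAndPath.substring(slash + 1);
        if (!host.equalsIgnoreCase(HOST)) return null;

        Map<String, String> params = parseQuery(query);
        String[] seg = path.isEmpty() ? new String[0] : path.split("/");
        PicasaUrlParts parts = new PicasaUrlParts();

        // The "lh" redirect form carries everything in query parameters.
        if (seg.length >= 2 && seg[0].equals("lh") && seg[1].equals("idredir")) {
            parts.username = params.get("uname");
            if ("PHOTO".equals(params.get("target"))) parts.photoId = params.get("id");
            return parts;
        }
        // Assumption: other /lh/ pages do not name a user/album/photo.
        if (seg.length >= 1 && seg[0].equals("lh")) return null;

        if (seg.length >= 1) parts.username = seg[0];
        if (seg.length >= 2) parts.albumName = seg[1];

        // Tolerate "?authkey=KEY/photo#ID" as well as "/photo?authkey=KEY#ID".
        String authKey = params.get("authkey");
        if (authKey != null && authKey.endsWith("/photo")) {
            authKey = authKey.substring(0, authKey.length() - "/photo".length());
        }
        parts.authKey = authKey;

        if (fragment != null && !fragment.isEmpty()) {
            // Assumption: "#s<id>" is slideshow mode, "#<id>" is a plain photo link.
            if (fragment.startsWith("s")) {
                parts.slideshow = true;
                fragment = fragment.substring(1);
            }
            if (fragment.matches("\\d+")) parts.photoId = fragment;
        }
        return parts;
    }

    /** Splits "a=b&c=d" into a map; values are not percent-decoded in this sketch. */
    private static Map<String, String> parseQuery(String query) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        if (query == null) return out;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }
}
```

With this sketch, both authkey forms above yield the same username, albumname,
authkey, and photoid, and the "lh" redirect yields a username and photoid but
no albumname.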

Some Youtube examples that point to the same video:

 * http://www.youtube.com/watch?v=G7-rJ4uoS5I - These often have "&feature=user"
appended as a query parameter, but I don't see any effect this has on the 
viewed video.

 * http://www.youtube.com/v/G7-rJ4uoS5I - some kind of direct link to a
full-browser-screen version of the video (redirects to
http://www.youtube.com/swf/l.swf?video_id=G7-rJ4uoS5I&rel=1&eurl=&iurl=http%3A//i.ytimg.com/vi/G7-rJ4uoS5I/default.jpg&t=OEgsToPDskJjqguwYZGPHCj9izQanoub)

 * perhaps some magic flags for auto-play?
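
For the Youtube forms listed above, a rough sketch of the same idea could be
as small as the following. YouTubeUrls and videoId are hypothetical names, not
library API, and only the three URL shapes quoted in this comment (watch?v=,
/v/, and the /swf/l.swf?video_id= redirect target) are recognized; any other
form returns null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; covers only the URL shapes quoted in this issue. */
final class YouTubeUrls {
    private static final String BASE = "^https?://(?:www\\.)?youtube\\.com";

    // /watch?v=ID, with "v" allowed anywhere in the query string.
    private static final Pattern WATCH =
            Pattern.compile(BASE + "/watch\\?(?:.*&)?v=([^&#]+)");
    // /v/ID, the full-browser-screen form.
    private static final Pattern DIRECT =
            Pattern.compile(BASE + "/v/([^/?&#]+)");
    // /swf/l.swf?video_id=ID, the target of the /v/ redirect.
    private static final Pattern SWF =
            Pattern.compile(BASE + "/swf/l\\.swf\\?(?:.*&)?video_id=([^&#]+)");

    /** Returns the video id, or null if the URL matches none of the known shapes. */
    static String videoId(String url) {
        for (Pattern p : new Pattern[] {WATCH, DIRECT, SWF}) {
            Matcher m = p.matcher(url);
            if (m.find()) return m.group(1);
        }
        return null;
    }
}
```

Extra query parameters such as "&feature=user" are simply ignored, which
matches the observation above that they do not change which video is shown.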

Once all these parameters have been canonically scraped out of some "wild"
URL, one might then feed them back into an API such as that described at
http://code.google.com/p/gdata-java-client/issues/detail?id=11 to construct
the feed URL to reliably obtain more of that object's metadata.

These parameters might also be used to determine if two "wild" URLs actually
refer to the same object (but perhaps presented in different ways; e.g.,
slideshow vs. gallery view of a Picasaweb album).
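
One possible way to do that comparison, sketched under my own assumptions: put
only the identifying parameters into a key and leave out presentation details
such as slideshow mode. ObjectKey is a hypothetical class, and the choice to
identify a photo by username plus photoid alone (so that the "lh" redirect,
which has no albumname, can still match a direct link) is a guess rather than
documented behavior.

```java
import java.util.Objects;

/** Hypothetical identity key: what an object is, not how it is presented. */
final class ObjectKey {
    private final String service;
    private final String owner;    // null where the service has no owner in the URL
    private final String objectId;

    private ObjectKey(String service, String owner, String objectId) {
        this.service = service;
        this.owner = owner;
        this.objectId = objectId;
    }

    /** Assumption: a photoid, when present, identifies the photo without the albumname. */
    static ObjectKey forPicasa(String username, String albumName, String photoId) {
        String id = (photoId != null) ? "photo:" + photoId : "album:" + albumName;
        return new ObjectKey("picasaweb", username, id);
    }

    static ObjectKey forYouTube(String videoId) {
        return new ObjectKey("youtube", null, "video:" + videoId);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ObjectKey)) return false;
        ObjectKey that = (ObjectKey) other;
        return service.equals(that.service)
                && Objects.equals(owner, that.owner)
                && objectId.equals(that.objectId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(service, owner, objectId);
    }
}
```

Two "wild" URLs would then refer to the same object exactly when the keys
built from their scraped parameters are equal.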

Original comment by rob%xoop...@gtempaccount.com on 30 May 2008 at 4:01

GoogleCodeExporter commented 9 years ago
Internal Tracking ID: 1697283

Original comment by yanivin...@gmail.com on 6 Mar 2009 at 11:55