Providing unique identifier of the content relative to a site

itteco / iframely

oEmbed proxy. Supports over 1800 domains via custom parsers, oEmbed, Twitter Cards and Open Graph

https://iframely.com


Providing unique identifier of the content relative to a site #75

Open matuszeman opened 9 years ago

matuszeman commented 9 years ago

User story: As an API user, I want to get a unique ID of the content relative to a site.

Examples:

https://www.youtube.com/watch?v=XOmwZopzcTA&index=9&list=PL53194065BA276ACA ID: XOmwZopzcTA

https://soundcloud.com/jackedradio/afrojack-presents-jacked-radio-week-23 ID: 210140747

For a site where it's not possible to recognize the system ID, it would be based on a value from the URL: https://soundcloud.com/jackedradio/afrojack-presents-jacked-radio-week-23 ID: jackedradio/afrojack-presents-jacked-radio-week-23

What do you think?

j0k3r commented 9 years ago

I like that!

iparamonau commented 9 years ago

Interesting. What would the use case for that feature be? Also, if we do it, say, for YouTube videos, what should be returned for YouTube user profile pages?

... We have the canonical URL in the response. I trust most people have agreed to use it as the identifier of a resource on the web, no?

matuszeman commented 9 years ago

Yes, canonical URLs seemed to be just what I needed, but as I think about this feature more, it could probably be renamed to "Providing unique identifier of PRIMARY content relative to a site".

Example:

https://www.youtube.com/watch?v=XOmwZopzcTA - represents a video page

https://www.youtube.com/watch?v=XOmwZopzcTA&list=PL53194065BA276ACA&index=9 - represents exactly the same video page in a playlist

I understand that both URLs above are valid as canonical URLs. The latter is the video in the context of a playlist.

My use case is: a user provides a URL, and my app should be able to check whether a reference to the "primary content" already exists in my DB. Because of this, my idea was to use a pair: site name and unique ID relative to that site.
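To make the use case concrete, here is a minimal sketch of what such a (site, ID) lookup could look like on the app side. It is not part of Iframely: `primaryContentKey`, `keyString` and the in-memory `knownContent` set are hypothetical names, and the YouTube rule (video ID taken from the `v` query parameter, path used as a fallback) is only an assumption based on the example URLs above.

```javascript
// Hypothetical helper (not an Iframely API): derive a { site, id } pair
// that identifies the "primary content" behind a URL.
function primaryContentKey(rawUrl) {
  const url = new URL(rawUrl);
  const site = url.hostname.replace(/^www\./, "");

  // Assumption: YouTube watch pages carry the video ID in the "v" parameter;
  // playlist, index and time parameters are ignored.
  if (site === "youtube.com" && url.pathname === "/watch" && url.searchParams.has("v")) {
    return { site: site, id: url.searchParams.get("v") };
  }

  // Fallback for sites without a recognizable system ID: use the URL path.
  const path = url.pathname.replace(/^\/+|\/+$/g, "");
  return { site: site, id: path };
}

// Flatten the pair into a single string usable as a lookup key.
function keyString(key) {
  return key.site + ":" + key.id;
}

// Stand-in for a database table of already known content.
const knownContent = new Set();
knownContent.add(keyString(primaryContentKey("https://www.youtube.com/watch?v=XOmwZopzcTA")));

// The same video in a playlist context maps to the same key.
const shared = "https://www.youtube.com/watch?v=XOmwZopzcTA&list=PL53194065BA276ACA&index=9";
console.log(knownContent.has(keyString(primaryContentKey(shared))));
```

Per-site rules like the YouTube branch would have to be maintained for every supported domain, which is the main cost of this approach.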

iparamonau commented 9 years ago

We have been trying to find a better answer to this use case for 3 years. It pops up every couple of months in one form or another. No luck so far.

Here's an example to show you the problem. Even for YouTube, if we give you the ID of the video, it will be the same for two URLs: ?v=... and ?v=...&t=... - the latter is a timed embed, which has a different embed code. For Google Maps it would be zoom levels, etc.

That just shows you cannot trust IDs, or even canonical addresses, because the actual URL context is essential for embeds. Besides, for short links (say, Bitly), it will be faster if you just let Iframely complete the processing than if it returns a redirect to your app. Facebook does cache by og:url or canonical, but it comes at the cost of slower processing times.

We ended up making a decision that caching by exact URL will cover 99% of our use cases, and that it is good enough for us. At least for now.

matuszeman commented 9 years ago

That's actually what I'm after ... I want to be able to identify what content (uniquely identified per site) users share, based on the URL they provide. In my use case I don't care about zoom level or time information - it's just about identifying the primary content itself, which in the case of a YouTube video could be the video ID or any unique identifier for such an entity on the site.

I'm new to iframely, but I checked https://github.com/itteco/iframely/blob/master/plugins/domains/youtube.com/youtube.video.js and it seems like it would be quite easy to provide this information from what we already have available there. Is there any documentation I could use to learn more, and maybe experiment a bit and contribute a plugin?

iparamonau commented 9 years ago

For stats aggregation - I see your point. As for caching, it doesn't make sense: you would still need to make a request to Iframely to get this ID.

Even for stats, canonical would be a better and more universal source. You could take a hash of it for better indexing. By "canonical" I mean meta.canonical that is returned in the Iframely JSON, or oembed.url, as ideally it is the same for the same video. Not the actual URL you send to the APIs.
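As a rough illustration of the hashing idea, the sketch below derives an index key from meta.canonical using Node's built-in crypto module. The helper name `canonicalKey`, the choice of SHA-256, and the sample response objects are assumptions for illustration only; real Iframely responses contain many more fields.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: hash the canonical URL from an Iframely-style
// JSON response into a fixed-length hex key suitable for indexing.
// Returns null when the response carries no meta.canonical.
function canonicalKey(response) {
  const canonical = response && response.meta && response.meta.canonical;
  if (!canonical) {
    return null;
  }
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// Illustrative responses only, trimmed to the one field used here.
const plain = { meta: { canonical: "https://www.youtube.com/watch?v=XOmwZopzcTA" } };
const inPlaylist = { meta: { canonical: "https://www.youtube.com/watch?v=XOmwZopzcTA" } };

// Two requests that resolve to the same canonical URL share one key.
console.log(canonicalKey(plain) === canonicalKey(inPlaylist));
```

This only deduplicates as well as the canonical value itself: if two URLs for the same video report different canonical addresses, they will hash to different keys.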

Now, the problem with our YouTube plugin is that it doesn't give a canonical address at all. We will be fixing that soon, as well as making sure all other plugins give a consistent response.

If you experiment with it in the meantime, you could check this unfinished doc on how to write plugins.

nleush commented 9 years ago

@matuszeman

You can add

getMeta: function(...) {
    return {ID:'...'};
}

for any plugin.

The result data will then contain 'ID' in the 'meta' section of the response from that plugin.
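For experimenting, the snippet above could be filled in roughly as follows. This is only a sketch under assumptions: the `re` pattern list and the `urlMatch` argument (the regex match result for the requested URL) are how I understand domain plugins to be structured, so check them against the plugin docs and the existing youtube.video.js before relying on them. The pattern and the object name `youtubeIdPlugin` are made up for this example, and the plugin is exercised directly here rather than loaded by Iframely.

```javascript
// Sketch of a domain plugin that exposes the video ID in 'meta'.
// Assumptions: `re` lists URL patterns for the plugin, and `urlMatch`
// is the match result of the requested URL against one of them.
const youtubeIdPlugin = {
  re: [
    /^https?:\/\/(?:www\.)?youtube\.com\/watch\?(?:.*&)?v=([\w-]+)/i
  ],

  getMeta: function (urlMatch) {
    // The first capture group holds the value of the "v" parameter.
    return { ID: urlMatch[1] };
  }
};

// Standalone check without Iframely: match a URL and call getMeta by hand.
const testUrl = "https://www.youtube.com/watch?v=XOmwZopzcTA&index=9&list=PL53194065BA276ACA";
const match = testUrl.match(youtubeIdPlugin.re[0]);
console.log(youtubeIdPlugin.getMeta(match));
```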