[Meta] Standardizing the subscription export format between all YouTube projects

TheFrenchGhosty commented 2 years ago

This is a follow up to a discussion we had on Matrix.

By "all", the web front-ends (Invidious, Piped, CloudTube) and the applications (NewPipe, FreeTube) are implied, but other projects are free to join in.

The Problem:

Currently, (almost) all YouTube front-ends and applications have their own "standard" for exporting subscriptions, and (almost) all of them can also import the exports made by the others. This is pointless and makes the code more complicated for no reason. It also leads to some awful UX/UI design, as demonstrated in these screenshots:

Invidious:

[Screenshot: Invidious]

FreeTube:

[Screenshot: FreeTube]

The solution:

All the projects should have one standard, that fits the needs of all of them. This standard should be text based, properly documented, and ideally, be both human and machine readable.

Implementation ideas:

This is, in short, what has to be done:

Get all projects interested in this idea (I don't see why a project wouldn't be, but if a project author isn't interested, please explain why)
Discuss a standard and figure out what each project needs (e.g. NewPipe needs to export settings, FreeTube needs the channel thumbnail URL included)
Figure out how to handle things (e.g. how do we handle settings: compatible between projects, or specific to each project?)
Make a standard
Create a web application that converts all the existing exports to this standard
Implement this standard in every app (and link to this converter), and remove every "previous" way to import/export subscriptions

Pinging each project's developers:

Invidious: @SamantazFox @unixfox
FreeTube: @PrestonN
CloudTube: @cloudrac3r
NewPipe: @TobiGr @TiA4f8R @B0pol @Stypox @TheAssassin (note: this list isn't exhaustive, I didn't know who to ping)
Piped: @FireMasterK

SamantazFox commented 2 years ago

Here is my first proposal for such a format:

data to be exported: The bare minimum export would contain:

The format's version (in case it needs to evolve)
Subscribed channels
Watch history
User-created Playlists

Optional content:

Subscribed/saved playlists
Custom categories (channel lists, playlist lists) => maybe?
user settings => maybe try to unify those, too (like preferred nitter/bibliogram instance)
reserved sections for each app/frontend

format: We should probably stay with JSON. It's easy to parse, quite legible and most projects already use that (FreeTube, NewPipe, Piped and us, Invidious). Not minified, though.

Imo, what NewPipe/FreeTube/Piped are using for subscriptions can almost be reused as-is.

Watch history and "watch later" would probably be listed under playlists, but with specific treatment?

Here's a sample

{
    "version": 1,
    "subscriptions": [
        {
            "name": "Linus Tech Tips",
            "url": "https://www.youtube.com/channel/UCXuqSBlHAE6Xw-yeJA0Tunw",
            "thumbnail": "https://yt3.ggpht.com/ytc/[snip]"
        }
    ],
    "playlists": [
        {
            "name": "__watch_history__",
            "videos": []
        },
        {
            "name": "__watch_later__",
            "videos": []
        },
        {
            "name": "My fav Music!",
            "visibility": "public",
            "description": "The playlist I listen to every day",
            "videos": [
                "djV11Xbc914",
                "HEXWRTEbj1I",
                "sRl02nXVMhw",
                "dQw4w9WgXcQ",
                "Hy8kmNEo1i8",
                "zA52uNzx7Y4",
                "9bZkp7q19f0"
            ]
        }
    ]
}
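
To make the sample above more concrete, here is a minimal Python sketch of how a client might load and sanity-check such an export. The function name `load_export` and the exact checks are assumptions made for illustration; nothing here is part of an agreed specification. Only the field names (`version`, `subscriptions`, `playlists`, `name`, `url`, `videos`) come from the sample.

```python
import json

SUPPORTED_VERSION = 1  # assumption: matches "version": 1 in the sample


def load_export(text: str) -> dict:
    """Parse an export and check the fields used in the sample (hypothetical helper)."""
    data = json.loads(text)  # strict JSON: trailing commas are rejected
    if data.get("version") != SUPPORTED_VERSION:
        raise ValueError(f"unsupported format version: {data.get('version')!r}")
    for sub in data.get("subscriptions", []):
        if "url" not in sub:
            raise ValueError("subscription entry is missing 'url'")
    for playlist in data.get("playlists", []):
        if "name" not in playlist or not isinstance(playlist.get("videos"), list):
            raise ValueError("playlist entry needs 'name' and a 'videos' list")
    return data


sample = """
{
    "version": 1,
    "subscriptions": [
        {"name": "Linus Tech Tips",
         "url": "https://www.youtube.com/channel/UCXuqSBlHAE6Xw-yeJA0Tunw"}
    ],
    "playlists": [
        {"name": "__watch_history__", "videos": []},
        {"name": "My fav Music!", "visibility": "public", "videos": ["dQw4w9WgXcQ"]}
    ]
}
"""

export = load_export(sample)
print(len(export["subscriptions"]), "subscription(s),", len(export["playlists"]), "playlist(s)")
```

Optional fields such as `thumbnail`, `visibility` and `description` are deliberately not checked, since the proposal lists them as extras.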

TheAssassin commented 2 years ago

I recommend creating a distinct repository for maintaining such a specification. It makes organizing everything a lot easier.

TheFrenchGhosty commented 2 years ago

@TheAssassin That's indeed the plan when the idea is a bit more developed.

cloudrac3r commented 2 years ago

Here's the data that CloudTube may currently store about a particular "account":

Subscriptions
- an array of channel IDs
Watched videos
- an array of video IDs
Settings
- instance (string)
- whether to save history (boolean)
- whether to use local mode (boolean)
- default quality (integer or enum)
- colour scheme (integer or enum)
Filters, an array of:
- type (enum)
- data (string)
- label (string)

And here's the data I'd like to add in the future:

Playlists (no schema planned yet)

There is no data import/export process yet (though I'd like to add it soon!) so I'm flexible as far as the JSON schema goes.

Due to the way CloudTube works, I should be able to export metadata about a channel (like its name and icon) for subscription entries if needed.
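
As a rough illustration, an array of channel IDs like the one described above could be turned into subscription entries of the shape used in the earlier sample. This is only a sketch: the helper name `subscriptions_from_channel_ids` is hypothetical, and it assumes the stored IDs are YouTube channel IDs that fit the `https://www.youtube.com/channel/<id>` URL form shown in that sample.

```python
# Assumption: IDs are YouTube channel IDs, as in the sample's channel URL.
YOUTUBE_CHANNEL_PREFIX = "https://www.youtube.com/channel/"


def subscriptions_from_channel_ids(channel_ids, names=None):
    """Build subscription entries from bare channel IDs (hypothetical helper).

    `names` optionally maps a channel ID to its display name.
    """
    names = names or {}
    entries = []
    for channel_id in channel_ids:
        entry = {"url": YOUTUBE_CHANNEL_PREFIX + channel_id}
        if channel_id in names:
            entry["name"] = names[channel_id]
        entries.append(entry)
    return entries


subs = subscriptions_from_channel_ids(
    ["UCXuqSBlHAE6Xw-yeJA0Tunw"],
    names={"UCXuqSBlHAE6Xw-yeJA0Tunw": "Linus Tech Tips"},
)
print(subs)
```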

TobiGr commented 2 years ago

Thank you for starting the work on this topic! A discussion about a standardized export format is long overdue.

I'd like to mention that NewPipe supports multiple services. This has implications for both the subscription and playlist exports.

If we do not include a service identifier when storing subscriptions, NewPipe will have to find the corresponding service for each URL. This is simple for services like YouTube or SoundCloud, because they just use a few domains. However, it requires an API request when performing the check for PeerTube. That in turn will slow down the import process significantly and cause unnecessary traffic if someone imports PeerTube channels from many different instances. I'd therefore suggest adding a service id to each channel.

The service identifier is also needed for the playlist items. It is possible to create a playlist with streams from multiple services in NewPipe. For this reason, storing a video id is not enough. Services with multiple instances like PeerTube even require more info to identify the video correctly. This could either be the whole video URL or more information on the corresponding instance.
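
A small Python sketch of the trade-off described above: when an entry carries an explicit service id, no lookup is needed; without it, only services with a few well-known domains can be recognised offline. The `service` field name, the host table and the `peertube.example` instance are assumptions for illustration, and the host list is not exhaustive.

```python
from urllib.parse import urlparse

# Illustrative, non-exhaustive host table (assumption).
KNOWN_HOSTS = {
    "www.youtube.com": "youtube",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "soundcloud.com": "soundcloud",
}


def resolve_service(entry):
    """Return the service for an entry, or None if a network request would be needed."""
    if "service" in entry:  # hypothetical explicit service id
        return entry["service"]
    host = urlparse(entry["url"]).hostname
    return KNOWN_HOSTS.get(host)  # None: unknown host, e.g. some PeerTube instance


print(resolve_service({"url": "https://www.youtube.com/channel/UCXuqSBlHAE6Xw-yeJA0Tunw"}))
print(resolve_service({"url": "https://peertube.example/c/some_channel"}))
print(resolve_service({"url": "https://peertube.example/c/some_channel", "service": "peertube"}))
```

The second call is the case that would cost an API request per instance during import.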

cloudrac3r commented 2 years ago

That's a really good point!

Could this be scoped to something like:

youtube: {
  playlists: [
    {
      videos: [
        "dQw4w9WgXcQ",
        ...

If this would work, it means clients that only care about YouTube only need to look at the part inside youtube.

I don't know if this would work. If it doesn't, we'd have to do it the way you described (which doesn't seem bad either!)

TobiGr commented 2 years ago

Unfortunately, this does not work, because it is possible to create a playlist which has content from different services. We really need to make sure that every stream / video is mapped to a service, either by using the whole URL or by adding a service id and (if needed) info on the PeerTube instance.
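
For illustration, a mixed-service playlist could keep the service on each item instead of on an enclosing scope. The sketch below is in Python; the per-item `service` and `url` fields, the playlist name and the `peertube.example` URL are assumptions, not an agreed format. A client that only cares about one service can still filter cheaply while the playlist order is preserved.

```python
# Hypothetical playlist where every item is mapped to a service.
playlist = {
    "name": "Mixed",
    "videos": [
        {"service": "youtube", "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
        {"service": "peertube", "url": "https://peertube.example/w/abc123"},
        {"service": "youtube", "url": "https://www.youtube.com/watch?v=9bZkp7q19f0"},
    ],
}


def videos_for(playlist, service):
    """Return the URLs of the items belonging to one service, in playlist order."""
    return [item["url"] for item in playlist["videos"] if item["service"] == service]


print(videos_for(playlist, "youtube"))
```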

TheAssassin commented 2 years ago

Keep in mind that NewPipe for instance supports a variety of services, not just YouTube. The protocol should be capable of listing arbitrary services' entities.

URL is the right keyword here. It stands for "Uniform Resource Locator". There should be one unique and uniform identifier for any content. A youtube.com URL for instance works for YouTube stuff, it's unique, and every app should be able to follow that path. For PeerTube, URLs to some instance may work, too. If not, a special, virtual URL could be introduced, e.g., peertube://. URLs are relatively easy to parse in all relevant languages (correct me should I be wrong, but all the languages I'm working with on a regular basis have some kind of URL parser included); one could easily filter out URLs with the wrong scheme or hostname.
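
As a quick sketch of the filtering mentioned above, here is how it could look with Python's standard `urllib.parse`. The `matches` helper, the `peertube://` example URL and the `peertube.example` host are hypothetical; only the idea of filtering on scheme and hostname comes from the comment.

```python
from urllib.parse import urlparse


def matches(url, schemes, hosts=None):
    """True if the URL's scheme (and, when given, hostname) is one a client supports."""
    parts = urlparse(url)
    if parts.scheme not in schemes:
        return False
    return hosts is None or parts.hostname in hosts


urls = [
    "https://www.youtube.com/channel/UCXuqSBlHAE6Xw-yeJA0Tunw",
    "peertube://peertube.example/w/abc123",  # hypothetical virtual scheme
    "ftp://www.youtube.com/not-a-video",
]

youtube_only = [u for u in urls if matches(u, {"https"}, {"www.youtube.com", "youtube.com"})]
peertube_only = [u for u in urls if matches(u, {"peertube"})]
print(youtube_only)
print(peertube_only)
```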

Also, make sure to include versioning, and define some compatibility semantics of the format. For instance, should it be fully backward compatible? Should it be fully forward compatible, too, or just up to the next major release?
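
One possible answer to these questions, sketched in Python purely as an assumption: versions are written as "major.minor" (a bare integer such as the sample's `1` is read as `1.0`), and a reader accepts any file with the same major version, ignoring fields it does not know. The helper names are hypothetical, and other policies are equally possible.

```python
def parse_version(value):
    """Split 1, "1" or "1.2" into a (major, minor) tuple of ints."""
    major, _, minor = str(value).partition(".")
    return int(major), int(minor or 0)


def can_import(file_version, reader_version):
    """Assumed policy: compatible in both directions within one major version."""
    file_major, _ = parse_version(file_version)
    reader_major, _ = parse_version(reader_version)
    return file_major == reader_major


print(can_import(1, "1.3"))    # older file, newer reader
print(can_import("1.4", 1))    # newer minor, older reader
print(can_import(2, "1.3"))    # next major release
```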

Stypox commented 2 years ago

That in turn will slow down the import process significantly and cause unnecessary traffic if someone imports PeerTube channels from many different instances. I'd therefore suggest adding a service id to each channel.

@TobiGr It depends on how we plan to implement this. Just the URL, as TheAssassin said, is enough to uniquely identify a stream, but in NewPipe we store more information than the URL in the database. A local playlist item (or any other locally stored stream) has URL, service id, thumbnail URL, title, stream type, uploader, uploader URL, view count and upload date. We obviously can't put all of this info in an import/export file, as all of it is redundant (i.e. it can be obtained again with a request based just on the URL). But NewPipe local playlists would not work well without all of that information (what would you show to the user if title, thumbnail, ... are unknown?), so it would need to be fetched with a request sooner or later anyway. And the same request could be used to obtain the service id whenever needed. So in my opinion there is no need to store the service id and/or data about PeerTube instances, as all of that information can be fetched by the app when needed (and there is no way to prevent fetching, as explained above).

In NewPipe this could be implemented by creating some kind of "ghost database entities" that only have the URL available. When such an entity is about to be shown in the UI by a call to a RecyclerView Holder's bind(), instead of displaying the entity information as usual we could display a loading indicator, load the stream info in the background, and show it once the data is ready. There would be some other caveats to handle (e.g. what to do if a ghost stream is enqueued?), but they shouldn't be too difficult to solve.
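
The "ghost entity" idea can be sketched outside of Android as well. The Python below is a simplified, synchronous stand-in and not NewPipe code: `StreamEntity`, `row_text`, `resolve` and `fake_fetch` are hypothetical names, and a real app would load the info in the background instead of calling `resolve` directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StreamEntity:
    url: str                              # the only field an import provides
    title: Optional[str] = None           # filled in once the info is fetched
    thumbnail_url: Optional[str] = None

    @property
    def is_ghost(self) -> bool:
        return self.title is None


def row_text(entity: StreamEntity) -> str:
    """What a list row would display: a loading marker for ghosts, else the title."""
    return "Loading..." if entity.is_ghost else entity.title


def resolve(entity: StreamEntity, fetch_info: Callable[[str], dict]) -> StreamEntity:
    """Fill in a ghost entity using a caller-supplied fetch function."""
    if entity.is_ghost:
        info = fetch_info(entity.url)
        entity.title = info["title"]
        entity.thumbnail_url = info.get("thumbnail_url")
    return entity


def fake_fetch(url: str) -> dict:
    # Stand-in for a network request; returns made-up metadata.
    return {"title": "Example title", "thumbnail_url": None}


item = StreamEntity(url="https://www.youtube.com/watch?v=dQw4w9WgXcQ")
before = row_text(item)
resolve(item, fake_fetch)
after = row_text(item)
print(before, "->", after)
```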

ChunkyProgrammer commented 1 year ago

  "playlists": [
      {
          "name": "__watch_history__",
          "videos": []
      },
      {
          "name": "__watch_later__",
          "videos": []
      },
      {
          "name": "My fav Music!",
          "visibility": "public",
          "description": "The playlist I listen to every day",
          "videos": [
              "djV11Xbc914",
              "HEXWRTEbj1I",
              "sRl02nXVMhw",
              "dQw4w9WgXcQ",
              "Hy8kmNEo1i8",
              "zA52uNzx7Y4",
              "9bZkp7q19f0"
          ]
      }
  ]
}

It might also be worth adding whether the saved playlist is a user-created playlist or a service playlist (e.g. someone might save a YouTube playlist, which would store the playlist URL instead of the videos in it, so that the playlist stays dynamic).
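
A minimal sketch of how that distinction might be expressed, in Python. The `type` field, its values `local` and `remote`, and the placeholder playlist URL are assumptions for illustration; entries without a `type` are treated as local so that the earlier sample would remain valid.

```python
# Which field each (hypothetical) playlist type must carry.
REQUIRED_FIELD = {"local": "videos", "remote": "url"}


def check_playlist(playlist):
    """Return the playlist type after checking that its required field is present."""
    kind = playlist.get("type", "local")
    if kind not in REQUIRED_FIELD:
        raise ValueError(f"unknown playlist type: {kind!r}")
    if REQUIRED_FIELD[kind] not in playlist:
        raise ValueError(f"{kind} playlist is missing {REQUIRED_FIELD[kind]!r}")
    return kind


local = {"name": "My fav Music!", "videos": ["dQw4w9WgXcQ"]}
remote = {
    "name": "Saved playlist",
    "type": "remote",
    "url": "https://www.youtube.com/playlist?list=PLACEHOLDER",  # placeholder ID
}
print(check_playlist(local), check_playlist(remote))
```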

absidue commented 1 year ago

From the FreeTube side, most of the UI in the export prompt is taken up by the various YouTube-compatible formats, so a universal format won't make any major difference to the UI. As for the import, we got rid of the prompt and now decide what to do with the selected file based on its content, so adding the universal format to that won't change anything from a user's perspective.

absidue commented 1 year ago

I am against including project settings inside the universal format; they should be kept separately. Every project has very different settings, so a universal format that includes the settings is either going to end up missing a bunch of settings in the export or require updating the format every time one of the projects adds a new setting or changes an existing one. That would just bring us back to the original situation of every project having to handle stuff that is project-specific.

Personally, I also don't think it's a great idea to include subscriptions, playlists (online and local ones) and watch history in the same file, because that will result in behavioural differences between applications: some might take the approach of importing everything from the file, while others will prompt the user asking which of the data types and services they want to import. Doing a gigantic import all in one go is also more likely to get you rate-limited by the services you are using.

Stypox commented 1 year ago

Shouldn't the discussion continue on https://github.com/UniversalPipeWrench/unified-user-data-format/issues/1 ?

SamantazFox commented 1 year ago

Shouldn't the discussion continue on UniversalPipeWrench/unified-user-data-format#1 ?

Yes, indeed! I'll close and lock this one ^^

iv-org / invidious

[Meta] Standardizing the subscription export format between all YouTube projects #2897