Type validation with pydantic

sigma67 commented 2 years ago

Something I've been meaning to implement in the long term is proper models and type validation for the responses from YouTube Music. This would provide better data validation and would notify sooner if something on the API side has changed, as opposed to the current tests which usually check for the length of the results in the response only.

pydantic seems to be the gold standard for this. I'd like to wait for the release of v2.0 to not have to deal with breaking changes from v1 once it comes out.

Please thumbs up if you're in favor and leave a comment if you're willing to help or contribute 🥳

sigma67 commented 1 year ago

Pydantic v2 ETA is now Q1 2023, so we can start work on this Q2 at the earliest

ambujpawar commented 1 year ago

I am willing to help :)

sigma67 commented 1 year ago

Pydantic v2 is now in alpha: https://docs.pydantic.dev/blog/pydantic-v2-alpha/

I think we should use this issue as a starting point to create additional subtasks using GitHub tasklists: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists

As this is a major endeavor with breaking changes development should take place on a v2 branch for now.

JohnHKoh commented 1 year ago

Started playing around with this, here is a rudimentary way to provide type validation.

I created a new module called where I started to define some types. The Playlist type, for example, looks like this:

class Playlist(BaseModel):
    title: str
    playlistId: str
    thumbnails: list[Thumbnail]
    description: str
    count: Optional[str] = None
    author: Optional[list[Author]] = None

Then, inside "parsers/browsing.py", we can create a new instance of this Playlist class with the information we parse from parse_playlist:

def parse_playlist(data):
    playlist = {
        'title': nav(data, TITLE_TEXT),
        'playlistId': nav(data, TITLE + NAVIGATION_BROWSE_ID)[2:],
        'thumbnails': nav(data, THUMBNAIL_RENDERER)
    }
    subtitle = data['subtitle']
    if 'runs' in subtitle:
        playlist['description'] = "".join([run['text'] for run in subtitle['runs']])
        if len(subtitle['runs']) == 3 and re.search(r'\d+ ', nav(data, SUBTITLE2)):
            playlist['count'] = nav(data, SUBTITLE2).split(' ')[0]
            playlist['author'] = parse_song_artists_runs(subtitle['runs'][:1])

    return Playlist(**playlist) # Return playlist instance

(Note, this could probably be done at a "lower level" without having to create the intermediary playlist dictionary first)

Then, we can add some type hinting in "mixins/library.py"...

    def get_library_playlists(self, limit: int = 25) -> list[Playlist]:

And now we get some more helpful type hinting with the new object.

The data validation works too. For example, in the Playlist class, I specify count as optional, because I believe empty playlists do not have the count field. If I remove the optional type and make it required, I get a validation error for a playlist that does NOT have the count field:

  Field required [type=missing, input_value={'title': 'Episodes for L...des you save for later'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/0.38.0/v/missing

I can continue to help with this work - I think it can be very helpful for maintaining the API as well as improving usability with the type hints. Here's a quick branch comparison between the testing I did and the current master branch:

https://github.com/sigma67/ytmusicapi/compare/master...JohnHKoh:ytmusicapi:v2-pydantic-test?expand=1

sigma67 commented 1 year ago

Looks like a good start! I like that it's not very invasive to the current code base, at least not in this case.

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

JohnHKoh commented 1 year ago

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I was using v2 beta in a venv, but yes, it will need to be added. I was just playing around with integrating it into the code, so nothing I wrote was really meant to be final! Just a start. The way we "convert" the raw data into the new types is definitely an implementation detail that could warrant more discussion. Should it use some kind of constructor? Or some other generic parsing function?

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

I also support the push list, as it seems like the List from the typing module may be deprecated some time after October 2025.

sigma67 commented 1 year ago

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

I don't think we need to many different samples, just need to ensure that the edge cases are covered (like unavailable songs, which have some data fields, but not others). Instead of dumping we could also just go ahead and try with all fields optional and default None. That would already be an improvement over the current case where sometimes the keys are missing, making the library a bit cumbersome to use.

Should it use some kind of constructor? Or some other generic parsing function?

Definitely just use the default constructor

I also support the push list

Then let's go ahead with >=3.9.

TheOnlyWayUp commented 9 months ago

Hi, I recently made an API Wrapper that used Pydantic.

https://github.com/TheOnlyWayUp/Wattpad-Py

A tool I used prominently was https://jsontopydantic.com.

I'm going to draft up a PR this week.

sigma67 / ytmusicapi

Type validation with pydantic #307