Open sigma67 opened 2 years ago
Pydantic v2 ETA is now Q1 2023, so we can start work on this Q2 at the earliest
I am willing to help :)
Pydantic v2 is now in alpha: https://docs.pydantic.dev/blog/pydantic-v2-alpha/
I think we should use this issue as a starting point to create additional subtasks using GitHub tasklists: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists
As this is a major endeavor with breaking changes development should take place on a v2 branch for now.
Started playing around with this, here is a rudimentary way to provide type validation.
I created a new module called where I started to define some types. The Playlist
type, for example, looks like this:
class Playlist(BaseModel):
title: str
playlistId: str
thumbnails: list[Thumbnail]
description: str
count: Optional[str] = None
author: Optional[list[Author]] = None
Then, inside "parsers/browsing.py", we can create a new instance of this Playlist
class with the information we parse from parse_playlist
:
def parse_playlist(data):
playlist = {
'title': nav(data, TITLE_TEXT),
'playlistId': nav(data, TITLE + NAVIGATION_BROWSE_ID)[2:],
'thumbnails': nav(data, THUMBNAIL_RENDERER)
}
subtitle = data['subtitle']
if 'runs' in subtitle:
playlist['description'] = "".join([run['text'] for run in subtitle['runs']])
if len(subtitle['runs']) == 3 and re.search(r'\d+ ', nav(data, SUBTITLE2)):
playlist['count'] = nav(data, SUBTITLE2).split(' ')[0]
playlist['author'] = parse_song_artists_runs(subtitle['runs'][:1])
return Playlist(**playlist) # Return playlist instance
(Note, this could probably be done at a "lower level" without having to create the intermediary playlist
dictionary first)
Then, we can add some type hinting in "mixins/library.py"...
def get_library_playlists(self, limit: int = 25) -> list[Playlist]:
And now we get some more helpful type hinting with the new object.
The data validation works too. For example, in the Playlist
class, I specify count
as optional, because I believe empty playlists do not have the count
field. If I remove the optional type and make it required, I get a validation error for a playlist that does NOT have the count
field:
Field required [type=missing, input_value={'title': 'Episodes for L...des you save for later'}, input_type=dict]
For further information visit https://errors.pydantic.dev/0.38.0/v/missing
I can continue to help with this work - I think it can be very helpful for maintaining the API as well as improving usability with the type hints. Here's a quick branch comparison between the testing I did and the current master branch:
https://github.com/sigma67/ytmusicapi/compare/master...JohnHKoh:ytmusicapi:v2-pydantic-test?expand=1
Looks like a good start! I like that it's not very invasive to the current code base, at least not in this case.
I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.
Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.
I also noticed that you used list
as opposed to List
, we would need to increase requires_python>=3.9
for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.
I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.
Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.
Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.
I was using v2 beta in a venv, but yes, it will need to be added. I was just playing around with integrating it into the code, so nothing I wrote was really meant to be final! Just a start. The way we "convert" the raw data into the new types is definitely an implementation detail that could warrant more discussion. Should it use some kind of constructor? Or some other generic parsing function?
I also noticed that you used
list
as opposed toList
, we would need to increaserequires_python>=3.9
for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.
I also support the push list
, as it seems like the List
from the typing
module may be deprecated some time after October 2025.
Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.
I don't think we need to many different samples, just need to ensure that the edge cases are covered (like unavailable songs, which have some data fields, but not others). Instead of dumping we could also just go ahead and try with all fields optional and default None. That would already be an improvement over the current case where sometimes the keys are missing, making the library a bit cumbersome to use.
Should it use some kind of constructor? Or some other generic parsing function?
Definitely just use the default constructor
I also support the push list
Then let's go ahead with >=3.9
.
Hi, I recently made an API Wrapper that used Pydantic.
https://github.com/TheOnlyWayUp/Wattpad-Py
A tool I used prominently was https://jsontopydantic.com.
I'm going to draft up a PR this week.
Something I've been meaning to implement in the long term is proper models and type validation for the responses from YouTube Music. This would provide better data validation and would notify sooner if something on the API side has changed, as opposed to the current tests which usually check for the length of the results in the response only.
pydantic seems to be the gold standard for this. I'd like to wait for the release of v2.0 to not have to deal with breaking changes from v1 once it comes out.
Please thumbs up if you're in favor and leave a comment if you're willing to help or contribute 🥳