spotipy-dev / spotipy

A light weight Python library for the Spotify Web API
http://spotipy.readthedocs.org
MIT License
5.01k stars 956 forks source link

Support for type hints #439

Open roeekl opened 4 years ago

roeekl commented 4 years ago

It would be nice to add a support for type hints. I know most of the functions have str parameters and returning Dict but it still can help in other functions like prompt_for_user_token.

stephanebruckert commented 4 years ago

Hey @roeekl, looks like this is new in python3.5. If there is a way to add it while keeping compatibility with older python versions, we would definitely merge such PR!

IdmFoundInHim commented 3 years ago

@stephanebruckert What is the minimum version of Python that spotipy 3 will need to support (considering 3.5 is already past EoL and the 3.6 EoL will probably pass before spotipy 3 comes out)?

3.7 introduced from __future__ import annotations, which helps with type hinting and may allow some post-3.7 improvements to be incorporated without breakage

stephanebruckert commented 3 years ago

@IdmFoundInHim I think we should choose what's easy for us. Developers should always think of upgrading their own version of python3. Starting at 3.7 sounds good to me?

akathorn commented 3 years ago

I use pylance strict type-checking in my project and the lack of type hints in Spotipy was irking me, so I started playing around with my own .pyi stubs using stubgen as a starting point. I have annotated things lazily as I needed them, and soon I ran into the problem of representing the type of the JSON return values. After playing around for a bit I realized that since Spotify responses have a fixed schema, they can be represented as TypedDicts with really nice results. I scrapped Spotify's documentation and generated a TypedDict for each return type. Fun stuff.

I have a small example of the type-checking here.

My favorite thing about this approach is that vscode (and certainly other tools) autocompletes the fields of the response: Capture Which means that you don't have to go to the Spotify docs or examine the dictionary at runtime to figure out what fields are inside, and what types they contain. At runtime, the behavior is totally unmodified and they are still dicts. The new code is only used for type-checking.

Point is: if there is interest in this approach, I would be willing to go further with it and contribute it :)

Peter-Schorn commented 3 years ago

@akathorn Open a pull request on the v3 branch with these changes. Type hints are an incredibly useful feature.

akathorn commented 3 years ago

I ran into an issue when representing Spotify JSON objects using TypedDict. I didn't realize at first that some keys will be missing in certain circumstances. For instance, the key "restrictions" of AlbumObject is only present if there are restrictions for the album. This can't be represented with TypedDict yet, where it is assumed that all keys are always present.

Since there is more complexity in representing Spotify JSON objects, I'm going to split this work in two and send a simpler pull request annotating just the function parameters. I'll leave the return types as Dict[str, Any], which is not very expressive but better than nothing.

The current pull request will be for the more complex work of representing Spotify objects in the type system.

Regarding Spotify objects

As I mentioned, the problem right now is that keys in a TypeDict are always assumed to exist. This means that in here:

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
album = sp.album("4aawyAB9vmqN3uQ7FjRGTy")
restrictions = album["restrictions"]

The type checker will report that the type of restrictions is AlbumRestriction, which is misleading since during runtime this might end with a KeyError.

I'll think about it, but at this point I see two options:

  1. Warn the user that it might be necessary to double-check for the presence of certain keys. Of course this is tricky.
  2. PEP 655 solves this exact problem, so an option is to just wait until they implement it in typing_extensions.
Peter-Schorn commented 3 years ago

This can't be represented with TypedDict yet, where it is assumed that all keys are always present.

Actually, there is a total parameter that allows you to control whether all keys must be present. See here. There isn't a simple way to mark individual keys as optional, but I think this solution is better than just using Dict[str, Any].

We could also take a completely different approach to the spotify objects: We could decode them into data classes instead of using dictionaries. tekore has already done this. I admit I like this idea more.

IdmFoundInHim commented 3 years ago

Implementing dataclasses is a much bigger task than TypedDict: it would change how the data is accessed at runtime, breaking almost all spotipy-dependent code. It would also add the overhead of creating instances of a class every time a request is made. That could be justified if we leveraged the features of classes, e.g. abstracting paginated objects, but that would involve rewriting most of client.py. In short, it goes far beyond static type checking.

I would rather us not implement data from spotipy in a way that forces other projects to use an abundance of classes. I think we implement as total = false for now, and switch to NotRequired[...] as soon as it doesn't break anything. (It might be doable with the future import already.) Unfortunately, it may be difficult to deduce from the API documentation which values are sometimes absent.

akathorn commented 3 years ago

I thought about total=False, but it has the opposite problem in that all keys are assumed to be possibly missing. This means that to appease the gods of type-checking you must (as far as I know) write code like this:

if "name" in album:
    name = album["name"]

Or alternatively name = album.get("name", "default"). Which means that the typing scheme would encourage you to go against "ask for forgiveness, not for permission".

I personally think this makes using total=False a bit awkward. It is wonderful for autocomplete, but the type-checker is going to be complaining constantly, forcing to either ignore it or write defensive code that will be unnecessary once we can use NotRequired[...].

On the other hand, with Dict[str, Any] you lose a lot (A LOT) of information and safety, but the type-checker won't complain as easily:

restriction = album["restrictions"]  # Type: Any
print("things are restricted! Because: " + restriction["reason"])  # Type-checks

This is basically saying "I don't have a clue about what is inside", while using total=False is misleading since it says "this key might or might not be inside this dict", when in fact it is often safe to assume it is.

My point is that I would prefer a solution that gives very little information to one that provides somewhat-precise information, until NotRequired[...] is implemented. If going for TypeDict, I would prefer to have it with total=True. I think it is a better tradeoff between good type-checking and runtime safety. I leave this decision on your hands though, I'm happy to implement this with TypeDict (either total or not total) right now if you think it is the best path.

IdmFoundInHim commented 3 years ago

How about we use the workaround mentioned in the PEP's Motivation section? That is, use two TypedDict, one with the required keys and another with the not-required keys, liked through inheritance. When the ideal solution is supported, it should be relatively easy to programatically upgrade to the new syntax.

akathorn commented 3 years ago

*facepalm* 🤦🏻‍♂️ I totally missed this. This will certainly work, I'm gonna give it a go :)

Peter-Schorn commented 3 years ago

By the way, I've created a Spotify web API library in Swift which has a struct for each object in the Spotify object model. Swift requires you to explicitly mark which properties might be None, so you can use these types as a guide for which keys can be missing.

If you're not familiar with swift, here's a quick guide on how to read the syntax:

public struct Artist: Hashable {

    let name: String

    // The '?' indicates this could be `None`
    let uri: String?

    let id: String?

    // Either an array of `SpotifyImage`, or `None`
    let images: [SpotifyImage]?

    let popularity: Int?

    // Either a dictionary of with keys of type `String` and values
    // of type `URL`, or `None`.
    let externalURLs: [String: URL]?

    let followers: Followers?

    let genres: [String]?

    let href: URL?

    let type: IDCategory

}
akathorn commented 3 years ago

Thank you, this is going to be useful.

Am I understanding it correctly, that the JSON decoder uses nil to represent both missing keys and null values? Since those need to be represented differently in Python, I'm still at a loss in some cases. Sometimes it is clear in the official docs, but not always. For instance, the releaseDate of an Album appears as optional in the Swift library, but the official docs don't say anything about it being missing/null.

Any suggestions on how to figure out what could be missing and what could be null? The only thing I can think of is checking out other libraries out there, but I imagine it could be quite common that they abstract missing keys and null values in a similar way. Tekore seems to be doing it too, as far as I can see.

Peter-Schorn commented 3 years ago

Am I understanding it correctly, that the JSON decoder uses nil to represent both missing keys and null values?

Yes, that is correct. However, I will look into ways of distinguishing between these cases.

For instance, the releaseDate of an Album appears as optional in the Swift library, but the official docs don't say anything about it being missing/null.

Yes, it's extremely frustrating. Apparently, the developers were too lazy to document each property that might be null. I had to write extensive unit tests in order to manually determine which properties might be null or missing.

akathorn commented 3 years ago

Can I take a look at those tests? I wrote some code to automatize comparing a response with the expected TypedDict type. It detects the difference between null and missing, but of course the problem is being exhaustive. Perhaps I can programmatically adapt some of your test cases to use them with my code.

Peter-Schorn commented 3 years ago

The tests are here, in the same repository. However, they are unlikely to be useful to you as they are also written in Swift. Essentially, they involve making calls to each of the API endpoints and ensuring the response is successfully decoded into the expected type. They aren't going to help you distinguish between null values and missing keys because once the JSON data is decoded into a Swift struct, that distinction is lost.

akathorn commented 3 years ago

I'm hoping for good examples of calls that return missing keys or null values. Right now I have tests such as:

def test_track_urn(self):
    track = self.spotify.track(self.creep_urn)
    self.assertType(track, self.spotify.track)

assertType checks the response against the method's signature, and reports any differences:

<Track>
  {
    required [dict]:
      * Missing required keys: {'linked_from'}
      {
        preview_url [primitive]:
          * None: expected <class 'str'> but got <class 'NoneType'>
      }
  }

Which shows that the key linked_from can be missing in the response, and preview_url can be null. These tests are quick to write since you only need to know the arguments for the call, the rest is automated.

My idea is finding a lot of examples, trying to be as exhaustive as possible, and correct the types until they fit every known case. There would certainly be a lot of false negatives because of edge cases and outliers for which there are no tests. But if we can't find an alternative, maybe fitting most cases would be good enough? For the remaining keys a safe fallback could be to to set them both as not required and optional, which would mean having to do two checks but would ensure runtime safety.

Peter-Schorn commented 3 years ago

Writing extensive unit tests is the only reliable way to validate the type hints. That's what I did for my library. Here's one tip: Make sure you run tests with local tracks. These tracks contain lots of missing data. They may not even have URIs! You have to add them to a playlist and then make a request for that playlist. Here's one of my playlists with local tracks:

https://open.spotify.com/playlist/0ijeB2eFmJL1euREk6Wu6C

leleogere commented 2 years ago

Having typed objects for each possible output Spotify API is able to return would greatly improve development workflow in IDEs.

akathorn commented 1 year ago

I didn't wrap this up in the end, in great part due to a chronic illness. But anyway this inconsistency between null values and missing keys was really annoying me. I don't want to give wrong annotations because they give a false sense of safety, and testing all edge-cases seemed rather daunting to me. I could try, but I don't know if I'll find the time and energy to finish this up anytime soon. I wrote a "type-checker" to automatically test annotations against the Spotify endpoints (using reflection), maybe if I can clean that up someone could eventually continue testing endpoints and tweaking the annotations?