Dumping improvement ideas for spotipy version 3

stephanebruckert commented 3 years ago

Let's start thinking of the next major version for spotipy, while we keep maintaining the version 2.

spotipy's philosophy for v3 remains the same:

A light weight Python library for the Spotify Web API

Light weight means we keep it simple and let developers parse the JSON results themselves, as we trust that the Spotify API is backward-compatible.

v3 PRs and issues can use the label:

To do in version 3:

remove python2 support
support for type hints https://github.com/plamere/spotipy/issues/439
clean up deprecated methods
get rid of oauth2.py which confusingly includes non-oauth2 flows -> we could have an "auth" folder and a file for each authentication flow
things tagged with spotipy3
use ValueError exceptions here https://github.com/spotipy-dev/spotipy/commit/b1db0b63d90eae40af32d7ee4c760c2fd581a1b4#diff-952bb1443c4db0c5905601651c4ed9de43df4d4c5ad17939cffaf8211a49914eR1971

Feel free to suggest anything that you believe could improve spotipy in the long term. Independent issues / PRs can still be created and referenced in this discussion.

Development process

Here is a first suggestion, and can be improved:

a new v3 branch exists, so v3 PRs are welcome now!
all backward-incompatible PRs can be merged into v3
backward-compatible PRs can be merged into master as usual
PRs welcome to regularly rebase master onto v3 while making sure it doesn't include any of the things we don't want in v3
- we will be able to create pre-releases along the way https://www.python.org/dev/peps/pep-0440/#id27

There is no rush, this will take the time it needs. Any suggestion is welcome! Thank you

Peter-Schorn commented 3 years ago

I've noticed that the initializer for Spotify has three parameters that all alias each other: client_credentials_manager, oauth_manager, and auth_manager. We should remove the first two because they just create confusion.

Peter-Schorn commented 3 years ago

DONE √ in https://github.com/plamere/spotipy/pull/665

Here's another idea: How about creating an Enum for all of the authorization scopes:

from enum import Enum
from typing import Iterable
import re

class Scope(Enum):
    ugc_image_upload = "ugc-image-upload"
    user_read_recently_played = "user-read-recently-played"
    user_read_playback_state = "user-read-playback-state"
    playlist_modify_public = "playlist-modify-public"
    # and so on...

    @staticmethod
    def make_string(scopes: Iterable['Scope']) -> str:
        """
        Converts a sequence of scopes to a string.

        * scopes: An iterable of scopes.

        returns: a space-separated string of scopes
        """
        return " ".join([scope.value for scope in scopes])

    @staticmethod
    def from_string(scope_string: str) -> set['Scope']:
        """
        Converts a string of (usuallly space-separated) scopes into a
        set of scopes

        * scope_string: a string of scopes

        returns: a set of scopes.
        """
        scope_string_list = re.split(pattern=r"[\W-]+", string=scope_string)
        scopes = set()
        for scope_string in scope_string_list:
            try:
                scope = Scope(scope_string)
                scopes.add(scope)
            except ValueError as error:
                pass
        return scopes

scopes: set[Scope] = {
    Scope.ugc_image_upload,
    Scope.user_read_playback_state,
    Scope.playlist_modify_public
}

scope_string = Scope.make_string(scopes)
print(scope_string)

converted_scopes = Scope.from_string(scope_string)
print(converted_scopes)

This can be a purely additive change to v3. We can still accept the scopes as a string string or a list of strings.

I see several advantages to an enum:

It prevents typos
It enables autocomplete by your IDE, which not only makes the scopes faster to type, but enables better discoverability of the available scopes

stephanebruckert commented 3 years ago

https://github.com/plamere/spotipy/blob/master/spotipy/cache_handler.py

As we add more and more cache handlers, we would need to split the cache handler file into single files and move them in a folder.

Suggestions:

cache_handlers/
- base_cachehandler.py (added base)
- cache_file_handler.py
- memory_cache_handler.py

or:

cache_handlers/
- cache_handler.py
- connectors/
  - cache_file_handler.py
  - memory_cache_handler.py

or:

cache_handler.py
cache_handlers/
- cache_file_handler.py
- memory_cache_handler.py

Peter-Schorn commented 3 years ago

I like the first layout the most; it's the simplest. I suggest we use the same structure for the auth managers as well.

mmattbtw commented 3 years ago

Maybe add event handlers for when someone skips/plays the next song.

Peter-Schorn commented 3 years ago

That's not possible with the web API. All of the endpoints are documented here and spotipy already has a method for every endpoint and every authorization method.

nleroy917 commented 3 years ago

Headless authentication has always seemed to be a hot-ticket item

IdmFoundInHim commented 3 years ago

I think that some sort of handling/abstraction for Spotify's paging objects should be included because many of the basic endpoint return these paging objects. My guess is that most projects depending on spotipy have to deal with these, which requires making manual HTTP requests. This lies on the border of "let developers parse JSON results themselves" and the Spotify API itself; The object has to be parsed to deduce if another request is needed, but to make that request you need the auth token. If spotipy is supposed to work out of the box, this feature is necessary for many basic endpoints. A dependent package otherwise must import requests and do the extra work of integrating spotipy functions with those requests.

In my experience, the best way to deal with these is to convert them to a more Pythonic iterable — namely, a generator. Because it uses lazy evaluation, it keeps the memory advantages of a paging object yet can be easily converted to a list if resources are available in excess.

The most aggressive solution would be to modify the return values of endpoints that include paging objects (e.g. Spotify.playlist_items), replacing each JSON object that has a 'next' attribute with a generator. This would create a full abstraction layer, making it easy for someone unfamiliar with the API to use spotipy. However, it eliminates the convenient 'total' attribute and results in a dict that cannot be trivially converted back to portable JSON. Alternatively, we could include this functionality as an option in relevant endpoints (as_generator=True or something similar). That could be done in spotipy 2 (default is paging object) or 3 (default is generator).

Another solution, which may be fully backwards compatible, is adding a function to the Spotify class or the auth manager classes. This function would take a paging object and return a generator. I have a specialized implementation of this in my own project, and could enhance it for inclusion if this is the desired solution.

IdmFoundInHim commented 3 years ago

For the endpoints that limit input to 50 or 100 items, spotipy should accept any amount of input, breaking it up into smaller requests internally. Currently, devs using spotipy have to implement this themselves if their programs have any chance of exceeding the limit.

nleroy917 commented 3 years ago

@IdmFoundInHim I agree that the paging aspect of the API can be a bit of a nuisance... but I feel like the .next() method of spotipy already makes this so easy. This is some code I use regularly to iterate over paged objects:

all_playlists = []
playlists = spotipy.current_user_playlists(limit=50)
all_playlists += playlists['items']

while playlists['next']:
    playlists = spotipy.next(playlists)
    all_playlists += playlists['items']

# do something with playlists

Maybe on methods/functions that can potentially lead to pagination, add a page_items=True flag and it conducts the above code for you?

I also agree an internal "grouping algorithm," would be great since some endpoints only accept a certain number of ids. I always have to write this myself in order to achieve seamless use.

Peter-Schorn commented 3 years ago

Perhaps we should have an extend method which, similar to next, accepts a page of results and then requests all additional pages.

nleroy917 commented 3 years ago

Wouldn't that pretty much render next obsolete? I feel like most would just opt for extend. Maybe you could add a flag to the next method:

all_playlists = []
page = spotipy.current_user_playlists(limit=50)
all_playlists += page['items']
if page['next']:
    playlists += spotipy.next(page['next'], all_pages=True)['items']

I think here you prevent the user from accessing other data in that page object. I guess, however, that is their choice if they elect to use an extend method or an extend=True flag.

... if I understand correctly here.

Peter-Schorn commented 3 years ago

Some people need fine-grained control over when and which pages to retrieve (think of an infinitely-scrolling list), so next should not be rendered obsolete if we add the extend method. Further, the word next implies that only a single additional page will be retrieved, so adding an all_pages parameter to it seems weird. Instead, it should be a separate method:

def extend(self, page):
    results = page["items"]
    while page["next"]:
        page = self.next(page)
        results.extend(page["items"])
    return results

IdmFoundInHim commented 3 years ago

I actually wasn't aware of Spotify.next, but shouldn't we be making the objects compatible with the Python builtins instead of duplicating them? It would be intuitive for next(Spotify.playlist_items(...)) to return the next playlist item, rather than a paging object.

Converting to an iterator would take care of both use cases (controlled or mass retrieval), and would avoid loading unnecessarily large objects into memory.

def page_iter(self, page):
    yield from page['items']
    while page['next']:
        page = self.next(page)
        yield from page['items']

This would allow you to do:

playlists = spotify.page_iter(spotify.current_user_playlist(limit=50))
for playlist in playlists:
    # For example:  
    spotify.playlist_add_items(playlist['id'], [your_favorite_song])

...as well as...

liked_songs = spotify.page_iter(spotify.current_user_saved_songs())

def on_scroll_end():
    render_songs(itertools.islice(liked_songs, 20))

timrae commented 1 year ago

Use aiohttp and enable Async/await in the spotipy interface

hansmbakker commented 1 year ago

There is a request for allowing the Device Authorization Grant flow on the Spotify forum - it is very useful for embedded usecases like a remote control.

Ousret commented 11 months ago

Hello there,

You may be interested in https://github.com/jawah/niquests for a transparent upgrade of Requests. Have async, sync, and multiplexed connection. Support HTTP/2, and 3.

wallysaurus commented 7 months ago

That's not possible with the web API. All of the endpoints are documented here and spotipy already has a method for every endpoint and every authorization method.

has this been updated since? I'm seeing this page on the official website now, although it's in javascript. https://developer.spotify.com/documentation/web-playback-sdk/reference#events

spotipy-dev / spotipy

Dumping improvement ideas for spotipy version 3 #652

To do in version 3:

Development process