joalla / discogs_client

Continuation of the "Official Python Client for the Discogs API"
https://python3-discogs-client.readthedocs.io
Other
299 stars 49 forks source link

Local caching? #124

Open dsm-72 opened 1 year ago

dsm-72 commented 1 year ago

It seems that regardless of caching each accessing each object's data attribute is a function call as

rdata = release.data

raises an HTTPError, which most likely happened since I was playing around with your objects trying to figure out how to best export them (since there isn't a to_json method) and I hit my quota in the process...

Thoughts?

AnssiAhola commented 1 year ago

Hey @dsm-72

Can you provide us with some code example(s) how you encounter this issue? Accessing the data attribute shouldn't cause any extra API requests as far as I know.

Also one "fix" could be to enable exponential backoff so you don't get the "429 Too Many Requests" error, but I know it might not be the best solution in this case. EDIT: Forgot that exponential backoff should be enabled by default. Which version of the client are you running?

dsm-72 commented 1 year ago

@AnssiAhola Hmm, strange. Well I will continue to explore this library. I am currently using version 2.6

dsm-72 commented 1 year ago

@AnssiAhola

so below I am trying to get all the results related to a release as something that is JSON exportable.

import discogs_client
from discogs_client.client import HTTPError as DiscogsHTTPError
from urllib.error import HTTPError

RELEASE_ATTRS_TO_IGNORE = (
    'marketplace_stats price_suggestions '
    'community num_for_sale '
    'lowest_price '    
).split()

from urllib.error import HTTPError
import discogs_client
from discogs_client.client import HTTPError as DiscogsHTTPError

def release_to_dict(release):    
    rdata = release.data

    r_keys = list_release_attrs(release)
    r_keys = list(filter(
        lambda e: e not in RELEASE_ATTRS_TO_IGNORE,
        list_release_attrs(release)
    ))

    rdata['is_master'] = is_release_master(release)
    for k in r_keys:
        try:
            attr_val = getattr(release, k, None)

            # Called data to make rdata
            if k == 'data':
                continue

            elif k == 'main_release':
                rdata[k] = release_to_dict(attr_val)

            elif k == 'versions':
                versions = get_release_versions(release)
                rdata[k] = [release_to_dict(version) for version in versions]
                pass

            elif k == 'master':
                rdata[k] = attr_val

            elif k not in rdata and attr_val:             
                rdata[k] = attr_val

            else:
                pass  

        except DiscogsHTTPError as err:
            print(f'DiscogsHTTPError on k={k}')
            print(err)

        except HTTPError as err:
            print(f'HTTPError on k={k}')
            print(err)

    return rdata
# Now make the call
release_to_dict(release)
DiscogsHTTPError on k=main_release
404: The requested resource was not found.

and yet we see


{
  # ...
  'main_release': <Release ID 'Name'>,
  # ...
  'versions': [
    {'id': <idnum>, ... 'master': <Master ...>}, ...
    {'id': <idnum>, ... 'master': <Master ...>}, ...
    ]
}

Which is odd because it is making the calls from:

def get_release_versions(release):
    versions = []
    ver_attr = getattr(release, 'versions', None)
    if ver_attr is None:
        return []
    num_versions = ver_attr.count
    for page_number in range(ver_attr.pages):
        page = ver_attr.page(page_number+1)
        versions.extend(page)
    return versions

and converting them to an dictionary.

master is not called with release_to_dict because it is (1) it will trigger infinite recursion if I dont add more logic to catch master --> version --> master --> loop.

Yet main_release from release.main_release throws a 404 error!

AnssiAhola commented 1 year ago

Hmm strange... Can you copy/paste the stacktrace from that 404 error?

from traceback import print_exception
print_exception(err)

You can also enable logging to see which API points are called

import logging
logging.basicConfig(level=logging.DEBUG)

What does the list_release_attrs and is_release_master functions do exactly? Can you paste the code in here?

Also does this happen all the time? Or only sometimes? Is this parsing releases for one Artist or multiple? Which ones? Questions, questions... :sweat_smile:

dsm-72 commented 1 year ago

@AnssiAhola sure, just some basic utilities:

def list_public_attrs(obj):
    all_attrs = dir(obj)
    attrs = list(filter(
        lambda e: '__' not in e and e[0] != '_', 
        all_attrs
    ))
    return attrs

DISCOG_CLIENT_ATTRS = (
    'changes client delete fetch join previous_request '
    'refresh save'
).split()

def list_discog_object_attrs(obj):
    _bad_attrs = DISCOG_CLIENT_ATTRS
    attrs = list_public_attrs(obj)    
    attrs = list(filter(
        lambda e: e not in _bad_attrs,
        attrs
    ))
    return attrs

def list_release_attrs(release):
    attrs = list_discog_object_attrs(release)    
    return attrs

Basically list_release_attrs gets all attributes (attr) from dir(release) that

  1. is not a python dunder __something__ or private var _my_var and,
  2. are not your class methods (fetch, save, delete, etc)
def get_release_kind_from_str(release):
    return str(release).lower().replace('<', '').replace('>', '').split()[0]

def is_release_master(release):
    str_has_master = 'master' in str(release).lower()
    str_is_master = 'master' == get_release_kind_from_str(release)
    return str_has_master and str_is_master

    # NOTE: these are other ways to tell if a release is a master release
    master_in_request = release.previous_request and 'masters' in release.previous_request
    id_in_request = release.previous_request and str(release.id) in release.previous_request

    has_master = getattr(release, 'master', None)
    has_versions = getattr(release, 'versions', None)
    has_main_release = getattr(release, 'main_release', None)

    is_master = (
        (master_in_request and id_in_request) or
        (not has_master and has_main_release and has_versions) or 
        str_has_master
    )

    return is_master

Basically is_release_master is a janky way to test if print(release) is in the format "<Master <id> <name> >" format. Probably should use isinstance (or preferably, api has util functions to check these things)

This happens sometimes, normally after calling an object's main_release a few times while iteratively testing a function. It is parsing a single release from a single artist (although I do have their alias, which discogs treats as separate i.e. searching for an alias's releases do not match the original artist)