joalla / discogs_client

Continuation of the "Official Python Client for the Discogs API"
https://python3-discogs-client.readthedocs.io

Pagination above 100 disabled for inventories besides your own #47

Closed feacluster closed 3 years ago

feacluster commented 3 years ago

I get this error when trying to print out more than 10,000 listings from a seller's inventory.

Here is a code snippet to reproduce the issue (it needs to run for around 10 minutes):

import discogs_client

d = discogs_client.Client('Crate_Digger_Application_by_feacluster/0.1', user_token="xxx")

seller = d.user('philadelphiamusic')

inventory = seller.inventory
# Use the public per_page property rather than the private _per_page attribute
inventory.per_page = 100

count = 0
total = len(inventory)

print("<p>", total, " items found in seller's inventory <p>")

# Iterating past page 100 (10,000 listings) is where the error appears
for album in inventory:
    artist = album.data['release']['artist']
    print(artist + ' -- ' + str(count) + ' of ' + str(total))
    count += 1

Seems this issue has been reported here before:

https://www.discogs.com/forum/thread/778418

But I'm wondering if there might be some workaround to get what I want via filtering? I am really only after records graded VG+ or higher, and I'm only interested in vinyl, not other formats. That alone would drastically reduce a seller's inventory.

AnssiAhola commented 3 years ago

Hi @feacluster

If you have the time and patience: I managed to get 73,021 / 98,590 listings from 'philadelphiamusic' in ONLY 2 hours 8 minutes! 😅

seller = d.user('philadelphiamusic')

options = [
    ('listed', 'asc'),
    ('listed', 'desc'),
    ('price', 'asc'),
    ('price', 'desc'),
    ('item', 'asc'),
    ('item', 'desc'),
    ('artist', 'asc'),
    ('artist', 'desc'),
    ('label', 'asc'),
    ('label', 'desc'),
    ('catno', 'asc'),
    ('catno', 'desc'),
    ('audio', 'asc'),
    ('audio', 'desc'),
]

inventory = seller.inventory
inventory.per_page = 100
total = len(inventory)
pages = inventory.pages

# IDs of unique listings seen so far (a set keeps membership checks fast)
found = set()

for option in options:
    # All listings found, stop trying further sort orders
    if len(found) == total:
        break

    # Change the sorting
    print("Trying option", option)
    inventory.sort(*option)

    # Get first 100 pages (10,000 listings)
    for page_num in range(1, min(100, pages) + 1):
        for listing in inventory.page(page_num):
            # Record the listing ID; duplicates from earlier sort orders are ignored
            found.add(listing.id)
        print(f'{page_num}: {len(found)} / {total} found.')
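
To get a feel for why combining sort orders recovers more than 10,000 listings, the idea can be simulated offline. This is only a toy sketch: the `simulate_coverage` helper and its parameters are hypothetical, and it assumes every sort key is an independent random value, which real inventories (where artist, label and catno are correlated) will not satisfy. Real coverage can therefore differ noticeably.

```python
import random


def simulate_coverage(total, window, sort_keys, seed=0):
    """Count distinct items seen when only the first `window` results
    of each (key, direction) ordering can be read."""
    rng = random.Random(seed)
    # Hypothetical listings: an id plus one random value per sort key.
    items = [
        {"id": i, **{key: rng.random() for key in sort_keys}}
        for i in range(total)
    ]
    seen = set()
    for key in sort_keys:
        for reverse in (False, True):  # ascending, then descending
            ordered = sorted(items, key=lambda item: item[key], reverse=reverse)
            # Only the head of each ordering is reachable.
            seen.update(item["id"] for item in ordered[:window])
    return len(seen)


if __name__ == "__main__":
    reached = simulate_coverage(
        total=98_590, window=10_000, sort_keys=["listed", "price", "artist"]
    )
    print(f"{reached} / 98590 distinct listings reachable in this toy model")
```

Each key contributes its lowest and highest `window` items, so every additional key helps less as the overlap with already-seen listings grows.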

It seems that you can't get past 10,000 listings even on the website! https://www.discogs.com/seller/philadelphiamusic/profile?limit=100&page=101 <- Page 101 gives a 404 haha

And you can't filter with the API, even though you can on the website, and there's even a filter function in the codebase: https://github.com/joalla/discogs_client/blob/c31cf003b903ccf1446a67f7e60efe8bb58f8bc2/discogs_client/models.py#L320-L323

I tried to use the filtering with inventory.filter(format='Vinyl'), didn't do anything tho 😞

@alifhughes Any ideas?

feacluster commented 3 years ago

Many thanks for your ingenious workaround to try different sorting to get most of the inventory! I don't have a problem with the 2+ hour runtime.

Hopefully we can get the filtering to work, as that combined with your sorting workaround should get nearly all of the inventory.

This is for my "crate digger app" which tells you what artists a seller has that you already own.

http://35.224.8.67/crate_digger.php

alifhughes commented 3 years ago

Hey all, sorry for the delay in reply.

> I get this error when trying to print out more than 10,000 listings from a seller's inventory.

@feacluster Unfortunately, discogs.com sets a hard limit of 10,000 retrievable results for anything, be it releases of a label, releases of an artist, inventory items or whatever. Please see this thread for more information: https://www.discogs.com/forum/thread/747186#7438700. Even if you could change per_page to 10,000, you'd only get a single page, no matter how many more releases/items there are on discogs.com. So @AnssiAhola is correct: as highlighted in the responses in the thread linked above, people have resorted to filtering/sorting to get around this.
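
The arithmetic behind that limit can be sketched as follows. This assumes the cap is counted in items, as described above; the `reachable` helper and the `HARD_LIMIT` constant are hypothetical names for illustration, not part of discogs_client.

```python
import math

# Item cap described in this thread, treated here as a fixed constant (assumption).
HARD_LIMIT = 10_000


def reachable(total_items, per_page, hard_limit=HARD_LIMIT):
    """Return (pages, items) that can be read under a fixed item cap."""
    reachable_items = min(total_items, hard_limit)
    pages = math.ceil(reachable_items / per_page)
    return pages, reachable_items


if __name__ == "__main__":
    # The seller from this issue, at 100 listings per page.
    print(reachable(98_590, 100))
    # A hypothetical per_page of 10,000: one page, same number of items.
    print(reachable(98_590, 10_000))
```

Under this model, a larger page size only changes how many requests are needed, not how many listings come back.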

> I tried to use the filtering with inventory.filter(format='Vinyl'), didn't do anything tho 😞
>
> @alifhughes Any ideas?

@AnssiAhola You're correct, there is a filter function, because on some endpoints you can apply a filter, ArtistReleases for example. On the Inventory endpoint, however, we can only apply a sort, so the filter will not have any effect there.
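
Since the endpoint ignores filters, one option is to filter the listings client-side after fetching them. The sketch below is an illustration under assumptions: the `is_wanted` helper is hypothetical, the condition labels are assumed to match the strings in each listing's `condition` field, and the `VINYL_TOKENS` substrings are a guess at how vinyl shows up in `release.format`, so check them against real data. Note that this does not reduce the number of requests and does not get past the 10,000 limit.

```python
# Media-condition labels, best first (assumed to match the API's `condition` strings).
CONDITION_ORDER = [
    "Mint (M)",
    "Near Mint (NM or M-)",
    "Very Good Plus (VG+)",
    "Very Good (VG)",
    "Good Plus (G+)",
    "Good (G)",
    "Fair (F)",
    "Poor (P)",
]

# Assumed substrings that mark a vinyl release in the listing's release format.
VINYL_TOKENS = ("Vinyl", "LP", '7"', '10"', '12"')


def is_wanted(listing_data, min_condition="Very Good Plus (VG+)"):
    """Return True for vinyl listings graded `min_condition` or better."""
    condition = listing_data.get("condition")
    if condition not in CONDITION_ORDER:
        return False  # unknown or missing grade: skip
    # A lower index means a better grade.
    good_enough = (
        CONDITION_ORDER.index(condition) <= CONDITION_ORDER.index(min_condition)
    )
    fmt = listing_data.get("release", {}).get("format", "")
    is_vinyl = any(token in fmt for token in VINYL_TOKENS)
    return good_enough and is_vinyl
```

With the objects used earlier in this thread, this could be applied as `[l for l in inventory.page(1) if is_wanted(l.data)]`.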

Sorry that I can't really fix or help much further in this regard, as these are things controlled by Discogs.com themselves. @feacluster I would also apply @AnssiAhola's great sorting solution to get around the hard limit 😄 I hope I've helped to clear things up! I want to put similar information in the README, as it confused me at first too. Please reopen the issue if I can assist any more, thank you 🙂