stashapp / CommunityScrapers

This is a public repository containing scrapers created by the Stash Community.
https://stashapp.github.io/CommunityScrapers/
GNU Affero General Public License v3.0

TheScoreGroup: Cover Images #1831

Open Ronnie711 opened 4 months ago

Ronnie711 commented 4 months ago

Scraper name: TheScoreGroup

Currently the scraper grabs the cover image from the scene page, but this is essentially just a carousel of screenshots. The correct scene cover can be seen on the home/category/model pages but not on the scene page.

However, thanks to a bit of poking, it would seem that the cover image URLs are consistent.

Example links

Scene: https://www.scoreland.com/big-boob-videos/Danniella-Levy/50022/
Performer: https://www.scoreland.com/big-boob-models/Danniella-Levy/7951/
Cover Image (in best quality): https://cdn77.scoreuniverse.com/modeldir/data/posting/50/022/posting_50022_1920.jpg

Note that the cover URL doesn't include the performer ID; it just splits the studio code (the number at the end of the scene URL) across two directories and then repeats it in the file name.

2nd Example

Scene: https://www.18eighteen.com/xxx-teen-videos/Emma-Bugg/71841/
Performer: https://www.18eighteen.com/teen-babes/Emma-Bugg/9417/
Cover Image: https://cdn77.scoreuniverse.com/modeldir/data/posting/71/841/posting_71841_1920.jpg

(Shoutout to randomuser2022 for pointing me in the right direction)

Ronnie711 commented 4 months ago

Been checking back on older scenes & for obvious reasons 1080 images aren't available for everything.

The sliding scale for sizes:

1920x: _1920.jpg
1600x: _1600.jpg
1280x: _1280.jpg
800x: _800.jpg
600x: _xl.jpg
450x: _lg.jpg
225x: _med.jpg
100x: .jpg (No size info = a tiny image!)
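
The sliding scale above can be written down as an ordered list, largest first. This is only a minimal sketch, assuming the suffixes are exactly as listed; `SIZE_SUFFIXES` and `candidate_urls` are made-up names, and nothing here checks whether a given URL actually exists on the CDN:

```python
# Size suffixes from the list above, largest first. The widths are the
# nominal ones reported in this thread; heights vary by source.
SIZE_SUFFIXES = [
    ("_1920", 1920),
    ("_1600", 1600),
    ("_1280", 1280),
    ("_800", 800),
    ("_xl", 600),
    ("_lg", 450),
    ("_med", 225),
    ("", 100),  # no suffix at all is the tiny image
]


def candidate_urls(base):
    """Return every size variant of `base`, largest first.

    `base` is the image URL without size suffix or extension.
    """
    return [f"{base}{suffix}.jpg" for suffix, _width in SIZE_SUFFIXES]


base = "https://cdn77.scoreuniverse.com/modeldir/data/posting/50/022/posting_50022"
for url in candidate_urls(base):
    print(url)
```

Trying these in order and stopping at the first one that exists should give the best available cover for a scene.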

Whilst width is consistent, height is variable depending on source as we're dealing with a highly consistent organisation!

Maista6969 commented 4 months ago

This is great research! Am I understanding you right in that the largest size that will be available for all scenes is 800x?

Ronnie711 commented 4 months ago

> This is great research! Am I understanding you right in that the largest size that will be available for all scenes is 800x?

Just checked the oldest scene on 18eighteen (https://www.18eighteen.com/xxx-teen-videos/Julissa-Delor/11628/) & the image is available up to 800. However, XL Girls' oldest scene (https://www.xlgirls.com/bbw-videos/China/6889/) is only available up to _xl!

Also note that, as this is only a 4-digit studio code, the directory split is 1 & 3 characters, not 2 & 3 as previously seen (https://cdn77.scoreuniverse.com/modeldir/data/posting/6/889/posting_6889_xl.jpg)
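
Both splits seen so far fit one rule: the last three digits form the second directory and whatever is left forms the first. A rough sketch of that, assuming the code is the final numeric segment of the scene URL and that only 4 and 5 digit codes occur (nothing else has been checked); `scene_id_from_url` and `posting_base` are hypothetical helper names:

```python
import re

CDN_ROOT = "https://cdn77.scoreuniverse.com/modeldir/data/posting"


def scene_id_from_url(scene_url):
    """Pull the trailing numeric code out of a scene URL.

    Assumes the code is the last path segment, as in the examples in this thread.
    """
    match = re.search(r"/(\d+)/?$", scene_url)
    if match is None:
        raise ValueError(f"no numeric ID at the end of {scene_url!r}")
    return match.group(1)


def posting_base(scene_id):
    """Build the image URL without size suffix or extension."""
    if len(scene_id) not in (4, 5):
        # Only 4 and 5 digit codes have been observed; refuse to guess for others.
        raise ValueError(f"unexpected ID length: {scene_id!r}")
    return f"{CDN_ROOT}/{scene_id[:-3]}/{scene_id[-3:]}/posting_{scene_id}"


scene = "https://www.xlgirls.com/bbw-videos/China/6889/"
print(posting_base(scene_id_from_url(scene)) + "_xl.jpg")
```

For the XL Girls scene this prints the same `_xl` URL as the one linked above.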

Ronnie711 commented 4 months ago

Got bored, deep searched 18eighteen.com ...

Currently the scraper is using "Poster" for the selector & after searching through 39 pages of scenes, this covers everything back to January 2009. Pre-2009 scenes use <img src="https://cdn77.scoreuniverse.com/modeldir/data/posting/12/003/posting_12003_x_med.jpg" srcset="https://cdn77.scoreuniverse.com/modeldir/data/posting/12/003/posting_12003_x_med.jpg 169w" (Scene URL: https://www.18eighteen.com/xxx-teen-videos/Alyssa-Star/12003/)

For this scene an 800x image is available (note the extra _x in the file name): https://cdn77.scoreuniverse.com/modeldir/data/posting/12/003/posting_12003_x_800.jpg so that's annoying.

However, anything from 2009 onwards using the "Poster" selector is already showing the largest size image in the selector, so we'd only need to modify it for 1280 images to grab the 1920s instead ... Suddenly it's become a lot easier!
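
That swap is just a suffix replacement on the poster URL. A small sketch, assuming the poster URL ends in `_1280.jpg` whenever a bigger version might exist; `upgrade_poster` is a hypothetical helper, the input URLs are illustrative ones built from the pattern above, and nothing here verifies that the `_1920` file is really there:

```python
import re


def upgrade_poster(poster_url):
    """Rewrite a trailing _1280.jpg to _1920.jpg; leave any other URL unchanged."""
    return re.sub(r"_1280\.jpg$", "_1920.jpg", poster_url)


# Illustrative inputs following the URL pattern from this thread.
print(upgrade_poster("https://cdn77.scoreuniverse.com/modeldir/data/posting/50/022/posting_50022_1280.jpg"))
print(upgrade_poster("https://cdn77.scoreuniverse.com/modeldir/data/posting/11/628/posting_11628_800.jpg"))
```

Anchoring the pattern at the end of the string keeps it from touching a `_1280` that might appear elsewhere in a URL.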

feederbox826 commented 1 month ago

I implemented this in Python only to realize that sceneScraper is in XPath 😞

Here's the code, it's pretty good *so far

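import requests

client = requests.Session()

# Size suffixes, largest first, from
# https://github.com/stashapp/CommunityScrapers/issues/1831#issuecomment-2106027395
QUALITIES = ["_1920", "_1600", "_1280", "_800", "_xl", "_lg", "_med", ""]

def test_url(url, quality):
    # A HEAD request is enough to check that the image exists
    return client.head(url + quality + ".jpg").status_code == 200

def get_best_image(scene_id):
    # Only 4 and 5 digit IDs have been checked so far
    if len(scene_id) not in (4, 5):
        raise ValueError(f"unexpected ID length: {scene_id}")
    # Last three digits are the second directory, the rest is the first
    # (6889 -> 6/889, 50022 -> 50/022)
    idpath = f"{scene_id[:-3]}/{scene_id[-3:]}"
    no_qual_path = f"https://cdn77.scoreuniverse.com/modeldir/data/posting/{idpath}/posting_{scene_id}"
    for quality in QUALITIES:
        if test_url(no_qual_path, quality):
            print(f"✅ Found {quality} for {scene_id}")
            return no_qual_path + quality + ".jpg"
    return None  # none of the sizes responded with 200

print(get_best_image("50022"))
print(get_best_image("11628"))
print(get_best_image("6889"))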

output

❯ python .\testimg.py
✅ Found _1920 for 50022
https://cdn77.scoreuniverse.com/modeldir/data/posting/50/022/posting_50022_1920.jpg
✅ Found _800 for 11628
https://cdn77.scoreuniverse.com/modeldir/data/posting/11/628/posting_11628_800.jpg
✅ Found _xl for 6889
https://cdn77.scoreuniverse.com/modeldir/data/posting/6/889/posting_6889_xl.jpg