nilicule / StadiaGameDB

All games currently available on Google Stadia
https://StadiaGameDB.com

Scraping Game Image URLs #72

Open ELowry opened 3 years ago

ELowry commented 3 years ago

Hi there!

Context:

I understand this project is currently set up as a standalone system, but I was looking for a way to get game "posters" (clean images without the game title on them) that would work better for the current Stadia+ library layout...

Long story short, while digging through the HTML of various Stadia pages, I found that the ideal images are available on the All Games list page. So I built this bit of code, which gathers each game's "square" and 16:9 image:

var list = document.getElementsByClassName('alEDLe'), // This class is the parent for the actual list part of "game list" pages
    storage = {};

for ( const gameList of list ) // There should only ever be one list structure, but just in case, let's go through all of them
{
    for ( const game of gameList.children ) // Each child corresponds to one game within the list
    {
        try
        {
            const id = game.getAttribute( 'data-app-id' ); // This corresponds to the individual game ID

            if ( !storage.hasOwnProperty( id ) ) // For games with multiple editions/versions, the first one seems to always be the standard/base game; this seems to be the only way to differentiate them, so we only ever capture that one
            {
                storage[id] = {
                    square: game.getElementsByTagName( 'source' )[0].srcset, // The square variant of the image is stored in the picture's source srcset
                    wide: game.getElementsByTagName( 'img' )[0].src // The 16:9 variant of the image is stored in the picture's image src
                };
            }
        }
        catch ( e )
        {
            console.error( 'Game image scraping error: ', e ); // Log and carry on so that one bad entry doesn't stop the whole scrape
        }
    }
}

console.log( storage, JSON.stringify( storage ) );
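
Note that the `square` value captured above is the raw `srcset` string, which in HTML can hold several comma-separated candidates, each a URL optionally followed by a descriptor such as `2x`. Here is a minimal sketch for pulling out just the URLs. `parseSrcset` is a hypothetical helper name, the URLs are placeholders, and it assumes no URL contains a comma followed by whitespace (a simplification of the full parsing rules):

```javascript
// Hypothetical helper: split a srcset string into its candidate URLs.
// Assumes candidates are separated by a comma plus whitespace and that
// no URL contains that sequence.
function parseSrcset( srcset )
{
    return srcset
        .split( /,\s+/ ) // One chunk per candidate
        .map( ( candidate ) => candidate.trim().split( /\s+/ )[0] ) // Drop descriptors such as "2x" or "640w"
        .filter( ( url ) => url.length > 0 ); // Ignore empty chunks
}

// Placeholder URLs for illustration only
const example = 'https://example.com/a.png 1x, https://example.com/b.png 2x';
console.log( parseSrcset( example ) );
```

If the list page only ever serves a single candidate, the raw string can of course be used as is.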

Suggestion:

I believe it could be relatively easy to pull various game images from the Stadia website's code using this process.

Would you be interested in including these image links in your dataset? Possibly as a separate JSON file, which could be auto-generated by an adapted version of the above code so that it is easy to keep up to date?

I'd be happy to investigate and see if there are other image types/sizes that can be scraped in a similar way and get some code ready for those!
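
As a rough idea of what generating that separate file could look like, here is a small sketch that turns the scraped object into a JSON string. `toImageJson` is a hypothetical name, the IDs and URLs are placeholders, and the sorted keys and 2-space indent are only assumptions on my part (they keep diffs small when the file is regenerated):

```javascript
// Hypothetical helper: serialise the scraped object with sorted keys so that
// regenerating the file produces minimal diffs. The sorting and the 2-space
// indent are assumptions, not something the project prescribes.
function toImageJson( storage )
{
    const sorted = {};
    for ( const id of Object.keys( storage ).sort() )
    {
        sorted[id] = { square: storage[id].square, wide: storage[id].wide };
    }
    return JSON.stringify( sorted, null, 2 );
}

// Placeholder IDs and URLs for illustration only
const sample = {
    'id-b': { square: 'https://example.com/b-square.png', wide: 'https://example.com/b-wide.png' },
    'id-a': { square: 'https://example.com/a-square.png', wide: 'https://example.com/a-wide.png' }
};
console.log( toImageJson( sample ) );
```

The resulting string could then be copied from the console or saved by whatever tooling the project already uses.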

ELowry commented 3 years ago

Note: after some checking, it looks like the bit where I filter to only keep the first one doesn't work on all games (some have the "special" editions up first). I'll do a bit of checking to see if there is an easy way to filter the incorrect ones out instead.

nilicule commented 3 years ago

> Note: after some checking, it looks like the bit where I filter to only keep the first one doesn't work on all games (some have the "special" editions up first).

This was an issue I ran into as well - the list is currently a bit of a mess, with all kinds of games included. There are also games that seem to be grouped together under a single 'master' ID (the Hitman series, to be specific), which can cause some problems.

ELowry commented 3 years ago

OK, so I have set up a sort of "smart" filtering system based on 2 elements:

  1. Each entry is qualified as either a "Game" or "Bundle", so I first prioritize games and only fall back to bundles when an ID has no game entries.
  2. I've set up a list of filters that are used to try and determine which entries are most likely to be "editions" of the base game.

IMPORTANT: this will only work in English, since I use word-based filtering. To make sure things run properly, the code could check that the page URL is https://stadia.google.com/store/list/3?hl=en before running, since that ensures everything is in English.
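
A minimal sketch of such a check, assuming the language is always passed through the `hl` query parameter as in the URL above. `isEnglishPage` is a hypothetical helper name, not part of the code in this thread:

```javascript
// Hypothetical guard: returns true only when the given address is a Stadia page requested in English.
// The `hl` parameter name and the hostname come from the URL quoted above.
function isEnglishPage( href )
{
    const url = new URL( href );
    return url.hostname === 'stadia.google.com' && url.searchParams.get( 'hl' ) === 'en';
}

// In the browser console this would be called as isEnglishPage( window.location.href )
console.log( isEnglishPage( 'https://stadia.google.com/store/list/3?hl=en' ) );
```

The scraping code could then be wrapped in an `if` on this check, or log a warning and stop when it fails.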

As for Hitman, it is in fact one game (World of Assassination) with 3 "DLC" packs, of which you must at least own one to be able to access the game. I was confused too, but having looked at the list of games I can play I only see one even though I've now claimed the first 2 with Pro.

The list of filters may need to be updated from time to time, but I've set it up to be as robust as possible with what info I currently have.

Finally, here is the resulting code:


var list = document.getElementsByClassName( 'alEDLe' ),
    holding = {},
    storage = {},
    filters = {
        'game of the year': 2,
        'goty': 2,
        'ultimate edition': 5,
        'digital deluxe': 5,
        'deluxe edition': 5,
        'gold edition': 5,
        'legendary edition': 5,
        'special edition': 5,
        'premium edition': 5,
        "collector's edition": 5,
        'platinum edition': 4,
        'standard edition': 0,
        'edition': 3,
        'super deluxe': 5, // Must come before 'deluxe': filters are tested in order and the first match wins
        'deluxe': 4,
        'premium': 5,
        'ultimate': 5,
        'legendary': 5,
        'platinum': 3,
        'special': 3,
        'standard': 0,
        'season': 2,
        'gold': 1,
        '-': 1
    };

for ( const gameList of list ) // There should only be one list, but go through all of them just in case
{
    for ( const game of gameList.children )
    {
        try
        {
            const id = game.getAttribute( 'data-app-id' );

            if ( !holding.hasOwnProperty( id ) )
            {
                holding[id] = {
                    games: [],
                    bundles: []
                };
            }

            const entry = { title: '' };
            let type = ['game'], // Treated as a game when no type label is found
                titleFound = false,
                typeFound = false;

            entry.square = game.getElementsByTagName( 'source' )[0].srcset;
            entry.wide = game.getElementsByTagName( 'img' )[0].src;

            for ( const child of game.getElementsByTagName( 'div' ) )
            {
                if ( child.classList.contains( 'T2oslb' ) ) // Element holding the title
                {
                    entry.title = child.innerText;
                    titleFound = true;
                }
                else if ( child.classList.contains( 'vaa0f' ) ) // Element holding the type ("Game" or "Bundle")
                {
                    type = child.innerText.toLowerCase().split( ', ' );
                    typeFound = true;
                }
                if ( titleFound && typeFound ) // Nothing left to look for in this entry
                {
                    break;
                }
            }
            if ( type.includes( 'game' ) )
            {
                holding[id].games.push( entry );
            }
            else
            {
                holding[id].bundles.push( entry );
            }
        }
        catch ( e )
        {
            console.error( 'Game image scraping error: ', e );
        }
    }
}

for ( const id in holding )
{
    if ( holding.hasOwnProperty( id ) )
    {
        const entries = holding[id].games.concat( holding[id].bundles );

        if ( entries.length == 1 )
        {
            storage[id] = entries[0]; // Only one entry for this ID, so there is nothing to filter
        }
        else if ( entries.length > 1 )
        {
            const filtered = FilterContents( holding[id] );
            if ( filtered )
            {
                storage[id] = filtered;
            }
        }
    }
}

console.log( storage );
console.log( JSON.stringify( storage ) );

function FilterContents( contents )
{
    // Games take priority; bundles are only considered when the ID has no game entries
    const pool = contents.games.length > 0 ? contents.games : contents.bundles,
        tempStore = { 0: [], 1: [], 2: [], 3: [], 4: [], 5: [] }; // Buckets keyed by score; the lowest non-empty bucket wins

    for ( const item of pool )
    {
        let stored = false;
        for ( const f in filters )
        {
            const regex = new RegExp( '(^|\\s|\\:)' + f + '(\\s|\\:|$)', 'i' );
            if ( regex.test( item.title ) )
            {
                tempStore[filters[f]].push( item );
                stored = true;
                break;
            }
        }
        if ( !stored )
        {
            // No filter matched: titles with a ": " subtitle or a trailing digit go in bucket 1, everything else in bucket 0
            const score = /((?<!^):\s)|([0-9]$)/.test( item.title ) ? 1 : 0;
            tempStore[score].push( item );
        }
    }

    // Return the first entry of the lowest-scoring non-empty bucket
    for ( const ts in tempStore )
    {
        if ( tempStore.hasOwnProperty( ts ) && tempStore[ts].length > 0 )
        {
            return tempStore[ts][0];
        }
    }

    return false;
}
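
To show how the scoring behaves, here is a cut-down, standalone sketch of the same idea. `scoreTitle`, `sampleFilters` and the titles are hypothetical and only for illustration; the regular expressions are the ones used above. Sorting by score stands in for the bucket lookup in the full code:

```javascript
// Cut-down filter table (same scores as the matching entries in the full list)
const sampleFilters = {
    'deluxe edition': 5,
    'standard edition': 0,
    'edition': 3,
    'gold': 1
};

// Hypothetical helper: return the score of a title; the lowest score is the one that gets kept
function scoreTitle( title )
{
    for ( const f in sampleFilters )
    {
        const regex = new RegExp( '(^|\\s|\\:)' + f + '(\\s|\\:|$)', 'i' );
        if ( regex.test( title ) )
        {
            return sampleFilters[f]; // First matching filter wins
        }
    }
    // No keyword matched: a ": " subtitle or a trailing digit scores 1, anything else 0
    return /((?<!^):\s)|([0-9]$)/.test( title ) ? 1 : 0;
}

// Made-up titles for illustration only
const titles = ['Example Game Deluxe Edition', 'Example Game', 'Example Game: Origins'];
const best = titles.slice().sort( ( a, b ) => scoreTitle( a ) - scoreTitle( b ) )[0];
console.log( best );
```

With these made-up titles the plain "Example Game" entry scores lowest, so it is the one that would be kept for that ID.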