Open z33kz33k opened 2 months ago
There's a tournament decks interface, but it's not scrapeable because it just shows all recent tournaments with no way to filter them down to one particular event. However, tournament decks can be listed by searching for a player name, so this page could be treated as an ad-hoc user page to scrape: https://www.manatraders.com/decks?utf8=%E2%9C%93&%5Bformat_id%5D=4&%5Bsearch_name%5D=kasa&button=
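A minimal sketch of how such a player-search URL could be built, assuming the query parameters work exactly as they appear in the example link above. The function name `player_search_url` is hypothetical, and what `format_id=4` selects is not stated here, so it is only carried over from the example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.manatraders.com/decks"


def player_search_url(player: str, format_id: int = 4) -> str:
    """Build a deck search URL for a player name (hypothetical helper)."""
    # Parameter names and order are copied from the example URL; the meaning
    # of format_id=4 is an assumption carried over from that example.
    params = [
        ("utf8", "\u2713"),  # the check mark that encodes to %E2%9C%93
        ("[format_id]", str(format_id)),
        ("[search_name]", player),
        ("button", ""),
    ]
    # urlencode percent-encodes the brackets and the check mark
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(player_search_url("kasa"))
```

Calling `player_search_url("kasa")` should reproduce the link quoted above, so any player name found in a video description could be turned into a scrapeable page the same way.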
There's a dedicated user page that could be scraped (even though it displays only one, seemingly random, of the user's folders, and displaying the others requires human interaction): https://deckbox.org/users/Odekar There are also dedicated event pages that could be scraped: https://deckbox.org/communities/rostov_on_don/events/2744
There's a dedicated user profile page that could be scraped: https://mtg.cardsrealm.com/en-us/profile/mateus-queiroz-n35/decks Profile decks can be grouped into folders, and those could also be scraped: https://mtg.cardsrealm.com/en-us/decks/folder/1l7-pauper There's also a dedicated tournament page (but those stopped being updated in July 2024): https://mtg.cardsrealm.com/en-us/tournament/1k218-meeples-club-pauper-088
There's a dedicated user decks page: https://manastack.com/user/kxdx1157/decks
There's a dedicated tournament page: https://old.starcitygames.com/decks/SCG_CON_Standard_10K/2024-11-16_standard_Columbus_OH_0/1/
This is a threefold task. The easier part would be:
- moxfield.com/bookmarks/: (#144)
- mtgtop8.com/event?e=: (#152)
- archidekt.com/folders/: (#153)
- www.hareruyamtg.com/en/deck/result?eventName={event_name} (#154)
- aetherhub.com/Events/{format}
- melee.gg/Tournament/View/{event_id}
- topdeck.gg/bracket/{event_id}: (Moxfield decklists) (#162)
- tappedout.net/users/{user}/deck-folders/: (#166)
- tappedout.net/mtg-deck-folders/{folder}/ (#167)
- www.magic-ville.com/fr/decks/decklists?event={event_id} (#172)
- www.mtggoldfish.com/tournament/{tournament_name} (#174)
- www.tcdecks.net/deck.php?id={event_id} (#177)
- infinite.tcgplayer.com/magic-the-gathering/events/event/{event_name}
- melee.gg/Tournament/View/116326 (#186)
- mtgdecks.net/{format}/{tournament_name}-tournament-{tournament_id} (#188)
- pennydreadfulmagic.com/competitions/{competition_id}: (#192)

The harder, but more important part (that probably needs its own issue) would be:
scraping user decks pages. Most decklist services support a page that holds all of the decks created by a user. Some (lazy-ass :) ) channels only paste this link and don't bother posting individual decks. This task is harder because it requires checking the previously scraped state (the decks already scraped) and scraping only those decks on the user's page that haven't been scraped yet. This essentially means saving the last scraped grouped URL in the channel's scraped data (update: the needed state is actually already saved, since deck URLs are stored in each deck's metadata) and then passing it down to the video scraping logic when needed ==> retaining the earlier state, updating it on the go, and checking it when needed has been implemented in 2ab3f1be4ccb3a1d65a48d1ff5fd17b75110f0ad (and 2 earlier commits).
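The "scrape only what hasn't been scraped yet" step described above could look roughly like the sketch below. It is not the code from the referenced commits: the names `normalize` and `select_new_decks` are hypothetical, the deck URLs in the usage example are made up, and the only assumption taken from the issue is that previously scraped deck URLs are available from the decks' metadata.

```python
from typing import Iterable
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a deck URL to a comparable key (hypothetical helper)."""
    parts = urlsplit(url if "://" in url else f"https://{url}")
    # Host is case-insensitive; the path is left as-is because deck IDs may be case-sensitive
    host = parts.netloc.lower().removeprefix("www.")
    query = f"?{parts.query}" if parts.query else ""
    return host + parts.path.rstrip("/") + query


def select_new_decks(listed_urls: Iterable[str], scraped_urls: Iterable[str]) -> list[str]:
    """Return deck URLs from a user page that are not in the scraped state, in page order."""
    seen = {normalize(url) for url in scraped_urls}
    fresh = []
    for url in listed_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)  # also drops duplicates within the page itself
            fresh.append(url)
    return fresh


if __name__ == "__main__":
    # Made-up URLs, for illustration only
    already_scraped = ["https://www.moxfield.com/decks/abc"]
    on_user_page = ["https://moxfield.com/decks/abc/", "https://moxfield.com/decks/def"]
    print(select_new_decks(on_user_page, already_scraped))
```

Comparing normalized keys rather than raw strings is a design choice to avoid re-scraping a deck just because one link has a `www.` prefix or a trailing slash; whether that matters depends on how the stored URLs were recorded.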
- moxfield.com/users/: (#158)
- streamdecker.com/decks/: (#160)
- mtga.untapped.gg/profile/ (without /deck/): (#180)
- archidekt.com/u/{user_name} or https://archidekt.com/user/{user_id} or archidekt.com/search/decks?owner={user_name}: (#157)
- tappedout.net/users/: (#165)
- aetherhub.com/User/: (#156)
- decks.tcgplayer.com/magic/deck/search?player={player}
- app.cardboard.live/s/{user_name}
- deckstats.net/decks/{user_id}/: (#169)
- magic-ville.com/fr/register/perso?user={user_name}&rub=decks: (#173)
- www.mtggoldfish.com/deck_searches/create?utf8=%E2%9C%93(...)&deck_search%5Bplayer%5D=Eliott_Dragon(...): (#178)
- infinite.tcgplayer.com/magic-the-gathering/decks/player/{user_name} (#182)
- infinite.tcgplayer.com/magic-the-gathering/decks/advanced-search?author={user_name}&p=1 (#183)
- www.hareruyamtg.com/en/deck/result?player={player_name} (#190)
- flexslot.gg/u/{user_name}: (#191)
- scraping sites that only feature deck groups like the official magic.gg or www.mtgo.com sites would also fall into scope of this task
- scraping sites that post decks in articles/blogs (and then link to them on their YT channel), e.g. Cardmarket
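As an illustration of how URL templates like those listed in this issue could be recognized in a video description, here is a rough sketch that turns a few of them into regular expressions. The `TEMPLATES` mapping, the "user"/"event" labels, and the `compile_template` and `classify` helpers are all hypothetical; only the template strings themselves come from this issue.

```python
import re

# A few templates copied from this issue; "{name}" marks a variable URL segment.
# The "user"/"event" labels are an assumption made for this sketch.
TEMPLATES = {
    "archidekt.com/u/{user_name}": "user",
    "flexslot.gg/u/{user_name}": "user",
    "melee.gg/Tournament/View/{event_id}": "event",
    "topdeck.gg/bracket/{event_id}": "event",
    "pennydreadfulmagic.com/competitions/{competition_id}": "event",
}


def compile_template(template: str) -> re.Pattern:
    """Turn a template into a regex with one named group per placeholder."""
    # re.split with a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>[^/?#]+)" if i % 2 else re.escape(part)
    # Scheme, "www." and a trailing slash are optional
    return re.compile(rf"^(?:https?://)?(?:www\.)?{regex}/?$", re.IGNORECASE)


PATTERNS = [(compile_template(t), kind) for t, kind in TEMPLATES.items()]


def classify(url: str):
    """Return (kind, captured fields) for the first matching template, else None."""
    for pattern, kind in PATTERNS:
        match = pattern.match(url.strip())
        if match:
            return kind, match.groupdict()
    return None


if __name__ == "__main__":
    print(classify("https://melee.gg/Tournament/View/116326"))
```

Templates with query strings (e.g. the hareruyamtg.com or mtggoldfish.com searches) would need query-aware parsing instead of a path regex, so they are left out of this sketch.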