z33kz33k / mtg

Scrape data on MtG decks
MIT License

Scrape links to deck groups #102

Open z33kz33k opened 2 months ago

z33kz33k commented 2 months ago

This is a threefold task. The easier part would be:

  1. scraping pages that are either simple ad-hoc containers of links to individual decklist pages (like Moxfield bookmarks) or containers based around some tournament/event:
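
A minimal sketch of the easier part in Python, using only the standard library: collect every `<a href>` on a container page and keep the ones that look like decklist links. The `extract_deck_urls()` helper and the Moxfield-style `DECK_URL_RE` pattern are hypothetical, not code from this repo. The approach also assumes the links are present in the page's static HTML; a JavaScript-rendered page would need a browser-driven scraper instead.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern: treat any link to a Moxfield deck page as a decklist URL.
DECK_URL_RE = re.compile(r"^https://(?:www\.)?moxfield\.com/decks/[\w-]+/?$")


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links like "/decks/abc" become absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_deck_urls(html: str, base_url: str, pattern: re.Pattern = DECK_URL_RE) -> list[str]:
    """Return unique decklist URLs found on a container page, in page order."""
    parser = _LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    seen: set[str] = set()
    result: list[str] = []
    for link in parser.links:
        if pattern.match(link) and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Swapping `pattern` is what would let the same helper serve other services' container pages (bookmarks, event pages and so on).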

The harder but more important part (that probably needs its own issue) would be:

  1. scraping user decks pages. Most decklist services support a page that holds all of the decks created by a user. Some (lazy-ass :) ) channels only paste this link and don't bother posting individual decks. This part is harder because it requires checking the previously scraped state (of decks already scraped) and scraping only those decks featured on the user's page that haven't been scraped yet. This essentially means saving the last scraped group URL in the channel's scraped data (update: the needed state is actually already saved, since deck URLs are stored in each deck's metadata) and then passing it down to the video scraping logic when needed ==> retaining the earlier state, updating it on the go and checking it when needed have been implemented in commit 2ab3f1be4ccb3a1d65a48d1ff5fd17b75110f0ad (and 2 earlier commits).

  2. scraping sites that only feature deck groups, like the official magic.gg or www.mtgo.com sites, would also fall into the scope of this task

  3. scraping sites that post decks in articles/blogs (and then link to them on their YT channel), e.g. Cardmarket
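
The "scrape only what's new" check for user pages boils down to filtering the page's deck URLs against those already stored in deck metadata. A minimal sketch, assuming (hypothetically) that each scraped deck is a dict with its source URL under `metadata["url"]`; `already_scraped_urls()` and `new_deck_urls()` are illustrative names, not this repo's API:

```python
from collections.abc import Iterable


def already_scraped_urls(scraped_decks: Iterable[dict]) -> set[str]:
    """Collect source URLs from previously scraped decks' metadata.

    Assumes each deck is a dict whose "metadata" dict holds the URL under "url".
    """
    urls: set[str] = set()
    for deck in scraped_decks:
        url = deck.get("metadata", {}).get("url")
        if url:
            urls.add(url)
    return urls


def new_deck_urls(user_page_urls: Iterable[str], known_urls: set[str]) -> list[str]:
    """Return URLs from a user's decks page that haven't been scraped yet, in page order."""
    result: list[str] = []
    for url in user_page_urls:
        if url not in known_urls:
            result.append(url)
            # Update the state on the go, so a URL repeated on the page is only returned once.
            known_urls.add(url)
    return result
```

Because `known_urls` is mutated in place, the same set can be passed down through a whole channel's videos, and each user-page link only yields decks not seen earlier in the run.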

z33kz33k commented 3 days ago

Notes on the remaining services to cover

Manatraders

There's a tournament decks interface, but it's not scrapeable because it just shows all recent tournaments without any means to filter them down to one particular event. However, tournament decks can be listed by searching for a player name, so such a search results page could be treated as an ad-hoc user page to scrape: https://www.manatraders.com/decks?utf8=%E2%9C%93&%5Bformat_id%5D=4&%5Bsearch_name%5D=kasa&button=
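
The search link above can be rebuilt for any player name with `urllib.parse.urlencode`. A sketch that reproduces the example URL; the helper name is hypothetical, and `format_id=4` is just the value seen in the example link (which format it maps to isn't stated here):

```python
from urllib.parse import urlencode

MANATRADERS_DECKS_URL = "https://www.manatraders.com/decks"


def manatraders_player_search_url(player_name: str, format_id: int = 4) -> str:
    """Build a Manatraders deck search URL listing a player's tournament decks.

    The parameter names and order are copied from the example link; they are
    not taken from any documented Manatraders API.
    """
    params = [
        ("utf8", "\u2713"),  # the "✓" value present in the example link
        ("[format_id]", str(format_id)),
        ("[search_name]", player_name),
        ("button", ""),
    ]
    # urlencode percent-encodes the brackets and the UTF-8 check mark.
    return f"{MANATRADERS_DECKS_URL}?{urlencode(params)}"
```

For example, `manatraders_player_search_url("kasa")` yields exactly the link above.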

Deckbox

There's a dedicated user page that could be scraped, even though it only displays one (seemingly random) of the user's folders and displaying the others is only possible via human interaction: https://deckbox.org/users/Odekar

There are also dedicated event pages that could be scraped: https://deckbox.org/communities/rostov_on_don/events/2744

Cardsrealm

There's a dedicated user profile page that could be scraped: https://mtg.cardsrealm.com/en-us/profile/mateus-queiroz-n35/decks

Profile decks can be grouped into folders, and those could also be scraped: https://mtg.cardsrealm.com/en-us/decks/folder/1l7-pauper

There are also dedicated tournament pages (but those stopped being updated in July 2024): https://mtg.cardsrealm.com/en-us/tournament/1k218-meeples-club-pauper-088

Manastack

There's a dedicated user decks page: https://manastack.com/user/kxdx1157/decks

StarCityGames

There's a dedicated tournament page: https://old.starcitygames.com/decks/SCG_CON_Standard_10K/2024-11-16_standard_Columbus_OH_0/1/
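
To route a pasted link to the right group scraper, the page types noted for these services can be told apart by URL shape alone. A sketch, with patterns inferred from the single example link given per page type (so they're assumptions, not documented URL schemes); `GROUP_PATTERNS` and `classify_group_url()` are hypothetical names:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical routing table: (service, group type, regex on the URL without query/fragment).
GROUP_PATTERNS: list[tuple[str, str, re.Pattern]] = [
    ("deckbox", "user", re.compile(r"^https://deckbox\.org/users/[^/]+/?$")),
    ("deckbox", "event", re.compile(r"^https://deckbox\.org/communities/[^/]+/events/\d+/?$")),
    # Cardsrealm paths start with a locale like "en-us" (assumed to always be xx-xx).
    ("cardsrealm", "user", re.compile(r"^https://mtg\.cardsrealm\.com/[a-z]{2}-[a-z]{2}/profile/[^/]+/decks/?$")),
    ("cardsrealm", "folder", re.compile(r"^https://mtg\.cardsrealm\.com/[a-z]{2}-[a-z]{2}/decks/folder/[^/]+/?$")),
    ("cardsrealm", "tournament", re.compile(r"^https://mtg\.cardsrealm\.com/[a-z]{2}-[a-z]{2}/tournament/[^/]+/?$")),
    ("manastack", "user", re.compile(r"^https://manastack\.com/user/[^/]+/decks/?$")),
    ("starcitygames", "tournament", re.compile(r"^https://old\.starcitygames\.com/decks/[^/]+/[^/]+/\d+/?$")),
]


def classify_group_url(url: str) -> Optional[tuple[str, str]]:
    """Return (service, group type) for a known deck-group URL, else None."""
    parts = urlsplit(url)
    bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    if bare.rstrip("/") == "https://www.manatraders.com/decks":
        # Only a player-name search turns the generic decks listing into an ad-hoc user page.
        query = parse_qs(parts.query)
        return ("manatraders", "user") if query.get("[search_name]") else None
    for service, kind, pattern in GROUP_PATTERNS:
        if pattern.match(bare):
            return service, kind
    return None
```

Keeping the table as data means covering another service is one more row rather than another branch.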