ekendra-nz closed this issue 6 years ago.
I don't know why it would be difficult at all. It's been almost 3 weeks and there's no MM17 set. I would be willing to help; as it stands now, mtgjson updates too slowly to be relied upon.
The system currently is a static scraper that prebuilds static files to serve. Complete automation could lead to complete failure, meaning no new output at all, if Wizards changes something in the pages we scrape, or if they add a new mechanic or some other absurdity.
This is something I've been looking at, but it involves a complete rewrite so that MTGJson is no longer a static file generator but instead stores data in a database. We could then cache the JSON output from the database for serving. That would allow for a far more lenient pulling system: if things fail, the runs still complete fine and we can fix the individual issues later. Further, this would give us a truly unique internal identifier to reference cards by, instead of the current system, which depends on data that can change.
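To make the idea concrete, here is a minimal sketch of that design, assuming Python and SQLite; the schema, column names, and helper functions are all hypothetical, not the actual rewrite:

```python
# Hypothetical sketch of the database-backed design: each card gets a
# stable internal UUID on insert, and JSON output is generated (and thus
# cacheable) from the database instead of being prebuilt as static files.
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cards (
        internal_id TEXT PRIMARY KEY,  -- stable identifier, never changes
        name        TEXT NOT NULL,     -- scraped data, may change upstream
        set_code    TEXT NOT NULL
    )
""")

def insert_card(name, set_code):
    """Insert a scraped card, assigning a permanent internal UUID."""
    internal_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO cards (internal_id, name, set_code) VALUES (?, ?, ?)",
        (internal_id, name, set_code),
    )
    return internal_id

def export_set_json(set_code):
    """Build the JSON for one set from the database, ready to cache."""
    rows = conn.execute(
        "SELECT internal_id, name FROM cards WHERE set_code = ?",
        (set_code,),
    ).fetchall()
    return json.dumps(
        {"set": set_code, "cards": [{"id": r[0], "name": r[1]} for r in rows]}
    )

insert_card("Snapcaster Mage", "MM3")
print(export_set_json("MM3"))
```

The point of the `internal_id` column is exactly the stable reference described above: even if Wizards renames a card or a scrape fails partway, the identifier, and any cached JSON keyed on it, stays valid.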
The current codebase is overly complex and is written fairly poorly, imho. This is because it was not built with these kinds of requirements in mind. It works, but only just. So, for the rewrite, I'm literally starting from scratch. Right now the project isn't public, since I only have some bare basic bits working and it's been a few months since I've touched it. Once it can scrape full sets with all types of cards (except for quirks like flip/split cards), I'll open the repo up for feedback.
@Garbee: Hey! It seems you and I are thinking along the same lines. A database app seems like the way to go. It would be better to keep the rich dataset and scrape only for new cards as they are released.
As far as anomalies (flip cards, multiple prints w/ same multiverseid) are concerned, that's where the community could pitch in and help.
A while back a few of us created a fork of mtgjson called mtgsqlive, which was meant to be a live database that you could generate allsets-x.json from (would also enable spoilers to be added in as soon as they were released). I feel like this would be useful for any automated attempts.
In a perfect world, all of this data would be collaboratively pruned and made available via an API for all, much like this was attempting to be: http://deckbrew.com/api/
MM17 has been live since yesterday, just FYI. Also, what @Garbee said about automation is pretty much all there is to say...
In a perfect world, all of this data would be collaboratively pruned and made available via an API
The Scryfall team makes data from MTGJSON (thank you!) and other sources available via our API, and we update during spoiler season. Please check it out if you want a hand-curated source of card information that is downstream from MTGJSON:
https://scryfall.com/docs/api-overview https://scryfall.com/docs/api-methods
@csuhta What do you use for the backend? PHP? Node? Ruby?
PostgreSQL and Ruby
@csuhta Mind blown. Happy days. Can't wait until I can get some time to play around with your API.
Interesting, keep us updated, @Garbee! Anyway, I hope you have a bit more free time again soon, @lsmoura 👍
V4 will not have the functionality to run automatically, but the system should be simpler to build on.
I'm just wondering if the whole system could be automated so that mtgjson.com always has the newest set info as soon as the data is available on gatherer.wizards.com?
I'm not familiar enough with the code base to see what it would take to make it happen. I'm also short on time to dedicate to it.
I'm just wondering how feasible it would be.