mtgjson / mtgjson3

MTGJSON repository for Magic Cards
http://mtgjson.com

Add full foreign language support #40

Closed Sembiance closed 6 years ago

Sembiance commented 9 years ago

Add full foreign language scrapes.

Implementation info: Need to use ?printed=true on Gatherer to get this data. If the cards have unique multiverseids, they should probably get totally separate JSON files, such as AllCards-x-ch.json or similar for each language. When doing this, keep the 'foreignNames' field, and make sure it includes 'English' in non-English printings.
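
A minimal sketch of building such a request, assuming the Details.aspx URL pattern seen in the Gatherer links elsewhere in this thread. The function name `printed_details_url` is hypothetical and not part of mtgjson:

```python
from urllib.parse import urlencode

# Base URL taken from the Gatherer links quoted in this thread.
GATHERER_DETAILS = "http://gatherer.wizards.com/Pages/Card/Details.aspx"


def printed_details_url(multiverseid: int) -> str:
    """Return the Gatherer details URL that asks for the printed (not Oracle) card data."""
    query = urlencode({"printed": "true", "multiverseid": multiverseid})
    return f"{GATHERER_DETAILS}?{query}"


if __name__ == "__main__":
    print(printed_details_url(387742))
```

Each foreign printing has its own multiverseid, so a scraper would call this once per language entry.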

Sembiance commented 9 years ago

Example of a gatherer entry in another language: http://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=398615 vs http://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=400000

Sembiance commented 9 years ago

Note, currently the foreignNames field isn't always working correctly for 'split' cards

florentdouine commented 9 years ago

@Sembiance What do you think about this presentation :

{
    "translations": {
        "en": {
            "multiverse_id": 398615,
            "name": "Tower Geist",
            "text": "FlyingWhen Tower..."
        },
        "fr": {
            "multiverse_id": 399723,
            "name": "Geist de la tour",
            "text": "..."
        }
    }
}

Lumrenion commented 9 years ago

As the multiverse_id is missing from allCards-x.json entirely, I like @florentdouine's suggestion. Since the multiverse_id also differs between sets, how much overhead would it add to make multiverse_id a collection that holds every multiverse_id of a card, keyed by set code? Then it would look like this:

{
    "translations": {
        "en": {
            "multiverse_id": {
                "DKA": "262665",
                "ORI": "398615"
            },
            "name": "Tower Geist",
            "text": "FlyingWhen Tower..."
        },
        "fr": {
            "multiverse_id": {
                "DKA": "337357",
                "ORI": "400000"
            },
            "name": "Geist de la tour",
            "text": "..."
        }
    }
}
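
For illustration, a lookup against a structure shaped like the proposal above might look as follows. This is only a sketch: the helper name `multiverse_id_for` is hypothetical, and the sample data simply repeats the values from the example (the elided "text" fields are left out).

```python
from typing import Optional

# Sample card shaped like the proposal above; values copied from the example.
card = {
    "translations": {
        "en": {
            "multiverse_id": {"DKA": "262665", "ORI": "398615"},
            "name": "Tower Geist",
        },
        "fr": {
            "multiverse_id": {"DKA": "337357", "ORI": "400000"},
            "name": "Geist de la tour",
        },
    }
}


def multiverse_id_for(card: dict, language: str, set_code: str) -> Optional[str]:
    """Return the multiverse_id for a language/set pair, or None if either is missing."""
    translation = card.get("translations", {}).get(language, {})
    return translation.get("multiverse_id", {}).get(set_code)


if __name__ == "__main__":
    print(multiverse_id_for(card, "fr", "ORI"))
```

Keying by set code keeps the lookup constant-time, at the cost of repeating the set codes under every language.
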
Sembiance commented 9 years ago

Note: This issue will track adding the full foreign language text for all parts of the card. Issue #12 will track just the feature of adding foreign language multiverseid's without the foreign text.

Sembiance commented 9 years ago

Note: When the day comes to add foreign languages, I'll need to incorporate the language into the computed id of the card, because some cards will have names identical to the English versions. To do this I'll likely change the id from sha1(setCode + cardName + imageName) to sha1(setCode + cardName + imageName + language). To keep existing cards with the exact same id, English sets will use an empty string for the language so that current ids don't change.
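
A small sketch of that id scheme, assuming plain string concatenation, UTF-8 encoding, and a hex digest (none of which is confirmed in this thread). The function name `card_id` and the sample field values are hypothetical:

```python
import hashlib


def card_id(set_code: str, card_name: str, image_name: str, language: str = "") -> str:
    """Compute sha1(setCode + cardName + imageName + language) as a hex string.

    English cards pass an empty language, so the hashed input is identical to
    the old sha1(setCode + cardName + imageName) and existing ids stay stable.
    """
    raw = set_code + card_name + image_name + language
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    english = card_id("ORI", "Tower Geist", "tower geist")
    french = card_id("ORI", "Tower Geist", "tower geist", "French")
    print(english != french)
```

Appending an empty string leaves the hashed input unchanged, which is what keeps the English ids backward compatible.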

MickHardins commented 9 years ago

magiccards.info has all the foreign names; maybe you can fetch the data from there? (Not sure if this is a possibility, just giving an idea.)

florentdouine commented 8 years ago

I've been waiting for this feature for a long time. Is someone working on it?

lsmoura commented 8 years ago

@florentdouine I'm working on this.

Currently I'm only updating the sets on the git branch, while the mtgjson.com files remain unchanged (for now). You can check the 'multilang' branch.

Once I get to a point I'm satisfied, I'll merge the branch with master and think about a way to publish them on the mtgjson website.

florentdouine commented 8 years ago

Great news ! Thanks @lsmoura

ynerant commented 8 years ago

I saw you're working on translations, and I congratulate you. But I noticed a small problem: the original type of French cards is incorrect. We get Créature : — eldrazi, and we should get Créature : eldrazi. I see that this problem comes from Gatherer itself, but I think it should be handled better. Otherwise, very good work.

tooomm commented 8 years ago

@galaxyoyo http://gatherer.wizards.com/Pages/Card/Languages.aspx?multiverseid=402079 You are right, printed French cards seem to be the only ones with a : as separator; all other languages use -. But interestingly, the Oracle text for the French cards is Créature légendaire : - eldrazi, compared to Créature légendaire : eldrazi on the printed one.

Another example: http://gatherer.wizards.com/Pages/Card/Details.aspx?printed=true&multiverseid=387742 You can compare the Gatherer text with the text on the picture next to it. The two differ, and French is the only language that uses : to separate the main card type from the subtype.

It looks like the prints for French are wrong/different. Can you think of any reason why Wizards is doing so, @galaxyoyo? Language-wise...

So the correct one is supposed to be Créature légendaire - eldrazi I guess? (or Eldrazi?)

ynerant commented 8 years ago

Printed French cards have Créature légendaire : eldrazi (I'm French and have some French cards), so I think the correct one is with only :. There is also the problem in Traditional and Simplified Chinese, whose separator is ~ -, but printed with only . I don't know if it is better to keep the - as Gatherer already does, or to use the real printed separator ...

tooomm commented 8 years ago

I understand what you're saying, I also own French cards... anyway, that's not my point. No matter how it's printed, it should be uniform and correct in "our database", which means it should be the same as it is for all other languages, as long as nobody comes up with an explanation for why they might have printed these separators differently in French (as the only language!).

Am I the only one who thinks the goal should be to improve on what Wizards delivers here? mtgjson already fixes other misprints as well. I guess the idea is not to have a digital collection of paper card clones, but a comprehensive and well-done database.
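
As a rough illustration of what such uniform handling could look like (not what mtgjson actually does), the sketch below rewrites the separator variants quoted in this thread to a single form. The function name `normalize_type_line` and the choice of " - " as the target separator are assumptions made only for this example:

```python
import re

# Separator variants quoted in this thread: " : - " and " : \u2014 " (French on
# Gatherer), " : " (French printed), and a plain dash for other languages.
# \u2014 is the long dash Gatherer emits in some type lines.
_SEPARATOR = re.compile(r"\s*(?::\s*[-\u2014]|:|[-\u2014])\s*")


def normalize_type_line(type_line: str, separator: str = " - ") -> str:
    """Replace the first type/subtype separator with one uniform separator."""
    return _SEPARATOR.sub(separator, type_line, count=1)


if __name__ == "__main__":
    for raw in ("Créature légendaire : - eldrazi", "Créature légendaire : eldrazi"):
        print(normalize_type_line(raw))
```

Only the first separator is rewritten, so type lines without a subtype pass through unchanged.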

ynerant commented 8 years ago

I think you're right ... Gatherer's database is sometimes so crazy ... But I think mtgjson has to be a clone of this database, and if people want to correct something, they have to make these fixes themselves or report the problem to WotC. Maybe there is a non-crazy reason for leaving the -, or for removing all line breaks in translated cards in some ...

ZeldaZach commented 6 years ago

I'm looking at this and thinking of the best way to handle it.

"foreignNames": [
                {
                    "language": "Chinese Traditional",
                    "multiverseid": 272475,
                    "name": "修道院獅鷲",
                    "text": "...",
                    "type": "..."
                },
                {
                    "language": "German",
                    "multiverseid": 273611,
                    "name": "Abteigreif",
                    "text": "...",
                    "type": "..."
                },
                ...
]

This is how I plan on handling this.
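
A consumer of that list might pick out one language as sketched below. The helper name `foreign_entry` is hypothetical, the sample data repeats only the fields shown above (the elided "text" and "type" values are omitted), and the field is read as "foreignNames" as written in this comment:

```python
from typing import Optional

# Sample card using the layout shown above.
card = {
    "foreignNames": [
        {"language": "Chinese Traditional", "multiverseid": 272475, "name": "修道院獅鷲"},
        {"language": "German", "multiverseid": 273611, "name": "Abteigreif"},
    ]
}


def foreign_entry(card: dict, language: str) -> Optional[dict]:
    """Return the first foreign entry for the given language name, or None."""
    for entry in card.get("foreignNames", []):
        if entry.get("language") == language:
            return entry
    return None


if __name__ == "__main__":
    german = foreign_entry(card, "German")
    print(german["name"] if german else "no German printing")
```

A list of per-language objects keeps each printing's multiverseid next to its translated fields, at the cost of a linear scan per lookup.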

tooomm commented 6 years ago

Shouldn't it be foreignData instead of "names"? Or translatedData?

ZeldaZach commented 6 years ago

I'll make it the former

ZeldaZach commented 6 years ago

Addressed in https://github.com/mtgjson/mtgjson-python/commit/54398f4edb959fba9a6a92331daef596cd97f9be

tooomm commented 6 years ago

Is the language name the long version? And it should have "flavorText" or "flavor" too 👍