mtgjson / mtgjson3

MTGJSON repository for Magic Cards
http://mtgjson.com
Other
548 stars 102 forks source link

ids appear to be unstable across releases #177

Closed gwax closed 6 years ago

gwax commented 8 years ago

I'm running into an issue of data incompatibility from the previous mtgjson release to the current one. It appears as though some id fields have changed from the previous version to the current one.

The first one that I've run into (my program exceptions out so there might be more) is Carl Critchlow Wasteland from pJGP (number 55).

In 3.3.14, the id for the Wasteland was: 223a234e1ea29295a4403f55e4edb2ca54bfd64e

In 3.3.15, the id for the Wasteland is: 11cc7debf6b7607b612bf1a903fde946d5d7db03

Is this a bug, repairing a bug, or should we not trust ids to be stable across releases?

lsmoura commented 8 years ago

Hello

Originally, MTGJson code was conceived on a philosophy that any given set would have just a static selection of cards. Therefore, the image names are calculates based on the number of cards with the same name on the same set.

When 3.3.14 was released, there was only one "Wasteland" in 'pJGP' set, therefore, the image was named "wasteland".

Once 3.3.15 was released, the 'pJGP' set had TWO cards named "Wasteland", so, they got named "wasteland1" and "wasteland2". Since the imageName is used when calculating the id, the id got changed.

Can you make a gist with all the images that you found like this, so I can make a deeper check?

Just so you know, in this release, I've started to store the json files on its respective version folder, so, you can find the "old" version files at http://mtgjson.com/json-3.3.14/

I'm not sure about how I'll proceed to fix this for this one and in the future. One way is to change the way that imageName is calculated so that images were named like "wasteland", "wasteland2", "wasteland3"... or change the way ID is calculated to NOT include imageName, but rather "set" and "number", but not all card have numbers... Either way, it would change the IDs for many many cards...

I believe I should start making a list of changes to release the "4.0" version...

gwax commented 8 years ago

I'll dig into it when I get some time.

I figured it might be something like that, where the source data feeding into the ids column changed.

As a consumer of the data, I can live with situations like this but would love to see a note in the changelog to the effect of: pJGP - Wasteland id has changed for every id that changes.

For moving to 4.0, I would recommend keeping the existing id calculation and adding a new system, like id2 to maintain backward compatibility.

Prior to the introduction of ids in mtgjson, I was using a tuple of (set, name, number, multiverseid) as my keying system. You could consider something like make a dict of set, name, number, multiverseid; serialize to json, using null for missing values; and take the md5 of that chunk of data as the id.

ZeldaZach commented 6 years ago

In v4 it will be card name + set name + mid, all consts

fenhl commented 6 years ago

Card names are not constant. Remember Kaladesh?

ZeldaZach commented 6 years ago

They changed Ae, correct? That was a change long coming. They use only ascii letters now, so I see no future changes to card names

fenhl commented 6 years ago

That's not true, there are quite a few cards that still use non-ASCII characters, such as El-Hajjâj.

tooomm commented 6 years ago

Wasn't that the reason to use a uuid in the first v4 try?

Persistent immutable ID, generated for each card, stored on "_id" field.

(from https://github.com/mtgjson/mtgjson4_temporary#new-features-and-changes)

https://github.com/mtgjson/mtgjson4_temporary/pull/10

ZeldaZach commented 6 years ago

@fenhl Even with that, I think using the card name is a fine approach for the time being. If they do decide to change cards again, such as the one you gave as an example, we can release an addendum to the release notes informing the end-user about the changes.

gwax commented 6 years ago

There have been a bunch of cases where generating an id from card data is inconsistent. A lot of the time it's been because of bugs but some of the time it's been because the underlying data changes. Infrequently, but sometimes, it's because WotC introduces some new way of varying cards that breaks how we think about things (see Unstable (though they fixed that later)).

I really strong feel that affixing an id to a card the first time we see it and persisting that state is the safest way to move forward.

Additionally, given end users can generate any hash we might consider generating, why not let users generate the hash they want.