iftechfoundation / ifdb

The software behind the Interactive Fiction Database (IFDB)
Other
25 stars 18 forks source link

Use a validation algorithm to check IFIDs #438

Open alice-blue opened 10 months ago

alice-blue commented 10 months ago

From https://intfiction.org/t/ifdb-search-anomaly/66571/12:

Finally, as you can see, there are currently no validation rules for IFDB’s IFIDs. Reading over the Treaty, I think someone could cook up a validation algorithm. I’d accept a PR for this, but I’m probably not going to do it myself.

[The following part is about cleaning up IFIDs that have already been entered into IFDB, but I'm quoting it in case it's helpful in putting together an algorithm]

If somebody wants to clean up bogus IFIDs, I think that’d make sense. Here’s how I found them.

Initially, I queried for IFIDs that didn’t contain a hyphen, like this: select ifid from ifids where ifid not like '%-%';

That turned up 320 IFIDs, most of which looked like this:

ifid 004D9E39231A1797D3A8A58ADA7C02F4 00CC0D46A7A54FD08D89FB4652DA3CDC 0182F9777087D22739413F6B7C1896FA 01A08F7D37CB02940A1D6EAA7E57E5F9 04FC82FF2020CED57E315493435C59C1

Do those need to be “fixed up”? I’m honestly unsure. They appear to be 32-character UUIDs without their hyphens.

So then I searched for all IFIDs where the length was not exactly 32 or 36 characters long, and where the IFID does not start with any of the known, declared prefixes from the Treaty of Babel.

select ifid
from ifids
where 
ifid not like 'ADRIFT-%'
and ifid not like 'ADVSYS-%'
and ifid not like 'AGT-%'
and ifid not like 'ALAN-%'
and ifid not like 'GLULX-%'
and ifid not like 'HUGO-%'
and ifid not like 'LEVEL9-%'
and ifid not like 'MAGNETIC-%'
and IFID not like 'MZ-%'
and ifid not like 'TADS2-%'
and ifid not like 'TADS3-%'
and ifid not like 'ZCODE-%'
and length(ifid) not in (32, 36);
jtn20 commented 3 months ago

Sort-of related (but maybe wants to be its own issue):

IFDB does currently enforce that IFIDs are unique. This is usually the right thing, but I ran into one case where it makes IFDB's data less accurate, because there are duplicate IFIDs out there. (Lord knows how, but that's the real-world data.)

I ran into this when trying to enter the IFID to the latter, and getting the message Your changes were not saved ... The IFID [...] is already assigned to another game: Mousetrap. IFIDs must be unique. So, currently Labyrinth's record lacks an IFID.

Maybe soften it to a warning, with a checkbox "yes, these works really do have the same IFID"? (Unless you want to reserve it as an admin-side special case, where only a hardcoded list of IFIDs are allowed to be duplicated. Hopefully it really is quite rare.)