osrsbox / osrsbox-db

A complete and up-to-date database of Old School Runescape (OSRS) items, monsters and prayers
https://www.osrsbox.com/projects/osrsbox-db/
GNU General Public License v3.0
223 stars, 79 forks

Author JSON schema for quest data #115

Closed osrsbox closed 5 years ago

osrsbox commented 5 years ago

As outlined in #113, we are looking at adding quest data to the osrsbox-db project. To start, it is essential to specify a schema for the data to be collected and stored. Similar to item_schema.json, a new JSON schema for quest data should be added to help with building and testing the OSRS quest data.

gc commented 5 years ago

Here is a quick draft I made with a reasonable schema layout. (It's not in the actual JSON Schema format; I haven't looked into how that works.)

{
    "id": "number",
    "name": "string",
    "type": "quest | miniquest",
    "difficulty": "novice | intermediate | experienced | master | grandmaster | special",
    "length": "very_short | short | short_long | short_medium | medium | medium_long | long | long_very_long | very_long",
    "series": "string[]",
    "description": "string",
    "image": "string",
    "trivia": "string[]",
    "freeToPlay": "boolean",
    "guide": "string",
    "quick_guide": "string",
    "developer": "string",
    "release_date": "string | date (date probably better)",
    "required_for_completing": "number[] (array of quest IDs)",
    "requirements": {
        "questpoints": "number",
        "skills": {
            "attack": "number (level)",
            "farming": "number (level)"
        },
        "items": {
            "item_id": "quantity"
        },
        "quests": "number[] (array of quest id's)"
    },
    "rewards": {
        "questpoints": "number",
        "skills": {
            "attack": "number (XP)",
            "farming": "number (XP)"
        },
        "gp": "number",
        "items": {
            "item_id": "quantity"
        },
        "misc": "string[] (any random items/rewards that don't fit into the above)"
    }
}

osrsbox commented 5 years ago

Hey @gc - many thanks for the contribution! This is pretty much what I was thinking about for the JSON contents/schema. Thanks for putting in the effort to do this, and good to see someone else is on the same page on what should be in the JSON structure.

I will translate this into the JSON schema format. I will also write a new pytest module to test the data using the JSON schema (similar to test_items_database.py). But I will disable it by default until the data is populated.

Note for future me: you can disable a pytest module using the skip decorator. Example is added below.

@pytest.mark.skip(reason="Skipped: The quest data is not currently populated.")

pmauldin commented 5 years ago

Just looking at an example wiki entry (https://oldschool.runescape.wiki/w/Dragon_Slayer) to see what else is available. The two fields in the "Details" section that I think aren't represented in the proposed schema are the following:

{
    ...,
    "startPoint": "string",
    "enemiesToDefeat": [
        {
            "id": "number",
            "name": "string",
            "level": "number",
            "optional": "boolean"
        }
    ]
}

Depending on the status of the monster database (#16), enemiesToDefeat could probably just be an array of ids used to query that db.

Zirro commented 5 years ago

It would be helpful if the schema could distinguish between "required", "strongly recommended" and "recommended" items and skills. Perhaps it could also provide the reason from the wiki entry as a string if one exists, for example "Significantly lowers Elvarg's dragonfire attack damage" for antifire potions.

Whether or not a skill can be boosted to reach a requirement might be useful to cover as well.
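
One way to picture this suggestion is the rough Python sketch below. It is only an illustration: the class names, the field names (importance, reason) and the item IDs are hypothetical and are not part of any agreed schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Importance(Enum):
    """How strongly the wiki says an item is needed (hypothetical names)."""
    REQUIRED = "required"
    STRONGLY_RECOMMENDED = "strongly_recommended"
    RECOMMENDED = "recommended"


@dataclass
class ItemRequirement:
    item_id: int
    quantity: int
    importance: Importance
    reason: Optional[str] = None  # free-text reason from the wiki, if one exists


@dataclass
class SkillRequirement:
    skill: str
    level: int
    boostable: bool  # True if a temporary boost can be used to reach the level


def items_by_importance(items: List[ItemRequirement],
                        importance: Importance) -> List[ItemRequirement]:
    """Return only the items with the given importance level."""
    return [item for item in items if item.importance is importance]


# Placeholder item IDs (1 and 2), used purely for illustration.
ITEMS = [
    ItemRequirement(1, 1, Importance.REQUIRED),
    ItemRequirement(2, 4, Importance.STRONGLY_RECOMMENDED,
                    "Significantly lowers Elvarg's dragonfire attack damage"),
]
```

With this shape, items_by_importance(ITEMS, Importance.REQUIRED) returns only the first entry, so a consumer can show the hard requirements separately from the recommendations.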

osrsbox commented 5 years ago

I have been doing some development on the quests JSON API and general research into the data that needs to be stored. I have encountered a number of issues so far, and this is a bit of a dump to keep others informed. I have included a small draft of a JSON object that represents the Dragon Slayer II quest, but it is only partially populated. This should hopefully provide a little example of the desired output. @pmauldin has been working on the quest API and builder in his quests-builder branch.

  1. It might be useful to use the quest release number for the id property.
  2. Nesting may make the Python API somewhat messy, so I think the rewards section should be divided into top-level sections: rewards_items, rewards_quest_points. The same goes for other sections.
  3. Item requirements should definitely be divided into requirements, strongly recommended requirements and recommended requirements (as stated by @Zirro). Maybe even ironman requirements? A description of use could also be good.
  4. Item requirements are difficult to populate for some items... For example, the quest guide says a pickaxe which could be any pickaxe from bronze to dragon. There could be an array/list of item IDs? But this would be difficult to populate automatically from the OSRS Wiki data.
  5. The required_for_completing property might be difficult to populate automatically from the wiki data.
  6. I think that the requirements_quests property should only list the directly required quests; each of those quests can then be queried programmatically for its own requirements. For example, Dragon Slayer II requires Legends' Quest, so that is included in the DSII JSON data. You then recursively query Legends' Quest for the quests required to complete it (e.g., Heroes' Quest) and so on.
  7. Some of the properties are going to be very difficult to extract from the wiki data and populate automatically. I really like making automated tools so that the database can be updated... but this project might require some manual tweaking... thoughts from anyone?

{
    "id": 136,
    "name": "Dragon Slayer 2",
    "type": "quest",
    "members": true,
    "release_date": "4 January 2018",
    "series": ["Dragonkin"],
    "developer": "Mod Ed",
    "start_point": "Speak to Alec Kincade outside the Myths' Guild.",
    "difficulty": "grandmaster",
    "length": "very_long",
    "description": "Some long winded description of the quest.",
    "url_quest_guide_full": "https://oldschool.runescape.wiki/w/Dragon_Slayer_II",
    "url_quest_guide_quick": "https://oldschool.runescape.wiki/w/Dragon_Slayer_II/Quick_guide",
    "required_for_completing": null,
    "requirements_quest_points": 200,
    "requirements_quests": [
        50,
        126,
        87,
        117,
        71,
        132,
        131
    ],
    "requirements_skills": {
        "magic": {
            "level": 75,
            "boostable": false
        },
        "smithing": {
            "level": 70,
            "boostable": false
        },
        "mining": {
            "level": 68,
            "boostable": false
        }
    },
    "requirements_items": {
        "11920": 1,
        "6739": 1,
        "8778": 8
    },
    "requirements_items_recommended": {
        "12625": 4,
        "12931": 1
    },
    "enemies_to_defeat": [
        8059,
        8056,
        8057
    ],
    "rewards_quest_points": 5,
    "rewards_coins": 0,
    "rewards_items": {
        "21880": 15,
        "21892": 1
    },
    "rewards_skills": {
        "smithing": 25000,
        "mining": 18000,
        "agility": 15000,
        "thieving": 15000
    },
    "rewards_misc": [
        "Ability to speak to cats without the Catspeak amulet.",
        "Access to the Myths' Guild",
        "Lots of other *stuff*"
    ]
}
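
The recursive lookup described in point 6 could look roughly like the Python sketch below. It assumes the quest data has already been loaded into a dictionary that maps each quest ID to its directly required quest IDs. The function name and the small sample mapping are hypothetical, and the IDs in it are placeholders rather than real quest numbers.

```python
def all_required_quests(quest_id, direct_requirements):
    """Return the set of every quest ID needed, directly or indirectly.

    direct_requirements maps a quest ID to the list of quest IDs stored in
    that quest's own requirements (top-level requirements only).
    """
    seen = set()
    stack = list(direct_requirements.get(quest_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already handled; also guards against cycles in bad data
        seen.add(current)
        stack.extend(direct_requirements.get(current, []))
    return seen


# Placeholder IDs: quest 3 needs quest 2, which in turn needs quest 1.
DIRECT = {3: [2], 2: [1], 1: []}
print(sorted(all_required_quests(3, DIRECT)))  # prints [1, 2]
```

An explicit stack is used instead of recursion so that a long requirement chain, or an accidental cycle in the data, cannot cause runaway recursion.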

gc commented 5 years ago

I'm not sure I understand the reasoning behind not nesting stuff. Is it because it looks messier? I'm not so sure about that: people will spend more time using the data than manually reading the JSON, and storing it in the more obvious nested way seems better to me.

Also, should "required_for_completing": null be [] instead of null? It's always good to avoid null, and this way we can always assume that it's an array. For example, when the data is actually used, this throws an error if it's null but not if it's an empty array: required_for_completing.map(console.log);

Should release_date be a timestamp rather than a string? This refers back to the earlier discussions about us having to parse it (e.g. say we want to calculate how long it's been since the quest was released using the release date).
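
To make the parsing concern concrete, here is a small Python sketch. It assumes the date is stored either as free text in the style of "4 January 2018" or as an ISO 8601 string such as "2018-01-04". The helper names are hypothetical, and the %B month name relies on an English (or default C) locale.

```python
from datetime import date, datetime


def parse_wiki_date(text):
    """Parse a free-text date such as '4 January 2018' into a date object."""
    return datetime.strptime(text, "%d %B %Y").date()


def days_since_release(release_date, today):
    """Days between an ISO 8601 release date string and the given day."""
    return (today - date.fromisoformat(release_date)).days


print(parse_wiki_date("4 January 2018").isoformat())         # prints 2018-01-04
print(days_since_release("2018-01-04", date(2018, 1, 14)))   # prints 10
```

Storing the ISO form keeps the value a plain JSON string while still making it trivial to parse, and ISO strings also sort chronologically as text.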

I also think having to manually fill in some data is cool and I'd be down to help out. Once it's all collected, I don't expect we'd have to recollect data for old quests.

Looking good! Happy to see progress.

Zirro commented 5 years ago

@osrsbox Looks good to me! After going through the data in @pmauldin's branch, one more suggestion I would make is to turn the developer field into a developers array of strings, since certain quests list multiple people. Otherwise we end up with comma-separated values such as "Chris S, Jonathan S" in these cases.
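
If the extracted data already holds comma-separated names, converting it to the suggested array could be as simple as the Python sketch below. The function name split_developers is hypothetical, and the sketch assumes that commas never appear inside a single developer's name.

```python
def split_developers(developer):
    """Turn 'Chris S, Jonathan S' into ['Chris S', 'Jonathan S']."""
    # Strip surrounding whitespace and drop empty pieces (e.g. a trailing comma).
    return [name.strip() for name in developer.split(",") if name.strip()]


print(split_developers("Chris S, Jonathan S"))  # prints ['Chris S', 'Jonathan S']
print(split_developers("Mod Ed"))               # prints ['Mod Ed']
```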

osrsbox commented 5 years ago

Just an update for this issue. I have authored a JSON schema based on the discussion of this thread. There is a new branch called quest-schema with a pytest module to validate future data. The JSON files must be in the docs/quests-json folder, or you can use an online JSON Schema Validator. Have a look at commit d5e2fec9a252f099c097bd8bc7b95a4e0d561e93 for the JSON schema and associated test.

Probably more useful: I have attached a revised version of the Dragon Slayer II JSON file from my previous comment, with the structure modified based on feedback. I went back to the nested structure as discussed with @gc (thanks for the feedback on this). I also fixed the release_date issue and now use an empty array for the required_for_completing property. I also made the developer field an array based on the comment from @Zirro, but forgot to change the property name to developers (I will fix this in the future). I also changed a couple of object/array styles for items etc., mainly to ease JSON schema development. Happy to hear feedback on these.

Thanks to @gc, @Zirro and @pmauldin for providing feedback on this issue. It is great to bounce ideas around in the initial development stages.

Example JSON provided below:

{
    "id": 136,
    "name": "Dragon Slayer 2",
    "type": "quest",
    "members": true,
    "release_date": "2018-01-04",
    "series": [ "Dragonkin" ],
    "developer": [ "Mod Ed" ],
    "start_point": "Speak to Alec Kincade outside the Myths' Guild.",
    "difficulty": "grandmaster",
    "length": "very_long",
    "description": "Some long winded description of the quest.",
    "url_quest_guide_full": "https://oldschool.runescape.wiki/w/Dragon_Slayer_II",
    "url_quest_guide_quick": "https://oldschool.runescape.wiki/w/Dragon_Slayer_II/Quick_guide",
    "required_for_completing": [],
    "requirements": {
        "quest_points": 200,
        "quests": [
            50,
            126,
            87,
            117,
            71,
            132,
            131
        ],
        "skills": [
            {  
                "skill": "magic",
                "level": 75,
                "boostable": false
            },
            {
                "skill": "smithing",
                "level": 70,
                "boostable": false
            },
            {
                "skill": "mining",
                "level": 68,
                "boostable": false
            }
        ],
        "items_required": [
            {
                "id": 11920,
                "quantity": 1
            },
            {
                "id": 6739,
                "quantity": 1
            },
            {
                "id": 8778,
                "quantity": 8
            }
        ],
        "items_recommended": [
            {
                "id": 12625,
                "quantity": 4
            },
            {
                "id": 12931,
                "quantity": 1
            }
        ],
        "enemies_to_defeat": [
            8059,
            8056,
            8057
        ]
    },
    "rewards": {
        "quest_points": 5,
        "coins": 0,
        "items": [
            {
                "id": 21880,
                "quantity": 15
            },
            {
                "id": 21892,
                "quantity": 1
            }
        ],
        "skills": [
            {  
                "skill": "smithing",
                "xp": 25000
            },
            {
                "skill": "mining",
                "xp": 18000
            },
            {
                "skill": "agility",
                "xp": 15000
            }
        ],
        "misc": [
            "Ability to speak to cats without the Catspeak amulet.",
            "Access to the Myths' Guild.",
            "Lots of other *stuff*"
        ]
    }
}
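
As a rough illustration of what validating such a file involves, the Python sketch below checks a few top-level properties using only the standard library. It is not the JSON schema or the pytest module from the quest-schema branch. The function name check_quest and the choice of which properties to check are assumptions, the allowed type and difficulty values are taken from the first draft in this thread, and the sample is a trimmed copy of the example above.

```python
import json

# Allowed values taken from the first draft layout in this thread.
QUEST_TYPES = {"quest", "miniquest"}
DIFFICULTIES = {"novice", "intermediate", "experienced",
                "master", "grandmaster", "special"}

# Top-level property name -> expected Python type after json.loads().
EXPECTED_TYPES = {
    "id": int,
    "name": str,
    "members": bool,
    "release_date": str,
    "series": list,
    "required_for_completing": list,
    "requirements": dict,
    "rewards": dict,
}


def check_quest(quest):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in quest:
            problems.append(f"missing property: {key}")
            continue
        value = quest[key]
        # bool is a subclass of int in Python, so reject True/False for ints.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    if quest.get("type") not in QUEST_TYPES:
        problems.append("type must be one of: " + ", ".join(sorted(QUEST_TYPES)))
    if quest.get("difficulty") not in DIFFICULTIES:
        problems.append("difficulty is not a known value")
    return problems


SAMPLE = json.loads("""
{
    "id": 136,
    "name": "Dragon Slayer 2",
    "type": "quest",
    "members": true,
    "release_date": "2018-01-04",
    "series": ["Dragonkin"],
    "difficulty": "grandmaster",
    "required_for_completing": [],
    "requirements": {"quest_points": 200},
    "rewards": {"quest_points": 5}
}
""")

print(check_quest(SAMPLE))  # prints []
```

A real JSON schema expresses the same constraints declaratively and also covers the nested requirements and rewards objects, which this sketch leaves unchecked.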

gc commented 5 years ago

I agree with Zirro's suggestion, seems smart for it to be developers: ['Mod Ed', 'Mod X'] rather than developer: 'Mod Ed, Mod X'.

Also I noticed the image and trivia keys weren't added to the schema, was that intentional?

pmauldin commented 5 years ago

New schema looks great! One thing I did want to mention: on my branch, I initially wanted to use the quest release number as the id. However, I found that miniquests don't actually have a release number (at least, as far as I could tell). So we'd have to have some way of dealing with those. I definitely agree with @gc about preferring nested data over flattening.

Where do we go from here? Should I update my branch to reflect this new schema, and try to get the automated collection into as good of a state as possible, and then we can turn to manual tweaking of the things that weren't parsed well?

Edit: One other thing. In the extracted wiki text, there are entries for both "Tears of Guthix" and "Tears of Guthix (quest)". I assume the former is meant to be for the minigame; however, the content of both is identical and is the wiki page for the quest.

osrsbox commented 5 years ago

@gc I have changed the JSON schema to developers as requested, and it should only accept an array of strings. I forgot about the other two properties, so I just added them now. I included the url_quest_image property to hold a URL to the image; I thought this was the best solution, as I will not host the OSRS Wiki images. I also added quest_trivia as an array of strings, which seemed the most logical.

@pmauldin to answer your questions:

  1. Nice work noticing that miniquests don't have a release number. I really like the integer ID scheme. Not sure what to do with the miniquests then... maybe we could manually number them: M1 for miniquest 1, and so on, using the release date to determine the ordering? Or something similar? An alphabetic character would break the schema, but that could be modified. Renumbering all the quests by release date seems overboard, and then the IDs wouldn't align with the wiki. Do you have any better ideas? I am at a loss!
  2. Future development: Yes, I think updating the code to match the schema would be the next logical step. As we all discussed further up in this thread, I think some manual data entry/checking is logical. Quests are far more static than items, so this seems manageable. Happy to help where I can with this (API code and manual work).
  3. Tears of Guthix: Unsure why this is appearing. There is not even a page on the wiki with that specific name. I get the data by using the mediawiki API and querying the quests, miniquests and special quests categories. I will have a look at this soon to try and get rid of the extra data.

gc commented 5 years ago

All sounds good.

What I had in mind for the id attribute was an arbitrary number based on chronological order (e.g. sort all the quests by release date, then give the first released an id of 1, incrementing up to the most recent quest). This would include miniquests and whatnot.
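
A minimal Python sketch of this numbering idea follows, assuming every quest and miniquest has an ISO 8601 release_date string. The function name assign_ids is hypothetical, and apart from the Dragon Slayer 2 release date the sample entries and dates are made up for illustration.

```python
def assign_ids(quests):
    """Map each quest name to an ID based on chronological release order.

    ISO 8601 date strings sort chronologically as plain text; the name is
    used as a tie-breaker so that same-day releases get a stable order.
    """
    ordered = sorted(quests, key=lambda q: (q["release_date"], q["name"]))
    return {q["name"]: i for i, q in enumerate(ordered, start=1)}


# Hypothetical sample entries; only Dragon Slayer 2's date comes from this thread.
QUESTS = [
    {"name": "Dragon Slayer 2", "type": "quest", "release_date": "2018-01-04"},
    {"name": "Example Miniquest", "type": "miniquest", "release_date": "2010-06-01"},
    {"name": "Example Quest", "type": "quest", "release_date": "2005-03-15"},
]

print(assign_ids(QUESTS))
# prints {'Example Quest': 1, 'Example Miniquest': 2, 'Dragon Slayer 2': 3}
```

One trade-off is that these IDs no longer match the wiki's quest release numbers, and they only stay stable if entries are added strictly in release order.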