osrsbox / osrsbox-db

A complete and up-to-date database of Old School Runescape (OSRS) items, monsters and prayers
https://www.osrsbox.com/projects/osrsbox-db/
GNU General Public License v3.0

News Articles #127

Open gc opened 5 years ago

gc commented 5 years ago

I suggest news articles are added.

Here is my TS typing (for the schema) that I'm using:

export interface NewsItem {
    title: string;
    link: string;
    image?: string;
    category: string;
    month: number;
    year: number;
    day: number;
    timestamp: number;
}

I include month/year/day for convenience, but they can be calculated from the timestamp, so I'm not sure whether those should be included in the data. Some articles have no image, and some have no content.
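As a rough illustration of deriving those fields on the fly, here is a minimal sketch. It assumes `timestamp` is a Unix epoch value in milliseconds and that UTC is the intended time zone; the issue states neither, and `DateParts` and `datePartsFromTimestamp` are hypothetical names.

```typescript
// Hypothetical shape for the derived fields; not part of the schema in the issue.
interface DateParts {
    year: number;
    month: number; // 1-12
    day: number;
}

// ASSUMPTION: `timestamp` is Unix epoch milliseconds and dates are read in UTC.
function datePartsFromTimestamp(timestamp: number): DateParts {
    const date = new Date(timestamp);
    return {
        year: date.getUTCFullYear(),
        month: date.getUTCMonth() + 1, // getUTCMonth() is zero-based
        day: date.getUTCDate(),
    };
}

const parts = datePartsFromTimestamp(Date.UTC(2019, 11, 5));
console.log(parts); // { year: 2019, month: 12, day: 5 }
```

If the stored timestamps turn out to be in seconds, multiplying by 1000 before constructing the `Date` would be needed.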

As for the actual article text itself, I'm not sure whether that should be included in osrsbox. It could work similarly to the items: the resource with every news item in one file won't have the text, but if you're fetching one specific article/month, it will.
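One possible way to model that split is sketched below. `NewsItem` repeats the typing from this issue; `NewsArticle`, its `content` field, and `toSummary` are hypothetical names used only for illustration, not an agreed schema.

```typescript
// Entry as it might appear in the single "all news" file (no article text).
interface NewsItem {
    title: string;
    link: string;
    image?: string;
    category: string;
    month: number;
    year: number;
    day: number;
    timestamp: number;
}

// Hypothetical per-article shape: the same fields plus the text.
// `content` is optional because some articles have no content.
interface NewsArticle extends NewsItem {
    content?: string;
}

// Strips the article text so the entry can go into the combined file.
function toSummary(article: NewsArticle): NewsItem {
    const summary: NewsItem = { ...article };
    delete (summary as Partial<NewsArticle>).content;
    return summary;
}
```

With this layout the combined file stays small, and only a request for a specific article or month pays the cost of the full text.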

If it's helpful, here's:

my code for scraping them: https://github.com/gc/oldschooljs/blob/master/src/lib/Structures/News.ts

my scraped data: https://github.com/gc/oldschooljs/blob/master/src/data/news/news_archive.json

Also, with regard to scraping: the site seems to rate-limit you after around 40 page visits per hour, and the limit is lifted after around an hour (maybe less). Just something to keep in mind for your big initial scrape of all the old articles.
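To stay under a limit like that, a scraper could simply space its requests out. The following is a minimal sketch, not the code linked above: the 40-per-hour figure is only the rough observation from this comment, and `minDelayMs`, `sleep`, and `runThrottled` are hypothetical helpers.

```typescript
// ASSUMPTION: roughly 40 page visits per hour are tolerated (observed, not official).
const MAX_REQUESTS_PER_HOUR = 40;
const HOUR_MS = 60 * 60 * 1000;

// Minimum gap between requests so that they are spread evenly over the hour.
function minDelayMs(maxPerHour: number): number {
    return Math.ceil(HOUR_MS / maxPerHour);
}

const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` once per input, sequentially, waiting `delayMs` between calls.
async function runThrottled<T, R>(
    inputs: T[],
    task: (input: T) => Promise<R>,
    delayMs: number,
): Promise<R[]> {
    const results: R[] = [];
    let first = true;
    for (const input of inputs) {
        if (!first) await sleep(delayMs);
        first = false;
        results.push(await task(input));
    }
    return results;
}

// Example usage (left commented out so the sketch makes no network calls):
// const pages = await runThrottled(urls, (url) => fetch(url).then((r) => r.text()),
//                                  minDelayMs(MAX_REQUESTS_PER_HOUR));
```

At 40 requests per hour this works out to one request every 90 seconds, so a large initial scrape of the archive would take a while; adding some extra margin to the delay may be sensible since the real threshold is not documented.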

osrsbox commented 5 years ago

@gc - I quite like this idea. This project is all about providing easily parseable OSRS-related data, and it makes logical sense to add news posts to the data currently available. Thanks for providing the schema example and the JS code example, and most importantly the rate-limiting information - that is exceptionally useful. As a side note - it might be better to source the raw text from the OSRS Wiki, to avoid such intense rate limiting. I would have to compare the raw text coverage of each source to ensure the full data is available. I am also not sure whether the OSRS Wiki has all news posts, or just the weekly update news posts. Will have to investigate.

Unfortunately, with the large number of things on the development list, it might be a while before I can start looking into this. I have some free time for development at the moment, but fine-tuning the item database and adding the quest and monster databases will probably take precedence.