kc9eye / GalnetArchive

An XML archive of Galnet and an Archiving tool.
MIT License

[Enhancement] Probably shouldn't scrape the website, I found a jsonapi #3

Closed Andrew-J-Larson closed 3 years ago

Andrew-J-Larson commented 3 years ago

The link is: https://cms.elitedangerous.com/jsonapi/node/galnet_article

The only unfortunate thing is that it's paginated and capped at 50 articles per page.

The best way to navigate each page is like so: (not in order by date unfortunately)

But since it's in a much easier format to use, and less likely to change than an HTML page, it would probably be best to use this.
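For reference, here is a rough sketch of how paging through the endpoint could look. It assumes the responses follow the JSON:API convention of a top-level `data` list and that `page[offset]` steps through the results in blocks of 50; `page_url`, `fetch_all`, and `http_get_json` are hypothetical helper names, not anything from the archiving tool.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://cms.elitedangerous.com/jsonapi/node/galnet_article"
PAGE_SIZE = 50  # per-page cap mentioned above


def page_url(offset: int) -> str:
    """Build the URL for the page starting at the given article offset."""
    return f"{BASE_URL}?{urlencode({'page[offset]': offset}, safe='[]')}"


def http_get_json(url: str) -> dict:
    """Fetch one page over HTTP and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(fetch_json: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield every article, advancing the offset until a short page is returned."""
    offset = 0
    while True:
        # Assumption: articles live under the top-level "data" key.
        page = fetch_json(page_url(offset)).get("data", [])
        yield from page
        if len(page) < PAGE_SIZE:
            break
        offset += PAGE_SIZE
```

Passing the fetch function in as a parameter keeps the paging logic testable without touching the network; `list(fetch_all())` would run it against the live endpoint.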

Andrew-J-Larson commented 3 years ago

Another issue I've noticed, though, is that when using the API, dates past a certain point are off by one day, and I'm not sure why.

kc9eye commented 3 years ago

This is due to FDev using a real leap year date, when the fictional year they are representing is not a leap year.
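To illustrate the kind of mismatch described here, a small sketch follows. It rests on two assumptions that are not stated in this thread: that the in-game year is the real year plus 1286, and that the fictional calendar would follow the usual Gregorian leap-year rule. `YEAR_OFFSET` and `leap_mismatch` are hypothetical names.

```python
import calendar

# Assumption: in-game year = real year + 1286 (e.g. 2018 -> 3304).
YEAR_OFFSET = 1286


def leap_mismatch(real_year: int) -> bool:
    """True when the real and fictional years disagree about having a 29 February."""
    return calendar.isleap(real_year) != calendar.isleap(real_year + YEAR_OFFSET)
```

Because the offset is not a multiple of 4, a real leap year such as 2016 maps to a fictional year (3302) that is not one under this rule, so a real 29 February has no matching fictional date.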

Andrew-J-Larson commented 3 years ago

Ah that would make a lot of sense then.

Andrew-J-Larson commented 3 years ago

Looking at it further, it turns out that it's not really sorted by date, sadly. So you'd have to collect all the parts of the whole archive first before trying to sort.
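A minimal sketch of the collect-then-sort step, assuming each article carries its date as a string like `22 AUG 3304` under `attributes.field_galnet_date` (the field name is taken from the filter parameter in the API URL; its location under `attributes` is an assumption based on the JSON:API layout). `galnet_date_key` and `sort_articles` are hypothetical names.

```python
from typing import Iterable

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def galnet_date_key(text: str) -> tuple[int, int, int]:
    """Turn 'DD MON YYYY' into a sortable (year, month, day) tuple."""
    day, month, year = text.split()
    return int(year), MONTHS[month.upper()], int(day)


def sort_articles(articles: Iterable[dict]) -> list[dict]:
    """Sort already-collected articles by their Galnet date, oldest first."""
    return sorted(
        articles,
        key=lambda article: galnet_date_key(article["attributes"]["field_galnet_date"]),
    )
```

A plain tuple is used as the sort key instead of `datetime.date` so that a date which does not exist in the Gregorian calendar for that year (for example a 29 February) still sorts rather than raising an error.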

Andrew-J-Larson commented 3 years ago

On the plus side though, it would reduce your constant fetching of the website down to 7 fetches, at least for now, per check for new posts.

Andrew-J-Larson commented 3 years ago

@kc9eye Also, this is what I mean. http://cms.elitedangerous.com/jsonapi/node/galnet_article?page[offset]=0&filter[field_galnet_date]=22%20AUG%203304

But that shows the posts that should be on 23 AUG 3304.
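For anyone wanting to reproduce that query for other dates, the URL can be built roughly like this. The parameter names come from the link above; `date_filter_url` is a hypothetical helper, and the `https` base is the one from the first comment.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://cms.elitedangerous.com/jsonapi/node/galnet_article"


def date_filter_url(galnet_date: str, offset: int = 0) -> str:
    """Build a query URL for all articles filed under one Galnet date."""
    params = {
        "page[offset]": offset,
        "filter[field_galnet_date]": galnet_date,
    }
    # quote_via=quote encodes spaces as %20; safe="[]" keeps the brackets readable.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote, safe='[]')}"
```

Calling `date_filter_url("22 AUG 3304")` and `date_filter_url("23 AUG 3304")` side by side makes it easy to compare what the API returns for neighbouring days.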

kc9eye commented 3 years ago

Perhaps you should talk to the owner of that site, can't help you with that.

Andrew-J-Larson commented 3 years ago

I've tried Frontier Dev's contact email, but it keeps erroring for some weird reason, so instead I set up a forum account and made a note about the issue inside their normal API.

Andrew-J-Larson commented 3 years ago

I talked to other third-party devs who had captures from the days when those articles came out, and it turns out that the dates in the API should be correct, meaning that the archive website's dates may be wrong for some articles. Hopefully the devs will notice it sometime and fix it.