timothyrenner / nuforc_sightings_data

Data collection and processing for the National UFO Reporting Center (NUFORC) database.
MIT License

Scrapy is not spidering the reports #18

Closed briangnj closed 1 year ago

briangnj commented 1 year ago

Scrapy closes the spider right after http://www.nuforc.org/webreports/ndxpost.html is crawled. /data/raw/nuforc_reports_orig.json and data/raw/nuforc_reports_new.json are both empty, and then the script exits.

timothyrenner commented 1 year ago

Looks like they've renovated their site a little bit and the scraper needs to be updated. I'll try to take a look at this soon (super busy at the moment) but if I'm not fast enough feel free to open a PR.

briangnj commented 1 year ago

It looks like a new field (country) was added and the report table formatting was changed. I've fixed it locally and will open a PR when I get a free moment.
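For readers hitting the same breakage: a scraper that reads table cells by position stops working as soon as a column such as country is inserted. The sketch below is not the fix from the PR and does not use Scrapy; it is a minimal standard-library illustration of keying each row by its header text, so an added column does not shift the other fields. The `TableRows` class, the `parse_report_table` function, the column names, and the sample markup are all hypothetical and may not match the real NUFORC pages.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_report_table(html):
    """Return one dict per data row, keyed by lower-cased header text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    # The first row is assumed to be the header row.
    header = [name.lower() for name in parser.rows[0]]
    return [dict(zip(header, row)) for row in parser.rows[1:]]


# Hypothetical markup: the real column names and layout are assumptions.
SAMPLE = """
<table>
  <tr><th>Date</th><th>City</th><th>State</th><th>Country</th><th>Shape</th></tr>
  <tr><td>8/1/22</td><td>Exampleville</td><td>TX</td><td>USA</td><td>Light</td></tr>
</table>
"""

reports = parse_report_table(SAMPLE)
print(reports[0]["country"])
```

Looking fields up by name means a missing or renamed column surfaces as a `KeyError` on that one field instead of silently writing empty output files.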

tsepton commented 1 year ago

Hi @briangnj,

I'm encountering the same issue (obviously). Did you find the time to make that PR?

tsepton commented 1 year ago

Never mind, I had some spare time, so I did it myself.

briangnj commented 1 year ago

Sorry, I didn't have a moment as I've been traveling. Thanks for updating the code.

timothyrenner commented 1 year ago

I've merged the PR, going to close. Thanks again @tsepton for the PR and thank you @briangnj for pointing this out!