timothyrenner / nuforc_sightings_data

Data collection and processing for the National UFO Reporting Center (NUFORC) database.
MIT License

Solving recent changes at nuforc.org #22

Closed: valerioa closed this 5 months ago

valerioa commented 5 months ago

This commit solves several issues caused by changes in the page format at nuforc.org:

  1. The HTML of all pages has been modified.
  2. The date format on the "by posted" page has changed.
  3. The data table has been moved within the "by posted" page.
  4. Duration is no longer available on the date index page.
  5. The table in the page was removed. The data is now free-form HTML.

I have made a fork and checked in changes to resolve all those problems.

https://github.com/valerioa/nuforc_sightings_data

Now everything runs smoothly and data is processed correctly.

I refactored the code to adapt it to the nuforc changes. I found a way to structure the unstructured data of the stats page, and duration is now taken from the stats page.

Data should be compatible with the previous version. The only thing I changed is the formatting of the stats column.

The stats column is now pipe-delimited, with each field name and value separated by a colon. This makes further parsing and analysis easier.

<field name>:<value>|<field name>:<value>|<field name>:<value>|<field name>:<value>|

example:

"Occurred:2021-08-19 18:00:00 Local|Location:Dallas, TX, USA|Shape:Unknown|Duration:2 minutes|No of observers:2|Reported:2021-08-20 12:49:56 Pacific|Posted:2021-08-20 00:00:00|Characteristics:Lights on object, Aura or haze around object, Aircraft nearby"
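For anyone consuming the new column, here is a minimal parsing sketch in Python. The function name `parse_stats` is hypothetical and not part of the repository. It assumes field values never contain a pipe, and it splits each entry on the first colon only, because values such as timestamps contain colons themselves.

```python
from typing import Dict


def parse_stats(stats: str) -> Dict[str, str]:
    """Parse a '<field name>:<value>|...' string into a dict.

    Hypothetical helper, not part of the repository.
    """
    fields: Dict[str, str] = {}
    for part in stats.split("|"):
        # A trailing pipe produces an empty entry; skip it.
        if not part:
            continue
        # Split on the FIRST colon only: values like
        # "2021-08-19 18:00:00 Local" contain colons themselves.
        name, sep, value = part.partition(":")
        if not sep:
            # No colon at all: not a "<name>:<value>" pair, so ignore it.
            continue
        fields[name.strip()] = value.strip()
    return fields


example = (
    "Occurred:2021-08-19 18:00:00 Local|Location:Dallas, TX, USA|"
    "Shape:Unknown|Duration:2 minutes|No of observers:2|"
    "Reported:2021-08-20 12:49:56 Pacific|Posted:2021-08-20 00:00:00|"
    "Characteristics:Lights on object, Aura or haze around object, Aircraft nearby"
)
parsed = parse_stats(example)
print(parsed["Duration"])
```

Commas inside values (as in Location and Characteristics) are left untouched, so splitting those further is up to the caller.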

timothyrenner commented 5 months ago

Thank you so much! I'll give this a whirl tomorrow or Wednesday.

timothyrenner commented 5 months ago

The parsable stats is a nice touch.

timothyrenner commented 5 months ago

Everything looks good! Thanks for your help @valerioa !