hackgvl / OpenData

Open data projects, including real-time and reusable data for local tech meetups, events, and map layers.

Data Validator Scripts for Map Layer GeoJSON #50

Open allella opened 5 years ago

allella commented 5 years ago

Errors in the map layer spreadsheets could lead to invalid/broken GeoJSON URLs.

Hence, we could use a script, run as a scheduled (cron) job, that validates the GeoJSON files and emails an admin if anything is wrong.
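As a rough illustration of the kind of check such a script might run, here is a minimal sketch in JavaScript. The function name `validateGeoJson` and the specific checks are assumptions for illustration, not the validator that was later written. It also treats a missing geometry as a problem, which is stricter than the GeoJSON spec (the spec allows a null geometry).

```javascript
// Hypothetical helper: returns a list of problems found in a GeoJSON string.
// An empty array means these basic checks passed.
function validateGeoJson(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return ["not valid JSON: " + err.message];
  }
  if (data === null || typeof data !== "object" || data.type !== "FeatureCollection") {
    return ['top-level "type" is not "FeatureCollection"'];
  }
  if (!Array.isArray(data.features)) {
    return ['"features" is missing or not an array'];
  }
  const errors = [];
  data.features.forEach(function (feature, i) {
    if (!feature || feature.type !== "Feature") {
      errors.push("feature " + i + ': "type" is not "Feature"');
    } else if (!feature.geometry || !Array.isArray(feature.geometry.coordinates)) {
      // Stricter than the spec on purpose: a map layer row with no
      // coordinates is likely a spreadsheet error.
      errors.push("feature " + i + ": geometry or coordinates missing");
    }
  });
  return errors;
}
```

A scheduled job could call this for each layer and email an admin only when the returned list is not empty.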

Validation Ideas

allella commented 5 years ago

@MarkMcDaniels here's one of the ideas for validating the map layer GeoJSON in the event you're looking for a project. It's more of a backend thing but I'm posting it here for future reference.

It's possible to get a list of all the map layers and loop through them https://github.com/codeforgreenville/OpenData/issues/17 and then get the GeoJSON url for each map, which can then be run through a series of checks.

MarkMcDaniels commented 5 years ago

@allella I finished the validator, but I'm not sure where to commit my changes for a pull request.

allella commented 5 years ago

@MarkMcDaniels sounds great.

For now, see if you can clone this repo, create a directory like /validation-scripts, drop your code in there, and do a pull request. Or, you may be able to push directly into the repo.

Then, I'll take a look and see where we can host/run it.

I'll be out of town for the Tuesday "talking" meeting, but should be back for the July 2nd work night.

allella commented 4 years ago

@MarkMcDaniels CFG is participating in a virtual National Day of Civic Hacking on Sept 12th. We're going to work to bring the open map layer project to the next level since some of our map layers do / will address the event's theme of "social safety net services during COVID-19".

Are you available and able to lead an effort to get this validation script tested / improved / or otherwise moved forward?

The core of the event would be a 2-3 hour commitment with some people attending the national kick-off event and anybody who wants to stay later can do so as well.

We'd be glad to get the band back together for this event.

Thanks, Jim

MarkMcDaniels commented 4 years ago

Jim, Yes I'll be there. Is this going to be in person or virtual?

Mark

allella commented 4 years ago

@MarkMcDaniels it's virtual.

You can RSVP here https://www.meetup.com/Code-for-Greenville/events/272770100/ and register with Code For America here https://www.codeforamerica.org/events/national-day-of-civic-hacking-2020

We'll have a Zoom link closer to the event date.

Thanks, Jim

allella commented 4 years ago

https://github.com/codeforgreenville/OpenData/blob/master/validator-script/pontValidator.php

MarkMcDaniels commented 4 years ago

@allella This validator is broken up into two lists based on data patterns they currently have. During #ndofch as we combine layers, these lists will have to be updated.

allella commented 4 years ago

Perhaps we can rework the script so it gets a list of the map layers dynamically. We have a URL with the links, I just need to dig it up. Does that sound good?

allella commented 4 years ago

https://data.openupstate.org/rest/maps?_format=json

MarkMcDaniels commented 4 years ago

Yes, I can start that today. Because it's pattern-based, I'll have to rewrite it to automate the choice of pattern type.

Take two: After looking at how the API is delivering the data, I realized that I was using a different endpoint. That is why the patterns were different. The cool thing is that /maps?_format=json has abstracted the coords already. This means I have to rewrite it completely, but in the end it can be used by anyone using this setup.
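For reference, looping over the layers from a list endpoint might look roughly like the sketch below. It assumes the endpoint returns a JSON array of layer objects, and `urlField` is a hypothetical property name for each layer's GeoJSON URL; the real field name has to be read from the actual response. The fetcher is passed in as a parameter so the loop can be tried without network access.

```javascript
// Sketch: check every layer listed by a maps endpoint.
// `fetchText(url)` must return a Promise of the response body as a string.
// `urlField` is a HYPOTHETICAL property name; inspect the real response to find it.
async function checkAllLayers(listUrl, fetchText, urlField) {
  const layers = JSON.parse(await fetchText(listUrl)); // assumed: a JSON array
  const report = [];
  for (const layer of layers) {
    const url = layer[urlField];
    let problem = null;
    if (!url) {
      problem = "no GeoJSON URL in field " + urlField;
    } else {
      try {
        const data = JSON.parse(await fetchText(url));
        if (!Array.isArray(data.features)) {
          problem = '"features" is missing or not an array';
        }
      } catch (err) {
        problem = "could not fetch or parse: " + err.message;
      }
    }
    report.push({ url: url || null, ok: problem === null, problem: problem });
  }
  return report;
}

// Possible usage where a global fetch() exists (field name is a placeholder):
// checkAllLayers("https://data.openupstate.org/rest/maps?_format=json",
//                (u) => fetch(u).then((r) => r.text()), "geojson")
//   .then((report) => console.log(report.filter((r) => !r.ok)));
```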

MarkMcDaniels commented 4 years ago

I think I've come up with an easier solution. If the data is being held in Excel or Google Sheets, we could limit the cells for lat and long to only accept a range of values within our geographic area.

allella commented 4 years ago

Alright.

The range is a useful feature, but the primary hope with validation is to catch cases where an error in the spreadsheet produces malformed GeoJSON.

An out of range incorrect lat / long probably doesn't break the GeoJSON, but it would be nice to flag that problem.
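A script-side version of that flagging could be sketched as below. The bounding box numbers are placeholders picked to loosely surround Upstate South Carolina, not an agreed boundary, and `findOutOfRange` is a hypothetical name. Only Point features are checked in this sketch.

```javascript
// PLACEHOLDER bounds, loosely around Upstate South Carolina; adjust as needed.
const BOUNDS = { minLat: 34.0, maxLat: 35.5, minLng: -83.5, maxLng: -81.0 };

// Returns the indexes of Point features whose coordinates fall outside `bounds`.
// GeoJSON coordinate order is [longitude, latitude].
function findOutOfRange(featureCollection, bounds) {
  const flagged = [];
  featureCollection.features.forEach(function (feature, i) {
    const g = feature.geometry;
    if (!g || g.type !== "Point") return; // only points are checked here
    const lng = g.coordinates[0];
    const lat = g.coordinates[1];
    // Comparisons with NaN or undefined are false, so bad values get flagged too.
    const inRange = lat >= bounds.minLat && lat <= bounds.maxLat &&
                    lng >= bounds.minLng && lng <= bounds.maxLng;
    if (!inRange) flagged.push(i);
  });
  return flagged;
}
```

A check like this would also catch a common spreadsheet mistake, latitude and longitude entered in swapped columns, since the swapped pair lands far outside the box.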

Google Sheets has conditional formatting, so we could possibly color-code a field red or yellow based on conditions.

allella commented 4 years ago

@MarkMcDaniels this is related, but a different use case.

The LeafletJS map tool we're using for previews allows for filtering "features".

For example, when we get a universal preview Leaflet script, I'll likely keep logic to discard bad latitude and longitude values, like those in one of the Greenville County GeoJSON data sets we're syndicating.

The ideal case is that we do validation and filtering "upstream" on the backend, but at least for our preview maps we can do additional or redundant LeafletJS filter() validation like

    function filter(feature, layer) {
      // GeoJSON stores coordinates as [longitude, latitude]
      var coords = feature.geometry && feature.geometry.coordinates;

      // if the geometry is missing, or a lat or lng coordinate is not a number or is zero
      if ( !coords || isNaN(coords[0]) || isNaN(coords[1])
          || coords[0] == 0 || coords[1] == 0 )
      {
        return false;  // ignore the feature
      } // end if, bad lat or long

      return true; // allow the feature
    }