typpo / ca-property-tax

CA property tax visualization
https://www.officialdata.org/ca-property-tax/
GNU Affero General Public License v3.0

Scrape & Parse Placer County Files #17

Closed jamesshannon closed 4 years ago

jamesshannon commented 4 years ago

Initial investigation for Placer:

GIS data overview page

Parcel information is in AddressPoints.csv. You'd think you'd want Parcels.csv but AddressPoints has a lat/lng centroid point while Parcels has some fields (like Shape__Area) which don't appear directly useful.

Tax info can be found at the URL: https://common3.mptsweb.com/MBC/placer/tax/main/__APN__/2020/0000 where __APN__ is the APN without -'s. E.g. https://common3.mptsweb.com/MBC/placer/tax/main/466120044000/2020/0000. Tax amount is found in the Totals - 1st and 2nd Installments section.
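The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `placer_tax_url` and the constant `TAX_URL_TEMPLATE` are hypothetical (not from the repo), and the year and trailing `0000` segment are hard-coded exactly as they appear in the example URL.

```python
# Hypothetical helper, not taken from the ca-property-tax repo.
# The template mirrors the URL quoted above; 2020 and 0000 are kept as-is.
TAX_URL_TEMPLATE = "https://common3.mptsweb.com/MBC/placer/tax/main/{apn}/2020/0000"


def placer_tax_url(apn: str) -> str:
    """Build the tax page URL for an APN, with or without dashes."""
    digits = apn.strip().replace("-", "")
    if not digits.isdigit():
        # Fail loudly on anything that is not a plain numeric APN.
        raise ValueError(f"unexpected APN format: {apn!r}")
    return TAX_URL_TEMPLATE.format(apn=digits)


if __name__ == "__main__":
    print(placer_tax_url("466-120-044-000"))
```

For the APN in the example, this produces the same URL as the one quoted above.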

Originally posted by @jamesshannon in https://github.com/typpo/ca-property-tax/issues/1#issuecomment-720307958

jamesshannon commented 4 years ago

@typpo wrote:

@jamesshannon The Placer system looks identical to Yolo County's, which means much of the code can be reused! https://github.com/typpo/ca-property-tax/tree/master/scrapers/yolo

jamesshannon commented 4 years ago

@typpo Thanks. That's helpful.

Where'd the file Yolo_County_Tax_Parcels_Open_Data.csv come from? I see in the README that there's a way to convert the gdb to geojson (which is used in the parser), but what about the CSV used in the scraper?

I ask because I've investigated the Placer AddressPoints and Parcels files a bit more. AddressPoints has APN and centroid, but I've found that:

So it seems better to use the Parcels file, but the CSV version doesn't have any useful-looking geodata. Both Parcels and AddressPoints have an object_id, but they don't seem to match. So I've started looking at ways to get geodata from non-CSV versions of the Parcels file. It appears I can download the shapefile and use a python package to get the shape and then shapely.geometry to find the centroid?

typpo commented 4 years ago

I should have clarified - I think the input CSV for Yolo is different from Placer. It's just the tax system that appears to be the same, meaning I think we should be able to copy parts of the web scrape and parse steps (but not the same input file format).

I think that the Parcels file is the way to go. Although the spreadsheet doesn't have latlng info, if you download it as a shapefile and then convert it using ogr2ogr, it will include latlng info.

After downloading and unzipping the shapefiles, this command:

ogr2ogr -f GeoJSON placer.geojson Parcels.shp

Yields placer.geojson. Here's an example record from the file:

{ "type": "Feature", "properties": { "OBJECTID": 5, "APN": "471-340-027-000", "TAX_DESC": "NORMAL OWNERSHIP", "USE_CD_N": "APARTMENTS, 4 UNITS OR MORE", "STR_SQFT": 1076, "ADR1": "5043 MILLSTONE WAY", "ADR2": "GRANITE BAY CA 95746", "CITY": "GRANITE BAY", "STATE": "CA", "ZIP": "95746", "STREETNUM": "720", "STREETNAME": "SUNRISE", "STREETTYPE": "AV", "LANDVALUE": 9695, "STRUCTURE": 123898, "Shape__Are": 1051.560546875, "Shape__Len": 155.441730291059 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -121.272353678201995, 38.735262075994498 ], [ -121.272330313062994, 38.735261915944498 ], [ -121.272330259867999, 38.735267119240397 ], [ -121.272258341183004, 38.7352666546535 ], [ -121.272258460228002, 38.735253323802198 ], [ -121.272278634988993, 38.735253450342597 ], [ -121.272278881337996, 38.735232842775901 ], [ -121.272271623899996, 38.735232797256103 ], [ -121.272271802228005, 38.735194767554198 ], [ -121.272395469203005, 38.735195130536603 ], [ -121.272402235800001, 38.735195145943401 ], [ -121.272402182394998, 38.7352328890397 ], [ -121.272425318672006, 38.735233157550503 ], [ -121.27242491394, 38.735267575904402 ], [ -121.272353624586003, 38.735267320729399 ], [ -121.272353678201995, 38.735262075994498 ] ] ] } },

The list of latlngs defines the property's boundary polygon (each pair is in lng, lat order), and we take its centroid. Many of the scrapers/parsers load an ogr-generated geojson file. Here's an example of loading the geojson file and here's an example of finding the centroid.
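For readers without the linked examples, here is a rough stdlib-only sketch of loading an ogr-generated geojson file and taking a polygon centroid. It is not the repo's code: the function names are hypothetical, it uses the standard area-weighted (shoelace) centroid formula, and for a MultiPolygon it assumes that looking only at the first polygon's outer ring is good enough.

```python
import json


def load_features(path):
    """Load the feature list from a GeoJSON FeatureCollection (e.g. placer.geojson)."""
    with open(path) as f:
        return json.load(f)["features"]


def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of [lng, lat] pairs."""
    x0, y0 = ring[0]  # shift the origin to the first vertex to limit rounding error
    area2 = cx = cy = 0.0
    for (xa, ya), (xb, yb) in zip(ring, ring[1:]):
        xa, ya, xb, yb = xa - x0, ya - y0, xb - x0, yb - y0
        cross = xa * yb - xb * ya
        area2 += cross
        cx += (xa + xb) * cross
        cy += (ya + yb) * cross
    if area2 == 0:
        # Degenerate ring (zero area): fall back to the plain vertex average.
        n = len(ring)
        return sum(p[0] for p in ring) / n, sum(p[1] for p in ring) / n
    return x0 + cx / (3 * area2), y0 + cy / (3 * area2)


def feature_centroid(feature):
    """Return (lat, lng) for a Polygon or MultiPolygon feature."""
    geom = feature["geometry"]
    if geom["type"] == "Polygon":
        ring = geom["coordinates"][0]
    elif geom["type"] == "MultiPolygon":
        ring = geom["coordinates"][0][0]  # assumption: first polygon only
    else:
        raise ValueError(f"unsupported geometry type: {geom['type']}")
    lng, lat = ring_centroid(ring)  # GeoJSON stores lng first
    return lat, lng


if __name__ == "__main__":
    # Made-up square, only to exercise the functions.
    square = {"geometry": {"type": "Polygon", "coordinates": [
        [[-121, 38], [-119, 38], [-119, 40], [-121, 40], [-121, 38]]]}}
    print(feature_centroid(square))
```

Note the swap at the end of `feature_centroid`: GeoJSON coordinates are lng-first, while most downstream code expects lat-first.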

If you'd like to take this on, I'm happy to answer any other questions and support you! I've uploaded the converted Placer Parcels geojson here so you don't have to go through the trouble of installing ogr yourself: https://drive.google.com/file/d/1t7DpysdWdtJAry1lE4gesjzkuZs4t9n0/view?usp=sharing

jamesshannon commented 4 years ago

Placer CSV file: xxxxx

I'm ready to upload the Placer script, but I'm not sure how to isolate it from the sharedlib changes, which I have merged into the branch for development. It'll probably work itself out after the sharedlib branch is merged.

jamesshannon commented 4 years ago

Hold off on that file... I'm validating it and seeing some issues.

jamesshannon commented 4 years ago

Ok. File is correct now: https://drive.google.com/file/d/1QU5k5Il6GbzVT4r1NaGPPldgkqJDU495/view?usp=sharing

I created a quick script to validate the files. It does two things to check for programming errors and GIGO errors:

typpo commented 4 years ago

Added! Sorry for the delay, the past week has been...distracting.

The validation script sounds very useful; I often mess things up the first time by flipping lng/lat.
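As an illustration of the kind of lng/lat flip mentioned here, a minimal check might look like the sketch below. This is not the validation script from this thread (its contents are not shown); the function name, the return labels, and the deliberately loose California bounding box are all assumptions made for the example.

```python
# Hypothetical sanity check, not the validation script discussed above.
# Loose bounding box around California (assumed values, padded on purpose).
CA_LAT_RANGE = (32.0, 42.5)
CA_LNG_RANGE = (-125.0, -114.0)


def _in_california(lat, lng):
    return (CA_LAT_RANGE[0] <= lat <= CA_LAT_RANGE[1]
            and CA_LNG_RANGE[0] <= lng <= CA_LNG_RANGE[1])


def check_point(lat, lng):
    """Classify a coordinate pair as 'ok', 'flipped', or 'out_of_range'."""
    if _in_california(lat, lng):
        return "ok"
    if _in_california(lng, lat):
        # The pair only makes sense with the two values swapped.
        return "flipped"
    return "out_of_range"


if __name__ == "__main__":
    print(check_point(38.735, -121.272))
    print(check_point(-121.272, 38.735))
```

Because California's latitude and longitude ranges do not overlap, a swapped pair can never pass as valid, which is what makes this check cheap to run over every output row.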

typpo commented 4 years ago

@jamesshannon How would you like to be credited on the site? Name + link to twitter or personal website?