osmlab / labuildings

Los Angeles County building import
BSD 3-Clause "New" or "Revised" License

Fields to be imported? #3

Closed bcamper closed 8 years ago

bcamper commented 9 years ago

Hi, could you let us know which of the fields in the original dataset you plan to import, and what the corresponding proposed OSM tags will be? Apologies if I missed this somewhere!

almccon commented 9 years ago

We haven't gotten that far yet! Would love some help figuring that out.

almccon commented 9 years ago

The building data looks like this:

screen shot 2014-12-11 at 5 27 29 pm

Official documentation for the building outlines: http://egis3.lacounty.gov/dataportal/2011/04/28/countywide-building-outlines/

Of those attributes, I propose we keep the height, and discard the rest. Source and date (LARIAC, 2008) will be added to the changesets.

The address data looks like this:

screen shot 2014-12-11 at 5 28 10 pm

Official documentation for the address points is here: http://egis3.lacounty.gov/dataportal/2012/06/19/la-county-address-points

I'm not an expert on OSM's addressing scheme, so any input on this part would be useful. We will need to make extensive changes to the convert.py script (I just uploaded the script from the NYC import repo).

bcamper commented 9 years ago

Amazing timing, I just remembered about and showed this issue to @meetar this afternoon. We'll take a look and chime in with any thoughts on buildings, and @missinglink or @sevko might have some input on addressing from their work on Pelias.

sevko commented 9 years ago

@almccon, the dataset should map nicely onto the addr:* tags in OSM (see the wiki). There are quite a few of them, but we mostly just use the following to filter incoming data in our address imports:

The LA dataset looks pretty granular, so you might want to investigate the more esoteric tags (UnitType/UnitName might be used for addr:unit, for instance), and aggregate some of the others to fit one field (Numprefix, Number, and Numsuffix to addr:housenumber). Note that individual buildings may have multiple addresses, which perhaps @bcamper knows more about?
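
A rough sketch of that aggregation in Python, for whoever picks up convert.py. The field names Numprefix, Number, Numsuffix, UnitType and UnitName come from the dataset; joining the parts with a single space and combining UnitType with UnitName are assumptions, and the function name is made up for illustration. It writes the standard OSM key addr:housenumber.

```python
def address_tags(row):
    """Build addr:* tags from one address-point record (a dict of strings)."""
    def clean(key):
        # Missing or None values are treated as empty strings.
        return (row.get(key) or "").strip()

    tags = {}
    # Assumption: prefix, number and suffix are joined with single spaces.
    parts = [clean("Numprefix"), clean("Number"), clean("Numsuffix")]
    housenumber = " ".join(p for p in parts if p)
    if housenumber:
        tags["addr:housenumber"] = housenumber
    # Assumption: UnitType and UnitName together describe the unit.
    unit = " ".join(p for p in (clean("UnitType"), clean("UnitName")) if p)
    if unit:
        tags["addr:unit"] = unit
    return tags
```

Empty parts are skipped, so a record with only a Number yields just that number with no stray spaces.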

almccon commented 9 years ago

Here's a map of the building heights.

screen shot 2014-12-15 at 15 dec 10 11 31

Looks like Pasadena and Glendale are missing height data (the red areas in the north). I propose we ignore any height tags under 1 foot.

Also, should we convert the data from feet to meters, or leave it as-is (with appropriate units)?

bcamper commented 9 years ago

According to the wiki, the height tag should be in meters (https://wiki.openstreetmap.org/wiki/Key:building). I have seen some height data with units appended, but meters are more compatible and easier to work with (NYC building data is in meters).

Makes sense to ignore height < 1 and just leave off the height tag for those.
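
Something like this could cover both points (meters, and skipping heights under 1 foot). It is only a sketch: the function name and the rounding to one decimal place are assumptions, not anything already in convert.py.

```python
FEET_TO_METERS = 0.3048  # exact definition of the international foot


def height_tag(height_ft, min_ft=1.0):
    """Return the height in meters as a string, or None to leave the tag off."""
    if height_ft is None or height_ft < min_ft:
        return None
    # Assumption: one decimal place is enough precision for the height tag.
    return f"{height_ft * FEET_TO_METERS:.1f}"
```

Returning None rather than "0" lets the caller simply skip writing a height tag for those buildings.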

bcamper commented 9 years ago

Regarding building fields, it may be worth preserving an id field (BLD_ID?) to tie back to the original data. NYC import did something similar in the nycdoitt:bin field: http://www.openstreetmap.org/way/265301868

I also suggest preserving the elevation in an ele tag: https://wiki.openstreetmap.org/wiki/Key:ele. It isn't as common a use case as building height, but the import is really the time to do it if we are ever going to have this data readily available, and it would be useful for terrain + 3d modeling (we would use it in Tangram).

almccon commented 9 years ago

Regarding the building id, there's both BLD_ID and AIN. I'm not sure which one is more stable, or what they're used for. I know that the NYC import was a special case in that the city had a clear plan about how they would update their own data following changes in OSM. If LA County doesn't have such a plan, then building IDs may not be as useful. Generally there seems to be a consensus forming in OSM that IDs are not very useful in imported data. But we can finalize that decision after we've consulted with the imports committee.

For now, we should find out whether BLD_ID and/or AIN will be the same in the 2014 data. I'm tracking that possibility in issue #6. Can someone who knows more about the 2014 data (@cityhubla?) find out which IDs will be the same?

cityhubla commented 9 years ago

The AIN should be the same. These are the legal numbers of the parcels and of anything pertaining to them, which includes the buildings. These numbers rarely change; they change only if the owner decides to split the property into pieces (subdivide). So the data should remain the same. The parcel data contains attributes for the type of building (single story, mixed use, commercial, retail, church, school, theater, parking lots, government buildings, etc.), which we could add. I've emailed UCLA and the County for their data use disclaimer regarding issue #2.

I'll shoot an email to the County asking about their plan and whether the BLD_ID will be kept consistent with the not-yet-released 2014 data.

almccon commented 9 years ago

Okay, check cdd434229553dc26615b813d0d3ca11be88ae539 for my first pass at the conversion code. I know for sure I'm not catching all of the fields that I should be. I'd love someone else to take a look at it. I'm happy with how the address points are joining with the buildings (where possible).

If you use my chunks_venice.zip sample files, and run merge.py and convert.py you'll get some .osm files to play with.

cityhubla commented 9 years ago

Regarding the unique building IDs: the NYC import included them, so it may be best to add them as well. I could contact LA County to see whether the 2014 LARIAC building outlines (non-public) have completely new IDs that differ from the 2008 public set we're using. I anticipate that the new set will have outlines of recently constructed buildings replacing existing ones; we could then automate this in the future by replacing the IDs that are no longer there with the new ones. I'm also wondering whether there is a way to retag the removed outlines for archiving.

cityhubla commented 9 years ago

Also, the elevation data is off for a number of buildings; this 3D demo I made of Hollywood (ThreejsQGIS demo) shows the discrepancy. You will see some buildings elevated well above a normally sloping area. At the office where I work, we use the outlines to build our 3D contextual site models for conceptual architectural work. I usually have to recenter the object axes and then intersect them with the terrain in software like Rhino3d or Sketchup.

cityhubla commented 9 years ago

It may take quite a bit of time to average them out if we script something that takes each feature and determines whether its ELEV is within the average of the closest surrounding buildings.
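
A minimal sketch of that neighbour check, assuming each building is reduced to a centroid plus its ELEV value. The function name, the choice of k, and the tolerance are all assumptions. This brute-force version compares every building with every other one, which is why it would be slow on the full county without a spatial index.

```python
import math


def elevation_outliers(buildings, k=5, tolerance=10.0):
    """Return indices of buildings whose ELEV differs from the mean ELEV of
    their k nearest neighbours by more than `tolerance` (same units as ELEV).

    `buildings` is a list of (x, y, elev) tuples, one per building centroid.
    """
    flagged = []
    for i, (x, y, elev) in enumerate(buildings):
        # Distance from this building to every other building.
        others = [
            (math.hypot(x - ox, y - oy), oelev)
            for j, (ox, oy, oelev) in enumerate(buildings)
            if j != i
        ]
        nearest = sorted(others)[:k]
        if not nearest:
            continue
        mean = sum(e for _, e in nearest) / len(nearest)
        if abs(elev - mean) > tolerance:
            flagged.append(i)
    return flagged
```

Note that a bad value also pulls up the mean for its neighbours, so the tolerance has to be generous; comparing against the median instead would be less sensitive to that.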

cityhubla commented 9 years ago

We may have some luck with the incoming LA County Open Data Initiative: it seems the county assessor is going to release the roll data for each parcel for free (more up to date than what I found from UCLA). Would this help in the code development? One attribute that would be very beneficial is the "building type": the roll data identifies each building as a single-family home, commercial, mixed, civic, school, church, etc. The text says that the data portal will be accessible on or after March 30th.

cityhubla commented 9 years ago

Here's a description of the use codes in that dataset: usecodes

almccon commented 9 years ago

Oh, that does look quite interesting! I'd love to be able to use building=apartments and building=house instead of just building=yes.

http://wiki.openstreetmap.org/wiki/Tag:building%3Dhouse http://wiki.openstreetmap.org/wiki/Tag:building%3Dapartments

In fact, many of those, like churches, schools, hospitals, could also be captured.

And @lyzidiamond and @joeyklee would love Single family residence with pool, even though we probably can't use that one meaningfully for the import.

almccon commented 9 years ago

http://wiki.openstreetmap.org/wiki/Tag:building%3Dgarage would also be great for all those garages behind single-family homes

almccon commented 9 years ago

Just had a thought: does the assessor data contain information about the number of floors in the building? If so, we can populate the building:levels tag as well as the height which comes from the building footprint data.

joeyklee commented 9 years ago

For the case of LA, I don't think it is included (http://assessor.lacounty.gov/local-roll/), but I do think the "building:levels" tag would be worth adding, since it could be derived via geotagged photos or Google Street View imagery :)

In cases like the Chicago building footprint data, they include building:levels for some of their buildings.

almccon commented 9 years ago

Bummer. If it's not in the assessor data then we can't include it in the import. But there's nothing stopping people from adding it later based on ground-level imagery or surveying on-the-ground.

We can't use Google Street View for OSM mapping because of the license, but we can use Mapillary where available: http://www.mapillary.com/map/im/2FuBwfL320GgjY5amgiCgw/photo

joeyklee commented 9 years ago

That is no bueno. And definitely good point about the licensing. Always something to keep in mind ;)

cityhubla commented 9 years ago

@jschleuss and I made this google spreadsheet to figure out the fields and their OSM tag

Attributes to be imported, Google Spreadsheets. We can use this spreadsheet to determine which values are kept and which are omitted.

There are two distinct uses that the 2014 Assessor's data (also referred to in issue #15) has:

Jon and I propose that these tags could be attached during the import.

For example, if a building is a detached single-family residence, its tags are as follows

OSM's taginfo shows residential as a general tag, so we could treat the building tag as the general one. We think there should be a general tag for general designations like residential, commercial, and industrial, with a subtag like building:use to say whether a residential building is single family, duplex, triplex, or condominium.

The assessor's data is fine-grained, and we're open to tagging it differently. Thoughts?

cityhubla commented 9 years ago

@joeyklee @almccon,

The SpecificUseDetail2 of the 2014 Assessor's data has these values

We could update the script to add these to the building:levels tag

joeyklee commented 9 years ago

@cityhubla Nice find! I'm sorry I missed that. It's been awhile since I revisited the data. Better two brains than one!

cityhubla commented 9 years ago

The only issue is that this category has values like Modular, pool, vacant land, fast food etc. It would have to be sorted during scripting. Those identified as 14-20 or 6-13 would need to be omitted as the tag is number based, right?
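
As a sketch of that sorting step: the function below keeps a value only when it is a single whole number, optionally followed by "story" or "stories", and drops everything else, including ranges like 14-20 and words like Modular. The exact text format of the SpecificUseDetail2 values is an assumption here (check it against the spreadsheet), and the function name is made up.

```python
import re

# Hypothetical format: a bare integer, optionally followed by "story"/"stories".
LEVELS_PATTERN = re.compile(r"(\d+)(?:\s+stor(?:y|ies))?", re.IGNORECASE)


def levels_from_use_detail(value):
    """Return a building:levels string, or None when the value is unusable."""
    text = (value or "").strip()
    match = LEVELS_PATTERN.fullmatch(text)
    if match is None:
        # Ranges ("14-20"), categories ("Modular", "Pool") and blanks land here.
        return None
    return str(int(match.group(1)))
```

Anything that returns None would simply get no building:levels tag and could fall through to whatever handling the other use categories get.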

cityhubla commented 9 years ago

Here is a breakdown of all the unique values in the use categories, Google Spreadsheet

cityhubla commented 9 years ago

I'm figuring out the Markdown syntax for adding tables to the readme. We could transition the gdoc to the readme or create a git_wiki to simplify what attributes we're importing.

almccon commented 9 years ago

This is great! I think we can add these tables to the README in Markdown, or else put them in the wiki on osm.org. Eventually we'll have to document everything on OSM's wiki anyway. But we shouldn't use the github wiki, since that will just confuse things.

jschleuss commented 9 years ago

@cityhubla okay. I added the 2015 Assessor data to that spreadsheet. I also started thinking about the OSM tags, starting with our thoughts above. I think we could work with OSM's tags a bit with building=house or building=school. And then go generic when we don't have more information.

We could augment the script to look for both values before making a "decision" on the tags to add. Maybe we do something similar for SpecificUseType1 and 2?

jschleuss commented 9 years ago

@almccon you know the assessor's data also has address fields, right? But we're opting for the address points because some buildings will have multiple addresses? Is that right?

almccon commented 9 years ago

No, I did not know that the assessor's data has addresses. But you're correct, if the assessor only has one address per parcel then the address points will be preferable.

cityhubla commented 8 years ago

The table has been added to the readme; I'll update the OSM wiki. From the looks of it, and from what Jon and I talked about last Sunday, our import will be really comprehensive. Lots of great data.

cityhubla commented 8 years ago

If anyone has time: the assessor's data has values in the USE columns that could be sorted into specific OSM tags, like the building:levels tag. Here is a gdoc of the values, to see whether some of them could be sorted; otherwise the script would tag them with building:use. @jschleuss also prepared a sheet in the gdoc to rename some values to match the ones in OSM.

planemad commented 8 years ago

Just went through the discussions and it looks like we can categorize the fields into two parts:

Per this trial https://github.com/osmlab/labuildings/issues/18#issuecomment-167976124 I did notice that around 1 in 20 buildings had changed and were not good to be imported. If we tie in the address fields to the building footprints while importing, we will have to discard both the footprints and address together, but ideally we would want to import the address as a point property or add it to a newer footprint if possible.

How about we split the dataset into polygons with just the footprint and building attributes, and a point dataset of address attributes extracted from building centroids?

This will allow us to better fill data gaps in OSM rather than trying to throw all the data in at once:

  1. Import only the footprints and building attribute that match with imagery
  2. Update any existing OSM footprint geometry if needed
  3. Conflate the address points to the latest building footprint on OSM if they overlap, manually inspect addresses which did not overlap
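
Step 3 could be sketched roughly as below, using a plain ray-casting point-in-polygon test so there are no dependencies. The function names and the data shapes (dicts of id to point, and id to outer ring) are assumptions for illustration; a real run would work on the OSM footprints and likely use a spatial library instead.

```python
def point_in_polygon(point, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices of the outer ring."""
    x, y = point
    inside = False
    # Walk each edge, including the closing edge from last vertex to first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def conflate(addresses, footprints):
    """Return ({address_id: footprint_id}, [unmatched address ids]).

    An address is matched only when it falls inside exactly one footprint;
    everything else is left for manual inspection.
    """
    matched, unmatched = {}, []
    for addr_id, point in addresses.items():
        hits = [fid for fid, ring in footprints.items()
                if point_in_polygon(point, ring)]
        if len(hits) == 1:
            matched[addr_id] = hits[0]
        else:
            unmatched.append(addr_id)
    return matched, unmatched
```

The unmatched list is the set of addresses to inspect by hand.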

maning commented 8 years ago

I took a stab at reviewing the Assessor categories. I think it contains a lot of information that can fine tune the building tag other than what is in the GeneralUse fields. For one, OSM doesn't have Institutional and Miscellaneous.

What I did was first compare the GeneralUse and SpecificUse and categorize them according to which tags exist in OSM according to Taginfo. Then I compared SpecificUse and Specific_1 and categorized anything that was left out.

In most cases, we will override the GeneralUse in favor of SpecificUse. For example, a feature which has GeneralUse = Commercial; SpecificUse=Department Store will be building=department_store. Or GeneralUse=Recreational; SpecificUse=Athletic and Amusement Facility; Specific_1 = Dance Hall will be building=recreational; building:use=dance_hall.
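
To make the override rule concrete, here is a small sketch. The two lookup tables contain only illustrative entries based on the examples above, not the real mapping, and the function names and the slug rule (lowercase, underscores) are assumptions.

```python
# Illustrative entries only; the real mapping would come from the reviewed categories.
SPECIFIC_USE_TO_BUILDING = {"Department Store": "department_store"}
GENERAL_USE_TO_BUILDING = {"Commercial": "commercial", "Recreational": "recreational"}


def slug(value):
    """Assumed convention: lowercase words joined by underscores."""
    return "_".join(value.lower().split())


def building_tags(general_use, specific_use="", specific_1=""):
    """Prefer a SpecificUse mapping, fall back to GeneralUse, then building=yes."""
    building = (SPECIFIC_USE_TO_BUILDING.get(specific_use)
                or GENERAL_USE_TO_BUILDING.get(general_use)
                or "yes")
    tags = {"building": building}
    if specific_1:
        tags["building:use"] = slug(specific_1)
    return tags
```

Categories with no OSM equivalent (Institutional, Miscellaneous) fall through to building=yes unless a more specific field maps to something.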

Tags we can include are building, building:use and building:levels. I don't think adding the amenity and shop tags is appropriate, since they will be included in the smaller buildings within the property/parcel.

Pros - we adopt common OSM convention on tagging buildings. We translate as much info as possible from the source. Cons - we lose the actual source attributes from the Assessor database.

Next Actions

@almccon @jschleuss @cityhubla @planemad @batpad

maning commented 8 years ago

PR for review here. https://github.com/osmlab/labuildings/pull/28

maning commented 8 years ago

There are cases where a different building category from Assessor data is assigned to two parts of a building.

screen shot 2016-02-09 at 14 54 06

talllguy commented 8 years ago

@maning in https://github.com/osmlab/labuildings/issues/3#issuecomment-181793071 what are the two categories?

maning commented 8 years ago

@talllguy, the yellow = residential, purple = industrial. The area northwest is an industrial complex.

talllguy commented 8 years ago

@maning interesting. I suspect a zoning boundary caused this. Having worked for a County gov't GIS team, I wouldn't be surprised if such a method was used to tag building uses by an assessor. You might try loading the zoning boundary if you can find it on LA county's open data site to compare.

My prediction is that the zoning line will transect the building right where the split is, because zoning lines were historically mapped on small-scale maps by hand, long before GIS was around. Then when GIS came around, GIS analysts were forced to digitize the zoning lines exactly where they were on the map, because changing them at all requires a law change. I digress. Check that zoning line, and if it is the culprit, you might just want to use an algorithm to merge into the larger piece, or ignore this zoning-inferred tagging.

Also, regarding institutional, my county used that as well. In my import, I believe I changed them all to building=yes, because there was no positive way to say what they translated to. Typically they're school, college or government though.

maning commented 8 years ago

> I suspect a zoning boundary caused this. Having worked for a County gov't GIS team, I wouldn't be surprised if such a method was used to tag building uses by an assessor. You might try loading the zoning boundary if you can find it on LA county's open data site to compare.

You are correct. We used LA County's Assessor data, joining the building shapefile and the assessor CSV with AIN as the join field.
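
For anyone reproducing this, the join amounts to something like the sketch below. Only the AIN key comes from the data; the function name and the assumption that buildings are already loaded as dicts are illustrative, and this is not the code actually used for the import.

```python
import csv
import io


def join_on_ain(buildings, assessor_csv):
    """Attach assessor columns to each building record that shares an AIN.

    `buildings` is a list of dicts with an "AIN" key; `assessor_csv` is an
    open text file (or any iterable of CSV lines) with an "AIN" column.
    """
    by_ain = {row["AIN"]: row for row in csv.DictReader(assessor_csv)}
    joined = []
    for building in buildings:
        extra = by_ain.get(building["AIN"], {})
        # Building attributes win if both sources share a column name.
        joined.append({**extra, **building})
    return joined
```

Buildings whose AIN is missing from the assessor file come through unchanged. Because several building parts can share one AIN, each part inherits the same assessor attributes.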

> Also, regarding institutional, my county used that as well. In my import, I believe I changed them all to building=yes, because there was no positive way to say what they translated to. Typically they're school, college or government though.

You are correct again, building=institutional is not an OSM tag. For this, we created a lookup table using the secondary use type of the building. For example, where the assessor data has generaluse=building, specificuse=school, we used the tag building=school. See: https://github.com/osmlab/labuildings/tree/master/mappings_csv

maning commented 8 years ago

Closing. For building parts to which the Assessor data assigned different classifications, the importer should merge them manually.