uWaterloo / OpenData

Help and Support for University of Waterloo Open Data Initiative
https://api.uwaterloo.ca

Merge Building data #55

KartikTalwar closed this issue 11 years ago

KartikTalwar commented 11 years ago

Goal

To have a single file with complete building data that will power API v2.

Final Fields

BuildingID
BuildingCode
BuildingName
AltBuildingNames
Latitude
Longitude
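A minimal sketch of one record in the merged file, using the six final fields listed above. It assumes AltBuildingNames is stored as a single CSV cell with names separated by `;`. The issue does not specify a separator, so that choice (and the snake_case attribute names) is hypothetical. The sample row reuses the placeholder V1 values from the JSON demo further down the thread.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Building:
    """One row of the proposed merged building file (fields from the issue)."""
    building_id: str
    building_code: str
    building_name: str
    alt_building_names: List[str]
    latitude: float
    longitude: float


def parse_row(row: dict) -> Building:
    # Assumption: alternate names share one cell, separated by ";".
    alt = [name.strip() for name in row["AltBuildingNames"].split(";") if name.strip()]
    return Building(
        building_id=row["BuildingID"],
        building_code=row["BuildingCode"],
        building_name=row["BuildingName"],
        alt_building_names=alt,
        latitude=float(row["Latitude"]),
        longitude=float(row["Longitude"]),
    )


# Placeholder values copied from the /buildings/V1.json demo, not real coordinates.
SAMPLE = """BuildingID,BuildingCode,BuildingName,AltBuildingNames,Latitude,Longitude
20,V1,Student Village 1,Village 1,79.98758945,-80.13445
"""

buildings = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping IDs as strings rather than ints leaves room for non-numeric IDs if sub-buildings ever need them.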

Building data from v1

https://github.com/uWaterloo/Datasets/blob/master/Buildings/buildings.csv

Building codes

https://github.com/uWaterloo/Datasets/blob/master/BuildingCodes/BuildingCodes.csv
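One way to combine the two source files is an outer join on the building code. This is a sketch only: the column names below (`BuildingCode`, `BuildingID`, `BuildingName`, and so on) are hypothetical and may not match the real headers of buildings.csv and BuildingCodes.csv. Codes that appear in only one file are kept, so gaps (for example, a missing ID) remain visible instead of being silently dropped.

```python
import csv
import io
from typing import Dict, List


def merge_on_code(v1_rows: List[dict], code_rows: List[dict], key: str = "BuildingCode") -> List[dict]:
    """Outer-join two lists of CSV rows on a shared building-code column.

    Rows from code_rows are applied second, so they win on any column
    both files share. Fields a building lacks are simply absent.
    """
    merged: Dict[str, dict] = {}
    for row in v1_rows + code_rows:
        code = row[key].strip().upper()  # normalise "e2 " -> "E2"
        merged.setdefault(code, {}).update(row)
        merged[code][key] = code
    return [merged[code] for code in sorted(merged)]


# Hypothetical headers; values taken from examples elsewhere in this thread.
V1_CSV = """BuildingCode,BuildingID,Latitude,Longitude
E2,2,79.9875845,-80.13445
"""
CODES_CSV = """BuildingCode,BuildingName
E2,Engineering 2
IQC,Institute for Quantum Computing
"""

merged = merge_on_code(
    list(csv.DictReader(io.StringIO(V1_CSV))),
    list(csv.DictReader(io.StringIO(CODES_CSV))),
)
```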

Related issues

Thoughts? @cnvandev, @kraigu, @amsardesai, @axyjo, @camdavidsonpilon, @krisolafson, @gdmalet

axyjo commented 11 years ago

Just a few thoughts on improvements for the next version (this might span multiple comments; people with edit access, feel free to split/merge them as necessary):

Is it necessary to have a single file with all the information? I imagine having only one file might become unwieldy to maintain going forward. Would having multiple files, all containing the same, consistent 'primary key' be a suitable alternative?

Is there a reason for splitting Building Name from Alternate Building Name? I was hoping that it'd be possible to enumerate all building names, and then mark one as primary/official or something like that.

Are the geo-points the centroids of the building? Could we have bounding polygons instead? I think that makes more sense in terms of buildings. That way, we can standardize on using Lat/Lng for locations across the API and then locating them within buildings becomes trivial once we have the bounding polygons as well (makes #54 simpler to execute as well). One problem with this might be overhead bridges -- which building do they belong to? I think plant-ops assigns them to a particular building on their floor plans, but I haven't verified that.
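With bounding polygons, "which building is this point in?" becomes a point-in-polygon test. Below is a minimal even-odd ray-casting sketch. It treats latitude/longitude as flat x/y coordinates, which is an assumption that is reasonable at campus scale. The outline in the usage line is a made-up unit square, not real data.

```python
from typing import Dict, List, Optional, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude)


def point_in_polygon(point: LatLng, outline: List[LatLng]) -> bool:
    """Even-odd rule: count outline edges crossed by a ray heading east."""
    lat, lng = point
    inside = False
    n = len(outline)
    for i in range(n):
        lat1, lng1 = outline[i]
        lat2, lng2 = outline[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross_lng:  # crossing lies east of the point
                inside = not inside
    return inside


def locate(point: LatLng, outlines: Dict[str, List[LatLng]]) -> Optional[str]:
    """Return the code of the first building whose outline contains point."""
    for code, outline in outlines.items():
        if point_in_polygon(point, outline):
            return code
    return None


# Made-up square outline for illustration only.
outlines = {"V1": [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]}
```

Overhead bridges would still need a rule: with this approach, a bridge point lands in whichever building's outline happens to include it, or in none.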

axyjo commented 11 years ago

Where does building_id come from for current buildings? Does it come from plant ops? If plant ops doesn't assign a building ID to sub-buildings, maybe consider switching to the building code as the 'primary key'? Do we use building_id in other datasets to reference location?

KartikTalwar commented 11 years ago

Is it necessary to have a single file with all the information? I imagine having only one file might become unwieldy to maintain going forward. Would having multiple files, all containing the same, consistent 'primary key' be a suitable alternative?

Well, considering the buildings list won't really be updated frequently, having the list of 200 items is not that bad. Also, this doesn't reflect how it will appear on the API; this is just the data. You will still be able to get information in /buildings/[:name].json format.
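Serving /buildings/[:name].json from one file can stay cheap by indexing the records once at load time. A minimal sketch, assuming records are dicts with a `building_code` key and that code matching is case-insensitive; both are assumptions, not decisions made in this thread.

```python
import re
from typing import Dict, List, Optional

# Hypothetical route pattern for /buildings/[:name].json.
ROUTE = re.compile(r"^/buildings/([A-Za-z0-9]+)\.json$")


def index_by_code(rows: List[dict]) -> Dict[str, dict]:
    """Build a code -> record map once, so each request is a dict lookup."""
    return {row["building_code"].upper(): row for row in rows}


def resolve(path: str, index: Dict[str, dict]) -> Optional[dict]:
    """Return the record for a /buildings/<code>.json path, or None."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return index.get(match.group(1).upper())


index = index_by_code([{"building_code": "V1", "building_name": "Student Village 1"}])
```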

Is there a reason for splitting Building Name from Alternate Building Name? I was hoping that it'd be possible to enumerate all building names, and then mark one as primary/official or something like that.

Hmm, that could work too but my intent was to identify buildings by their acronym first, which works out nicely for the query schema mentioned above and keeps the key structure the same across /buildings.json and /buildings/abc.json (having an alternative_name key present on both)

Are the geo-points the centroids of the building? Could we have bounding polygons instead? I think that makes more sense in terms of buildings. That way, we can standardize on using Lat/Lng for locations across the API and then locating them within buildings becomes trivial once we have the bounding polygons as well (makes #54 simpler to execute as well). One problem with this might be overhead bridges -- which building do they belong to? I think plant-ops assigns them to a particular building on their floor plans, but I haven't verified that.

I don't believe all points are centroids. We do have polygon coordinates for most buildings, but I believe this was a crowd-sourced effort. It would be nice to have them for all buildings (the ones that are available will be included in the building data).

Where does building_id come from for current buildings? Does it come from plant ops? If plant ops doesn't assign a building ID to sub-buildings, maybe consider switching to the building code as the 'primary key'? Do we use building_id in other datasets to reference location?

The id comes from here. It's not currently used anywhere, since it was a product of #47. It seems sub-buildings get the same id as their parent.

axyjo commented 11 years ago

Well, considering the buildings list won't really be updated frequently, having the list of 200 items is not that bad.

That makes sense. Thanks!

Hmm, that could work too but my intent was to identify buildings by their acronym first, which works out nicely for the query schema mentioned above and keeps the key structure the same across /buildings.json and /buildings/abc.json (having an alternative_name key present on both)

Not quite sure what you mean by this. Wouldn't the acronym be BuildingCode and not BuildingName?

I don't believe all points are centroids. We do have polygon coordinates for most buildings, but I believe this was a crowd-sourced effort. It would be nice to have them for all buildings (the ones that are available will be included in the building data).

Was this the effort led by Google? If not, can we use the data from Google? Do we have a right to use that data since it's about our institution, even though they own the data?

KartikTalwar commented 11 years ago

Not quite sure what you mean by this. Wouldn't the acronym be BuildingCode and not BuildingName?

By acronym I mean the building acronym (SLC, MC, etc.), i.e. the building code. I think our terminology is just getting us confused.

Was this the effort led by Google? If not, can we use the data from Google? Do we have a right to use that data since it's about our institution, even though they own the data?

No, as far as I know a student did it on his own and donated the file to us. Which Google data are you referring to? Map Maker? I don't know its status (though I don't think it's the same as getting polygon coordinates). Maybe @krisolafson knows something about this.

krisolafson commented 11 years ago

AFAIK the only crowdsourced effort was to get coordinates for room doorways, and that dataset wasn't completed. The Faculty of Environment created polys for their buildings, but none of the other buildings on campus were done to my knowledge.

axyjo commented 11 years ago

The Faculty of Environment created polys for their buildings, but none of the other buildings on campus were done to my knowledge.

Do you know how much effort that took by any chance? Would it be feasible to do by volunteers over a relatively short period?

axyjo commented 11 years ago

Also, would it be possible to get in touch with IAP to get more data on the buildings (one possibly valuable stat that they probably have is square-footage per building)?

KartikTalwar commented 11 years ago

We can definitely get there, but I'd personally like to start with just the building names (and possibly shape coordinates) on v2, i.e. get this merge started.

axyjo commented 11 years ago

:+1: on that. I don't want to introduce scope creep into this issue. My bad.

KartikTalwar commented 11 years ago

No worries. I'll start merging those CSV files manually tonight and we can go from there.

krisolafson commented 11 years ago

Yeah, we will work with IAP to see how we can improve this data in the future; I know they have lots of good stuff.

kraigu commented 11 years ago

@KartikTalwar - re "sub-buildings": if you assign them the same BuildingID, then you lose the ability to use that ID as the primary key, no? What if you did something like this:

"ID","BuildingName"
"100","IST Special Building"
"100-1","IST Special Building - Shed For Co-Ops"
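Under the "100" / "100-1" scheme proposed above, every ID stays unique while the parent is still recoverable from the sub-building ID. A minimal sketch, assuming the suffix is always separated by a single `-` and that parent IDs never contain one; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def parent_id(building_id: str) -> Optional[str]:
    """Return the parent ID for a sub-building ID like "100-1", else None."""
    base, sep, _suffix = building_id.partition("-")
    return base if sep else None


def group_sections(ids: List[str]) -> Dict[str, List[str]]:
    """Map each top-level ID to the sub-building IDs that hang off it."""
    groups: Dict[str, List[str]] = {}
    for bid in ids:
        parent = parent_id(bid)
        if parent is None:
            groups.setdefault(bid, [])
        else:
            groups.setdefault(parent, []).append(bid)
    return groups


ids = ["100", "100-1"]
```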

KartikTalwar commented 11 years ago

@kraigu let me get back to you tonight on this. I'll demo the current implementation. I'm thinking of using the building acronym as the identifier.

KartikTalwar commented 11 years ago

Here, when you request a list of all the buildings, you get all the fields except the outline coordinates. You can then query each building by its code to get the same data, plus the coordinates.

Should we make building_sections not show up on the list and be part of individual building query?

/buildings/list.json

{
   "data":[
      {
         "building_id":2,
         "building_code":"E2",
         "building_name":"Engineering 2",
         "alternate_names":[],
         "latitude":79.9875845,
         "longitude":-80.13445,
         "building_sections":[]
      },
      {
         "building_id":20,
         "building_code":"V1",
         "building_name":"Student Village 1",
         "alternate_names":[
            "Village 1"
         ],
         "latitude":79.98758945,
         "longitude":-80.13445,
         "building_sections":[
            {
               "section_name":"V1 North",
               "latitude":79.000001,
               "longitude":-80.000001
            }
         ]
      }
   ]
}

/buildings/V1.json

{
   "data":{
      "building_id":20,
      "building_code":"V1",
      "building_name":"Student Village 1",
      "alternate_names":[
         "Village 1"
      ],
      "latitude":79.98758945,
      "longitude":-80.13445,
      "building_sections":[
         {
            "section_name":"V1 North",
            "latitude":79.000001,
            "longitude":-80.000001
         }
      ],
      "building_outline":[
         {
            "latitude":79.000001,
            "longitude":-80.000001
         },
         {
            "latitude":79.000001,
            "longitude":-80.000001
         }
      ]
   }
}
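The list/detail split in the demo can come from a single stored record per building: the list view drops the detail-only keys, and the detail view returns everything. A minimal sketch; the `detail_only` parameter is a hypothetical knob showing that moving `building_sections` out of the list (the open question above) is a one-line change.

```python
import json
from typing import Iterable, List


def list_view(records: List[dict], detail_only: Iterable[str] = ("building_outline",)) -> str:
    """/buildings/list.json: every field except the detail-only ones."""
    hidden = set(detail_only)
    data = [{k: v for k, v in rec.items() if k not in hidden} for rec in records]
    return json.dumps({"data": data})


def detail_view(record: dict) -> str:
    """/buildings/<code>.json: the full record, including the outline."""
    return json.dumps({"data": record})


record = {
    "building_code": "V1",
    "building_sections": [],
    "building_outline": [{"latitude": 79.000001, "longitude": -80.000001}],
}
```

Calling `list_view([record], detail_only=("building_outline", "building_sections"))` would hide sections from the list as well.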
KartikTalwar commented 11 years ago

Missing building_id for the following places (Buildings.csv):

AAC, Architecture Annex Cambridge
AAR, Architecture Annex Rome
ACW, Accelerator Centre Waterloo
IQC, Institute for Quantum Computing
WFF, Warrior Football Field

I don't believe ACW/WFF are considered part of UW.
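A check like the one behind the list above is easy to rerun after each merge pass. A minimal sketch, assuming merged rows are dicts with `building_id`, `building_code` and `building_name` keys (hypothetical names) and that a blank or absent ID counts as missing.

```python
from typing import List


def missing_ids(rows: List[dict]) -> List[str]:
    """Return "CODE, Name" for every row whose building_id is blank or absent."""
    missing = []
    for row in rows:
        value = row.get("building_id")
        if value is None or str(value).strip() == "":
            missing.append(f'{row["building_code"]}, {row["building_name"]}')
    return missing


rows = [
    {"building_id": "2", "building_code": "E2", "building_name": "Engineering 2"},
    {"building_id": "", "building_code": "IQC", "building_name": "Institute for Quantum Computing"},
    {"building_code": "WFF", "building_name": "Warrior Football Field"},
]
```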