yixinsun1216 / covertoperations_manual

Match TX State Agency Parcels to Bid Notices #7

Closed tcovert closed 5 years ago

tcovert commented 6 years ago

@yixinsun1216 to flesh this out

Overall goal: we want to be able to place failed auctions on TX State Agency Land (Bureau of Prisons, School for the Blind, Parks and Rec, etc) on a map. There are two pieces to this: the bid notice data that Lydia and @yixinsun1216 have been digitizing and cleaning, and the TX State Agency Land shape file. The TX State Agency Land shape file is stored in raw_data/StateAgencyLands.

To read in the shape file, you will be using the package sf. Here is a tutorial on sf for reference, and feel free to ask Sunny any questions you have about working with spatial data.

Unfortunately, there isn't an "easy" variable in both datasets that will let us directly link them. Instead, you are going to have to do a decent amount of manual checking and linking.

Steps to matching:

  1. A big chunk of the Bid Notices are for "Public School Lands" parcels, which you definitely won't find in the state agency land shape file. For that reason, you need to go through the Bid Notice PDFs to figure out which MGLs are under the PSL header, and separate those out.
  2. We also don't need bids that do get leased (aka successful bids). Successful bids can be found using the results from the bid auctions, which have been saved in the Box folder: 'generated_data/glo_bids.Rda'. After 1 & 2 are done, you should hopefully only have a couple hundred auction lands left.
  3. In the bid notices, you are going to need to "normalize" the "Survey" column, which will be the closest thing you'll have to a matching variable with the State Agency Land shape file. There are a couple of ways to go about this; below are just a few ideas, but feel free to play around with different matching strategies:

    • There are 68 unique survey names in the State Agency shape file (in the column `varSurveyN`). Many of these do appear in the Survey column of the bid notices, unfortunately just written slightly differently. A good starting point might be to create a "dictionary" that links survey names in the bid notices to survey names in the shape file.
    • Survey names do not uniquely identify a parcel of land, so the next identifier you can try matching on is the area of the parcel. Unfortunately, sometimes only part of a parcel is auctioned off, so all we know is that the auction land area is always less than or equal to the shape file area. But for survey names that don't occur often, or that have distinct areas that match up with the auction land, this could be helpful.
    • After the two steps above, we might need to do some brainstorming together. For this part, you might need to dig through the comment section of the notices to figure out how to match the rest. Perhaps the block/blockname/township columns could be useful here too (although they are mostly missing). Send Sunny a message when you get here to touch base.
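
The dictionary and area ideas above can be sketched roughly as follows. This is only an illustration in Python (the project itself works in R with sf, where the same logic would apply to data frames). The dictionary entries, the field names `survey` and `acres`, and the tolerance `tol` are all assumptions made up for the example, not values from the real data.

```python
import re

# Hypothetical dictionary: normalized bid-notice spelling -> shape-file survey name.
# These entries are invented for illustration and are not taken from the real data.
SURVEY_DICT = {
    "H T C RR CO": "H&TC RR CO",
    "H TC RR CO": "H&TC RR CO",
    "T P RR CO": "T&P RR CO",
}


def normalize(name):
    """Uppercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]+", " ", name.upper())
    return " ".join(cleaned.split())


def to_shape_name(name):
    """Map a bid-notice survey name to a shape-file name, or None if unknown."""
    return SURVEY_DICT.get(normalize(name))


def candidate_parcels(bid, parcels, tol=0.01):
    """Parcels with the same survey name whose area is not smaller than the bid area.

    Only part of a parcel may be auctioned, so the shape-file area is treated
    as an upper bound; `tol` (an assumed value) allows for rounding differences.
    """
    target = to_shape_name(bid["survey"])
    if target is None:
        return []
    return [
        p for p in parcels
        if p["survey"] == target and p["acres"] >= bid["acres"] * (1 - tol)
    ]
```

A bid that comes back with exactly one candidate is a likely match; bids with zero or several candidates are the ones that would need the manual checking described in the last bullet.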
yixinsun1216 commented 6 years ago

Notes to self:

Ideas for matching

ldyuan1220 commented 6 years ago

Matching:

  1. The first round of matching was done by matching part1b_cleaned3.csv, which contained all the bid notices minus Public School Lands and successful bids, to the OTLS data. One-to-one matches were found by regex matching on survey name and abstract number for each bid notice.

The one-to-one matches from this round are compiled in the csv file: matched_MGL.csv

  2. The unmatched bid notices after the first round of matching are broken down into three separate categories (each in their own csv file):

    • multiple_abs_num_match.csv: multiple OTLS abstract number matches
    • no_SURNAM_match.xlsx: no OTLS survey name match
    • missing_abs_num.csv: the bid notice had a missing abstract number associated with the survey name

  3. The second round of matching for the three unmatched files was manual. For each one I went back to the original bid notice and matched it to either the OTLS data or the State Agency Land data based on more specific information, such as county (using the package mapview to view the county and the bid notices within the county), part/tract, or area.

Final matches for the three unmatched files are:

    • multabs_matched.csv
    • no_surnam_matched_survey.csv and no_surnam_matched_SAL.csv (2 csv files depending on whether the bid notice was matched with OTLS or State Agency Land)
    • miss_abs_matched_survey.csv and miss_abs_matched_SAL.csv
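
For reference, the first-round triage described above (a one-to-one match versus the three unmatched categories) might look something like the sketch below. It is written in Python purely for illustration and is not the code that was actually run. The field names `survey` and `abstract`, the category labels, and the order of the checks are assumptions; the extra `no_abstract_match` case (survey found, but no record with that abstract number) is not described in this thread and is included only so the function covers every input.

```python
import re


def triage(bid, otls_records):
    """Return (category, matching_records) for one bid notice."""
    words = bid["survey"].split()
    if not words:
        return "no_survey_match", []

    # Case-insensitive pattern that tolerates varying whitespace between words.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    same_survey = [r for r in otls_records if pattern.search(r["survey"])]

    if not same_survey:
        return "no_survey_match", []
    if bid.get("abstract") is None:
        return "missing_abstract", same_survey

    matches = [r for r in same_survey if r["abstract"] == bid["abstract"]]
    if len(matches) == 1:
        return "matched", matches
    if len(matches) > 1:
        return "multiple_abstract_matches", matches
    return "no_abstract_match", []  # assumed extra case, see note above
```

Grouping the bid notices by the returned category would reproduce the split into one matched file and the separate unmatched files that were then handled manually.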