
Extension: Hwang and Sampson (2014) #9

Open tonofshell opened 5 years ago

tonofshell commented 5 years ago

From its inception, America’s history has been steeped in atrocious racial injustices. The Eurocentric superiority complex of white settlers and the capitalist ventures of the European bourgeoisie, with their demand for free labor, produced a nation built on the exploitation of those whom whites deemed ‘other’. Yet, despite the continued dismantling of de jure segregation, Americans across the country remain spatially separated by de facto means. Segregation and the inequality it perpetuates are, of course, heavily studied subjects in the social sciences, in part because of the injustices they sustain. As a result, there is a large corpus of literature linking segregation, usually racial segregation, to an ever-growing set of outcomes, in categories such as health, education, income, political affiliation, and nearly anything else researchers can think of.

Typically, studies on segregation also touch on gentrification and its causes and effects. Segregation and gentrification often work hand in hand to exacerbate inequality in our communities. While working to improve and ‘clean up’ impoverished neighborhoods can bring short-term gains for their inhabitants, there is a delicate balance before gentrification takes hold and pushes the original community members out, usually through rising housing costs as demand increases. These are incredibly difficult problems for society to work through, and many efforts, as in much of Chicago, have had very little success in disrupting these interactions between racial and capitalist forces.

Yet, despite these challenges, I believe there is a lot of potential for technology and large-scale data gathering to help us overcome these tough issues and mitigate these injustices. Hwang and Sampson (2014), I believe, have merely uncovered the tip of the iceberg in using technology to measure and track gentrification, segregation, and their effects. The most interesting research innovation involved using Google Street View “to systematically detect the visible character and degree of gentrification” in neighborhoods, remotely (Hwang and Sampson 2014:732). While this certainly saves a lot of time compared to the traditional alternative of physically visiting locations in neighborhoods of interest and photographing the area for analysis, I feel that it can still be drastically improved with computational methods. For instance, Hwang and Sampson use human labor to find, analyze, and categorize Street View data: locations on Street View were selected and then given to trained human raters, who picked out features in each image that were used to classify where that location falls on a gentrification scale. While using Street View is a notable efficiency gain, relying on human labor still keeps the coverage far from exhaustive and the cost prohibitive.
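
For a sense of how the retrieval step alone could be automated, here is a minimal sketch using the Google Street View Static API; the `GOOGLE_MAPS_KEY` environment variable and the sample coordinates are illustrative placeholders, not the study’s sampling design.

```python
import os
import requests

# Minimal sketch: download Street View images for a list of sample points.
# Assumes a Street View Static API key is available in GOOGLE_MAPS_KEY;
# the coordinates below are illustrative placeholders, not the study's sample.
API_URL = "https://maps.googleapis.com/maps/api/streetview"
API_KEY = os.environ["GOOGLE_MAPS_KEY"]

sample_points = [
    (41.7943, -87.6086),  # hypothetical block face in Hyde Park
    (41.7797, -87.6440),  # hypothetical block face in Englewood
]

for i, (lat, lon) in enumerate(sample_points):
    params = {
        "size": "640x640",              # image dimensions in pixels
        "location": f"{lat},{lon}",     # point to photograph
        "fov": 90,                      # field of view
        "key": API_KEY,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    with open(f"streetview_{i}.jpg", "wb") as f:
        f.write(resp.content)
```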

An alternative to using humans to categorize images is letting computers do the task. Image recognition algorithms have come a long way since 2014, becoming much more powerful and easier to use. Coupling this technology with web scraping and machine-learning classification, it seems entirely feasible to automate a large portion, if not all, of this process of measuring gentrification through Street View data. Hwang and Sampson state that the time intensity of this human-coded analysis was a constraining factor in their study, limiting them to a representative sample of Chicago. Automating the entire process could allow deeper analysis of a single city, or a wider scope of several cities. Another advantage is that training models to simply pick out features in Street View data and count them geospatially would likely remove much of the bias that using humans can introduce. Even though the raters in Hwang and Sampson’s study were trained and tested on a sample set before they were allowed to categorize new images, these humans still have numerous implicit biases that would be very hard to detect and account for. An automated computer processing system, while not entirely devoid of bias, as any computer system is designed by humans after all, would still likely reduce bias and, with careful sampling and validation measures, allow any bias that does appear to be accounted for.
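
To make the automation idea concrete, below is a minimal sketch that fine-tunes a pre-trained ResNet with PyTorch/torchvision to reproduce rater labels on Street View images. The `labeled_streetview/` directory layout, the choice of model, and all hyperparameters are assumptions for illustration; this is one plausible way to scale the categorization step, not Hwang and Sampson’s own method.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Sketch: fine-tune a pre-trained CNN to reproduce human rater labels on
# Street View images. The directory layout (one subfolder per gentrification
# category) and all hyperparameters are illustrative assumptions.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_data = datasets.ImageFolder("labeled_streetview/train", transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Start from ImageNet weights and replace the final layer with one output
# per gentrification category found in the labeled folders.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_data.classes))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```

In practice, such a model’s predictions would still need to be validated against held-out human ratings before being trusted for citywide measurement, which is exactly the kind of careful sampling and validation noted above.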

However, Hwang and Sampson couple this Street View data with numerous other datasets, including census, survey, and geospatial data from a variety of sources such as national and local governments. While these sources cover a wide range of facets for analysis, there remain a few problems with the data. For one, surveys and government-gathered demographic data like those used by Hwang and Sampson are collected at specific, fixed points in time. While this is less of a problem when measuring gentrification and segregation over decades or centuries, even with data collected yearly or monthly there is a lot of generalization and extrapolation between data points. This makes it difficult to measure changes in the recent past, much less in real time, given the considerable latency between the start of data gathering and its publication by these organizations. It also fails to capture a certain precision in a world that changes and evolves every single day. A lot can happen in Chicago or any city in a month, much less an entire year or longer, and while datasets gathered at longer intervals can capture overall trends, they cannot capture much more than that.

In contrast, many organizations today publish practically real-time data that can be used for the same purposes as the more traditional datasets used by Hwang and Sampson. Point-of-interest data in mapping tools such as Google Maps or OpenStreetMap stays up to date on businesses, their services, and how many people visit them. Zillow, Apartments.com, or Airbnb can all be used to measure real estate costs and conditions. Even the flow of people between neighborhoods can be measured much more precisely with data from ride shares, bike shares, and real-time bus tracking. With this level of high-precision, ‘always on’ data, we can not only perform analyses with much more precision but also capture natural experiments and measure their effects directly: for example, the effects of the Whole Foods built in Englewood, or the even more recent construction of a Jewel-Osco in Hyde Park. It also allows us to ask how one-time events like festivals or concerts affect the flow of people, money, and goods in and out of these segregated areas. Can the effects of recurring events accumulate over time and combat segregation or inequality?
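
As one hedged illustration of what such ‘always on’ point-of-interest data looks like in practice, the sketch below counts businesses recorded in OpenStreetMap via the public Overpass API. The bounding box is a rough, illustrative approximation of a Chicago neighborhood, and repeating the query over time is only a crude proxy for commercial change, not a validated measure.

```python
import requests

# Sketch: count OpenStreetMap points of interest (shops, cafes, restaurants,
# bars) inside a bounding box using the public Overpass API. The bounding box
# below is an illustrative approximation of a neighborhood, not a precise
# study boundary; repeated snapshots give a crude signal of commercial change.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Overpass bounding boxes are ordered (south, west, north, east).
bbox = (41.77, -87.66, 41.80, -87.62)

query = f"""
[out:json][timeout:60];
(
  node["shop"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
  node["amenity"~"cafe|restaurant|bar"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
);
out body;
"""

resp = requests.post(OVERPASS_URL, data={"data": query}, timeout=90)
resp.raise_for_status()
elements = resp.json()["elements"]
print(f"Points of interest in bounding box: {len(elements)}")
```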

There are whole new worlds of measurement that can be captured through these data sources and careful computational analysis, which could better inform public policy in the studied areas. Particularly interesting and important, I think, is the possibility of small-scale analyses of events that I believe contribute to business growth and changing attitudes toward neighborhoods, resulting in better opportunities for their inhabitants without detracting from the culture of those neighborhoods, as many critics of gentrification have shown can occur. If this relationship is indeed real, computational analyses could be a powerful tool for developing a new perspective on gentrification in urban areas.