Closed andrewfoote closed 4 years ago
Adding some language to discuss large clusters in the paper, at end of section 2:
There are some notably large clusters in our replication, mostly in sparsely populated areas (see Montana, and Eastern Oregon with Northern California). It is clearly unlikely that individuals commute within these large clusters, but this fact serves to underline the main weakness of the CZ methodology: counties are forced to merge together eventually, even if the links between them are incredibly weak.
And a proposed response to referee:
We have added some discussion of the differences between the replication and the original TS1990 clusters in the paper. In particular, we note that the large clusters (in terms of land area) are in areas with very sparse populations. Additionally, as we note earlier in the paper, TS separated the country into six overlapping regions and ran the algorithm separately on each of them, manually addressing discrepancies. This re-normalized the dissimilarity matrix for each region, which causes the zones in the Northeast to have different connectivity than zones in the South. We would also point out that their delineation still contains a number of large zones, particularly in Nevada, Idaho, and California.
Another approach, to address the final piece of the comment: "how many people actually commute within those large zones?"
Task: a simple appendix table that lists the absolute and relative counts of commuters (and population) for each of the computed CZs. Then we can refer to that. There are two ways to understand the comment: how many people commute within the zone as a fraction of the zone's commuters (which is a high percentage by construction, and might indicate a lack of understanding), and how many people simply "commute" as a count (sparsely populated areas won't have many commuters in absolute terms, but relative to the population, the number will still be large). We skirt the need to interpret by just providing both sets of numbers.
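A minimal sketch of how such a table could be computed, in case it helps scope the task. Everything here is an assumption for illustration: the function name `cz_commute_table`, the input layout (a county-to-CZ mapping, county populations, and home-county to work-county flows), and the toy numbers are hypothetical and not taken from the replication code or data.

```python
from collections import defaultdict


def cz_commute_table(flows, county_pop, county_to_cz):
    """Summarize commuting within each commuting zone (CZ).

    flows:        {(home_county, work_county): number of commuters}
    county_pop:   {county: population}
    county_to_cz: {county: CZ identifier}
    All three inputs are hypothetical layouts, not the paper's actual data.
    """
    pop = defaultdict(int)            # population per CZ
    resident_total = defaultdict(int) # all commuters living in the CZ
    within = defaultdict(int)         # commuters living and working in the CZ
    cross_county = defaultdict(int)   # subset of `within` that crosses a county line

    for county, p in county_pop.items():
        pop[county_to_cz[county]] += p

    for (home, work), n in flows.items():
        home_cz = county_to_cz[home]
        resident_total[home_cz] += n
        # Workplaces outside the mapping count as leaving the zone.
        if county_to_cz.get(work) == home_cz:
            within[home_cz] += n
            if home != work:
                cross_county[home_cz] += n

    rows = []
    for cz in sorted(pop):
        total = resident_total[cz]
        rows.append({
            "cz": cz,
            "population": pop[cz],
            "commuters_within": within[cz],
            "cross_county_within": cross_county[cz],
            # Relative to all commuters residing in the zone.
            "share_of_resident_commuters": within[cz] / total if total else float("nan"),
            # Relative to the zone's population.
            "within_per_capita": within[cz] / pop[cz] if pop[cz] else float("nan"),
        })
    return rows


# Toy example with made-up numbers: counties A and B form CZ 1, county C is CZ 2.
county_to_cz = {"A": 1, "B": 1, "C": 2}
county_pop = {"A": 1000, "B": 500, "C": 2000}
flows = {
    ("A", "A"): 300, ("A", "B"): 50,
    ("B", "A"): 100, ("B", "C"): 20,
    ("C", "C"): 900, ("C", "A"): 10,
}

rows = cz_commute_table(flows, county_pop, county_to_cz)
for row in rows:
    print(row)
```

Reporting the count, the share of resident commuters, and the per-capita figure side by side covers both readings of the referee's question without having to choose one. The cross-county count is included separately because same-county flows would otherwise dominate the within-zone total.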
@larsvilhuber @mkutzbach - why not just provide this as a table to the referee response?
@andrewfoote Do we still want to provide the tables? (@mkutzbach ) I propose for now we skip this.
@larsvilhuber I support skipping this. I don't think it is a big part of the critique.
3) The authors replicate the TS1990 commuting zones and demonstrate the ambiguity over the choice of cutoff height. This method relies on the dissimilarity matrix that decides the cutoff value merely based on the mathematical distance between clusters. As shown in Figure 1, there are some very large commuting zones in the replication, such as the area over Arizona and West New Mexico in pink and the area over North California and Oregon. How many people actually commute within those large zones? Note that those areas are not in the same commuting zone in TS1990. You might want to discuss these and other differences in your replication to convince readers of the validity of your method.