Review Ticket: Gravity Models of Migration

amsichani commented 6 years ago

The Programming Historian has received the following tutorial on 'Gravity Models of Migration' by @acrymble . This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/gravity-model

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @amandavisconti if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudspeople (Ian Milligan and Amanda Visconti - http://programminghistorian.org/project-team). Thank you for helping us to create a safe space.

amsichani commented 6 years ago

NB: a couple of layout / technical issues with the rendering:

Table 3 does not fit horizontally on the page and is cut off, which is confusing for the reader. I suspect it is a CSS issue.
Wide formulas that use the double-dollar-sign formatting codes are cut off horizontally. I suspect this is also a CSS issue. See paras 125, 130, 131.

amsichani commented 6 years ago

Sorry @acrymble for the radio silence here and thanks for your patience. Quick update on this lesson's review process: I am now working on finding two reviewers for this demanding lesson (deadline 7th Dec). @acrymble and @programminghistorian/english-team, please do come forward with suggestions if you have any. I will also post some initial feedback on this lesson before the 5th December and try to help out with minor layout issues. An optimistic plan would then be to have the reviewers' initial comments back by the 7th January, and we will take it from there.

acrymble commented 6 years ago

Thanks for the timeline @amsichani. I'm a bit wary of suggesting reviewers lest it look like the review hasn't been at arm's length. Google Scholar is a useful way of finding people who have published in the area. You might look for people who publish on spatial interaction models or gravity models.

amsichani commented 5 years ago

One reviewer will be @oliverdw. Given the holiday season, our goal is the 2nd week of January 2019. I am working on the second reviewer; more soon.

acrymble commented 5 years ago

@amsichani we're nearing the holiday season so I wanted to follow up before everyone gets busy to see if there is any initial feedback before the reviewers start their work.

oliverdw commented 5 years ago

Hi Adam,

I'm planning on doing my first draft next week. Looks good so far; Waldo Tobler's work reviewing Ravenstein a century on might be something to add

Oli

-- Dr Oliver Duke-Williams, Department of Information Studies, UCL tel: 020 7679 7205 twitter: @oliver_dw

amsichani commented 5 years ago

Thanks @oliverdw for this. In general, I would also add that Tobler's contributions on the theoretical and mathematical foundations of (digital) analytical cartography & GIS would be an interesting addition in terms of bibliography.

That said, I will also add my first comments this weekend. Overall it looks good! I am still looking for the second reviewer and hopefully will have some good news early next week.

amsichani commented 5 years ago

Thanks so much @acrymble for writing this lesson. I am happy to see ProgHist covering these topics. I just finished an initial pass through the entire text and overall it looks good to me. As this is one of the most difficult, specialised topics in our collection, I think it’s important to ensure that this lesson is accessible and easy to follow for everyone (even for non-experts).

Just a reminder about our open review process: While we wait for our reviewers to submit their comments, this ticket will also be open to any other feedback from our community. Once we have received both reviewers' responses, the open review period will then be closed, so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred. I will summarize the feedback from both reviewers as well as any other input from our wider community, and work with @ to decide what further editing the lesson will need before we are ready to publish.

While waiting for comments from our peer reviewers, I'd like to focus mainly on (minor) functional and layout revisions at this point, and on a couple of structural suggestions that do not affect the actual review process. It’s entirely up to you whether you want to work on these now or at a later stage, once all reviews are submitted:

[ ] modelling - modeling : adopt a single spelling throughout the lesson for consistency
[ ] p.16 MASS : omit the hyperlink on the first instance of the word.
[ ] p.24 Vagrancy Act/ system: a bit of historical context will be welcomed here
[ ] p.33 caption fig.4 sentence case instead of Title Case
[ ] p.34 amend the list formatting
[ ] p.41 I'd suggest omitting this list of active links here (and the ref) and instead placing the links within each of the following sections.
[ ] p.68-70 amend the list formatting
[ ] p.86 Wa = Wα // WaΤ = WαΤ : adopt a single notation for the variable throughout the lesson for consistency
[ ] p.92 λ element should be defined in the main text
[ ] I also suspect that the Table 3 layout issue, as well as the display of the wide formulas on p.125, 130, 131, might be solved with small amendments to the CSS table margins/borders. I'd be hesitant, though, to make changes to the main CSS file as this would affect other lessons too. So we should probably find a way to amend the markdown locally. I'll run a couple of trials and hopefully come back with a functional snippet.

acrymble commented 5 years ago

Thanks, I'll wait to hear from the other reviewers now since I know at least one of them has started reading.

oliverdw commented 5 years ago

This is an interesting tutorial, and introduces some relatively complex procedures. As the author states (para 11) the examples of application of spatial interaction or gravity models to historical data are very limited. With specific reference to migration data I think it would be useful to refer directly to Ravenstein’s 1885 paper:

Ravenstein, E. G. (1885). The Laws of Migration. Journal of the Statistical Society of London, 48(2), 167-235.

This codified a set of ‘Laws of Migration’ which, whilst not explicitly proposing a gravity model, identify some key elements: that migration is affected by distance and by population.

Following on from this, it’s also useful to refer to Waldo Tobler. Tobler’s first law of geography – that “everything is related to everything else, but near things are more related than distant things” – also effectively describes the gravity model (without specifically referring to mass, however)

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic geography, 46(sup1), 234-240.

Tobler revisits the work of Ravenstein (and other geographers) in a 1995 paper:

Tobler, W. (1995). Migration: Ravenstein, Thornthwaite, and beyond. Urban Geography, 16(4), 327-343. https://doi.org/10.2747/0272-3638.16.4.327

This then, is the opposite of the author’s query: rather than applying gravity models etc to historical data, it’s a re-assessment of historical propositions using more recent data. Attention is drawn to one of Ravenstein’s original figures, with Tobler wryly noting that as it was un-mentioned in the text, it would probably not survive the modern editorial process.

The tutorial covers quite a bit of ground; it discusses gravity models having introduced some of the concepts by way of linear and multivariate regression, and is – I think – one of the more complex PH tutorials. There’s a difficult line to be followed here in terms of target audience: those familiar with gravity models (in concept, at least, if not in terms of implementation) will probably find the explanation of linear regression unnecessary, but those coming to this fresh, who need the base concepts explained, may find this quite a steep progression.

That said, the text is well written and easy to follow. I think that the staged approach of previewing a result and then explaining how it was reached works well. For those unfamiliar with models of this type, I think that a potential source of confusion is the shift from the generalised model at para 63 to the specific one at para 65. It is explained earlier that µ is a population, but no hint is given at this stage about the specific purposes of the subscripts: that µij is the population interaction between origin i and destination j, i.e. the count of migrants. This is covered later on (j is described as referring to London, although it can be understood more generally than that as any modelled destination [edit - I think that the model here however is a regression for one destination only, and not a more complex doubly-constrained model that would work for a set of origins and destinations] ). Although it risks obvious repetition, I suspect that the relevance of the subscripts should be highlighted.

The section on setting beta values also covers some quite technical areas, and is largely presented as a black box. I think a couple of aspects need comment. The need for complete data is stressed in Para 80, whilst in Para 119 it is mentioned that we need to tell the glm library what to do in the event of there being no data. Is na.action a mandatory argument? If the sample data set does not have any missing data, then it might be worth noting to readers that we do not need to worry about no data. More generally, I think a little more explanation might help that in this section the beta parameters are being determined across the whole data set from known data, and we can then compare individual origin-specific observations with this general model. We can then examine these and identify over/under predicted flows, as has been done. These might elicit an interpretation back on knowledge of the phenomena (although it’s also possible that mis-predictions can identify cases where the input data is flawed).
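On the na.action question, a minimal sketch may help. It uses invented toy data and a Poisson glm() from base R as a stand-in for whatever model call the lesson uses, so the column names and numbers here are assumptions for illustration only. In glm(), na.action is optional: when it is not supplied, R falls back on getOption("na.action"), which is "na.omit" in a fresh session.

```r
# Invented toy data (NOT the lesson's data set); one distance is missing.
toy <- data.frame(
  migrants   = c(12, 30, 7, 55, 21, 9),
  population = c(1500, 4200, 900, 8000, 3100, 1200),
  distance   = c(40, 25, NA, 10, 60, 80)
)

# No na.action supplied: glm() uses getOption("na.action"),
# which is "na.omit" by default, so the incomplete row is dropped.
fit_default <- glm(migrants ~ log(population) + log(distance),
                   family = poisson, data = toy)

# na.exclude also drops the row when fitting, but pads fitted values
# and residuals with NA so they line up with the original rows.
fit_exclude <- glm(migrants ~ log(population) + log(distance),
                   family = poisson, data = toy, na.action = na.exclude)

nobs(fit_default)             # number of rows actually used in the fit
length(fitted(fit_default))   # one fitted value per complete row
length(fitted(fit_exclude))   # one fitted value per original row

# A quick check for whether a data set has any missing values at all.
anyNA(toy)
```

If anyNA() returns FALSE for the data being modelled, the choice of na.action makes no difference to the fit.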

The final section encourages readers to try the methodology on their own data. I suspect that this will be quite daunting, not least because of the implied data requirements. The main worked example uses five parameters, and the co-example related to coffee also has five suggested independent variables. The text implies that this is necessary, but in fact the number of parameters is not fixed: we really only need something about the origins (typically a population) and something about the interaction between the origins and the destination (typically a distance, although this might also be expressed in terms of cost or time rather than physical distance). Additional terms might build a better model, but they’re not mandatory. Returning to the start, I think one reason that historical applications of gravity models have been limited is that they require quite a lot of data, which is not always directly available. The suggestion that multiple independent variables are necessary may make readers think ‘well, I haven’t got that many observations, so this is not for me…’. I’d adjust this section to make it clear that the number of variables can be changed.
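To illustrate that the number of variables is a modelling choice, here is a rough sketch on invented toy data. A base R Poisson glm() stands in for the lesson's model, and the column names and values are assumptions. It fits a minimal model with only an origin population and a distance, then adds one optional term.

```r
# Invented toy data (NOT the lesson's data set).
toy <- data.frame(
  migrants   = c(12, 30, 7, 55, 21, 9, 40, 16),
  population = c(1500, 4200, 900, 8000, 3100, 1200, 6000, 2500),
  distance   = c(40, 25, 90, 10, 60, 80, 15, 50),
  wages      = c(70, 82, 65, 90, 75, 68, 88, 72)
)

# Minimal gravity-style model: something about the origin (population)
# and something about the origin-destination interaction (distance).
minimal <- glm(migrants ~ log(population) + log(distance),
               family = poisson, data = toy)

# Optional extra term; update() re-fits with the longer formula.
extended <- update(minimal, . ~ . + wages)

length(coef(minimal))    # intercept + 2 slopes
length(coef(extended))   # intercept + 3 slopes

# Compare the two fits; an extra term is only worth keeping if it helps.
AIC(minimal, extended)
```

The same pattern applies whatever the extra variables are: start from the two essential terms and add others only where data exist and the model improves.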

In terms of the tutorial itself, I think it’s a good idea to defer to existing resources relating to R, although this might need some additional support. For example, in para 114, the reader is advised to use a text editor. I’d add a note that they can also directly create a new script if they’re using R Studio. It’s the same thing, of course, but people may prefer the sense of keeping everything contained within one application.

In the example code:

• my setup required me to use install.packages("MASS") rather than, as shown, install.packages(MASS) (i.e. quoting the package name)
• I don’t think the command numbering #1, #2 etc is necessary
• Depending on where they’ve placed files etc, some people will need to change the working directory in order for the read.csv() command to work. Again, this should be covered in basic R tutorials, but could be included here as well (eg in RStudio – Session > Set working directory > Choose directory)
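The setup points above could be sketched roughly as follows. The folder and file names are hypothetical placeholders rather than the lesson's actual paths, so they would need replacing with wherever the reader saved the data.

```r
# The package name must be a quoted string; install only if it is missing.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}
library(MASS)

# Hypothetical folder and file name (assumptions, not the lesson's paths).
data_dir <- path.expand("~/gravity-model")
csv_path <- file.path(data_dir, "gravityModelData.csv")

if (file.exists(csv_path)) {
  # Same effect as Session > Set Working Directory in RStudio.
  setwd(data_dir)
  gravityModelData <- read.csv("gravityModelData.csv")
  str(gravityModelData)
} else {
  # Report where R is currently looking, to help diagnose path problems.
  message("File not found: ", csv_path,
          " (current working directory: ", getwd(), ")")
}
```

Checking file.exists() first turns a cryptic read.csv() error into a message that shows the current working directory.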

I agree that it is a good idea to suggest that people use a calculator to work out a result by hand. It’s possible however to keep the next stage in R, by doing something like:

gravityModelData$pred = exp(-3.848 + (1.235 * log(gravityModelData$population)) + (-0.542 * log(gravityModelData$distance)) + (-0.024 * gravityModelData$wheat) + (-0.025 * gravityModelData$wages) + (-0.014 * gravityModelData$wageTrajectory) )

or, to keep the full precision of the parameters (and avoid having to re-type them):

gravityModelData$pred2 = exp(gravityModel$coefficients[1] + (gravityModel$coefficients[2] * log(gravityModelData$population)) + (gravityModel$coefficients[3] * log(gravityModelData$distance)) + (gravityModel$coefficients[4] * gravityModelData$wheat) + (gravityModel$coefficients[5] * gravityModelData$wages) + (gravityModel$coefficients[6] * gravityModelData$wageTrajectory) )

In this example, pred and pred2 have slightly different results, due to the rounding of the parameters in the original. Note that R’s log() will give the same answer that ln() will give on a calculator.
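A further option, sketched here on invented toy data with a base R Poisson glm() standing in for the lesson's model (an assumption, as are the column names), is to let predict() do the arithmetic. With type = "response" it applies the inverse of the log link, which is the same exp() step as in the hand-written expressions.

```r
# Invented toy data (NOT the lesson's data set).
toy <- data.frame(
  migrants   = c(12, 30, 7, 55, 21, 9, 40, 16),
  population = c(1500, 4200, 900, 8000, 3100, 1200, 6000, 2500),
  distance   = c(40, 25, 90, 10, 60, 80, 15, 50)
)

model <- glm(migrants ~ log(population) + log(distance),
             family = poisson, data = toy)

# By hand: exponentiate the linear predictor built from the coefficients.
b <- coef(model)
toy$pred_manual <- exp(b[1] +
                       b[2] * log(toy$population) +
                       b[3] * log(toy$distance))

# Automatically: predict() applies the inverse link for us.
toy$pred_auto <- as.numeric(predict(model, type = "response"))

# The two routes should agree to within floating-point tolerance.
all.equal(as.numeric(toy$pred_manual), toy$pred_auto)

# Positive residual = more migrants observed than the model expects.
toy$residual <- toy$migrants - toy$pred_auto
toy[order(-toy$residual), c("migrants", "pred_auto", "residual")]
```

Working by hand first still has teaching value; predict() simply removes the risk of mistyping a coefficient afterwards.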

Some specific proofing notes:

Para 28 – the dataset page identifies that there is also a newer version; does the link need updating?
Para 39 – The equation needs correcting, as the parentheses do not balance; this is also true of the equation at para 46, which, I think, should be: y=β0+β1(x1)+β2(x2)+...+βp(xp) (or perhaps y=β0+(β1(x1)+β2(x2)+...+βp(xp)) ), i.e. some additional opening parentheses have crept in. The para 39 equation definitely needs a closing bracket after the last term (so that exp() applies to the whole of the right hand side); the separate terms could be placed in their own parentheses – as the last three terms are – if this makes it more readable, but the first two terms have omitted their closing brackets. NB – the equation is restated at para 92 without the per-term parentheses; I’d be happy to use that as the canonical layout, and to suggest that the other uses of the model adopt the same layout.
Para 40 – typo, ‘Congen’ should be ‘Congdon’
Para 44 – perhaps worth stating, simply because of familiarity, that linear regression can also be carried out in proprietary statistics software such as SPSS, and in tools such as Excel?
Table 3 – the Dij distance measures are quoted to far too many decimal places and have spurious accuracy. 8DP (as shown) is suggesting sub millimetre level accuracy! No more than 1DP needs to be shown in the table. Similarly, the wage trajectory doesn’t need to be shown to such a large number of places.
Para 89 – repeat of the word ‘final’
Para 105 – in the definition of ‘standard deviation’ I’d avoid the word ‘variance’ which has its own specific meaning, and use ‘variation’ or ‘dispersal’.
Para 124 – giving descriptive terms for the components makes the equation overflow the page
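For reference, the balanced forms discussed in these notes could be typeset roughly as follows. The first line is the general regression equation as suggested for para 46. The second is a sketch of the gravity model in the para 92 layout, with a single closing bracket so that exp() covers the whole right-hand side. The symbols P_i, d_{ij}, Wh_i, Wa_i and WaT_i are assumed shorthand for the population, distance, wheat, wages and wageTrajectory columns in the R expression earlier in this comment, and may not match the lesson's own notation.

```latex
% General multivariate regression, parentheses balanced
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p $$

% Gravity model for one destination j; symbol names are assumptions
$$ \mu_{ij} = \exp\left( \beta_0 + \beta_1 \ln P_i + \beta_2 \ln d_{ij}
   + \beta_3 Wh_i + \beta_4 Wa_i + \beta_5 WaT_i \right) $$
```

Breaking the second equation across two source lines, as here, has no effect on the rendered width; the overflow noted at para 124 would still need descriptive labels moved out of the equation itself.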

amsichani commented 5 years ago

Many thanks @oliverdw for your detailed review comments! Happy to inform you that @sferna109 will act as second reviewer for this lesson. I'd propose to wait for her review in the next couple of weeks, and then I will summarise all the reviewers' comments so @acrymble can work on the lesson.

amsichani commented 5 years ago

Quick update on the timeline: @sferna109 will contribute her review in the first days of February, due to her heavy schedule. I'll keep an eye on it so that I can quickly summarise the comments once they are in. Thanks everyone for your patience and contributions!

sferna109 commented 5 years ago

This tutorial is very useful when working with specific migration data. The detailed explanations of how to solve the mathematical equations are very clear. However, it is important to consider that this tutorial is very specific to certain migrant groups in Europe. That is, the data obtained from London is, in fact, reliable, so the author could map a certain area of London with confidence. But in some cases or countries, migration and migrant data are not well documented. Examples include the records of undocumented migrants in the United States, or of people who go back and forth from one country to another. I recommend mentioning this in an endnote or in a few sentences, since it is important for others to be aware of this issue.

Another thing to consider is that the author approaches the mathematical solutions by way of two examples. From a more ethical perspective, I recommend that this paragraph differentiate between a study of people and a study of goods, to avoid converting people into objects. Clarifying this will be very helpful.

P. 16 It is necessary to include the tutorial to install R.
P. 30 It is not until this point that the issues with the data found in the primary sources are mentioned. It is a good description, but some clarification is still needed at the beginning for an audience that will follow these methods using other migration cases.
P. 64 It would be good to include the link to the Economic History Review article.
P. 72 I suggest providing an alternative solution in case one of these variables does not work.
P. 77 After paragraph 76 it becomes confusing because the example turns to the coffee study. I suggest providing examples from the migration case, or announcing that you will now provide examples from another case.
P. 88 I am not sure whether this link to the csv file is supposed to take you somewhere, but if it is, it doesn’t work.
P. 98 The 1 is not visible because of the paragraph symbol.
P. 103 It will be necessary to say which calculator model would be good to use.
P. 112 The tutorial to install R is not provided until this point. It is good, but it would also be good to mention it from the beginning.
P. 113 Same here: I am not sure whether the link to the .csv file is supposed to take you somewhere; if it is, it is not working.
P. 138 It will be important to make the conclusion broader and not focus only on London, so that the tutorial can be considered for case studies of other countries.

At the end in the suggested articles, it will be also important to recommend projects that are working with migration data, such as Torn Apart / Separados.

Overall, the tutorial is a very good specific case study. The coffee example is a great way to approach the mathematical solution in other contexts. I think that this tutorial will help not only historians but also sociologists and political science scholars, among others. Applying this historical problem and mathematical approach to migration studies makes it possible to draw contrasts with other regions or countries. I am eager to work with this tutorial and see how it will help in a cross-border migration case study at the US-Mexico border from the nineteenth century to the present.

On another note, sorry for the delay. I hope these suggestions help. I look forward to discussing any of my recommendations, if need be.

amsichani commented 5 years ago

Many thanks @sferna109 for your review comments. 👏 I appreciate both your and @oliverdw's thorough and detailed reviews. @acrymble I will now take a look over both their comments and summarize a list of revisions for the lesson. Please give me about a week to do this before you start incorporating any changes. I'll try to post them here by Tuesday 12th Feb (if not earlier).

acrymble commented 5 years ago

Can you also please let me know what you want to do about the length? This far exceeds the maximum word length.

amsichani commented 5 years ago

As both @oliverdw and @sferna109 noted, although this is one of our more complex lessons, it is really well written: it succeeds in making difficult concepts and procedures accessible and makes clear how gravity models are particularly useful for historical research (especially for migration).

I've tried to summarise the reviewers' comments into two categories:

In terms of the substantive content revisions this lesson needs, there are a couple of comments from the reviewers. I think at this point the best step would be for you to incorporate as many of the suggestions as possible and ask if something is unclear:

[x] lesson's length: this lesson exceeds our 8,000-word limit (incl. code). While I need to double-check with the PH team our strategy on this specific case, I (@amsichani) would be inclined to suggest making it more compact, although that might mean omitting steps or aspects which a non-specialist reader needs in order to fully understand and follow the tutorial. You can always move the list of suggested articles to a separate section after the end of the tutorial to save some words.
[ ] References suggestions
Ravenstein, E. G. (1885). The Laws of Migration. Journal of the Statistical Society of London, 48(2), 167-235.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic geography, 46(sup1), 234-240.
Tobler, W. (1995). Migration: Ravenstein, Thornthwaite, and beyond. Urban Geography, 16(4), 327-343. https://doi.org/10.2747/0272-3638.16.4.327
At the end in the suggested articles, it will be also important to recommend projects that are working with migration data, such as Torn Apart / Separados.
[ ] It is true that this tutorial is very specific to certain migrant groups in Europe. That is, the data obtained from London is, in fact, reliable, so the author could map a certain area of London with confidence. But in some cases or countries, migration and migrant data are not well documented. Examples include the records of undocumented migrants in the United States, or of people who go back and forth from one country to another. I recommend mentioning this in an endnote or in a few sentences, since it is important for others to be aware of this issue. A reference to Torn Apart / Separados will also be useful at this point.
[x] The author is approaching the mathematical solutions putting into context two examples, people and coffee. But, from a more ethical perspective, I recommend that in this paragraph (text box after p.9 " large number of things of the same type (people, coffee beans, widgets)") it is important to differentiate a study with people and a study of goods, preventing the issue of converting people into objects. Clarifying this will be very helpful. [I have changed this to "entities". Please let me know if you have something else in mind -ac]
[ ] p. 114, the reader is advised to use a text editor. I’d add a note that they can also directly create a new script if they’re using R Studio. It’s the same thing, of course, but people may prefer the sense of keeping everything contained within one application.

In the example code:

[ ] my setup required me to use install.packages("MASS") rather than, as shown, install.packages(MASS) (i.e. quoting the package name)
[x] I don’t think the command numbering #1, #2 etc is necessary
[ ] Depending on where they’ve placed files etc, some people will need to change the working directory in order for the read.csv() command to work. Again, this should be covered in basic R tutorials, but could be included here as well (eg in RStudio – Session > Set working directory > Choose directory)
[ ] p. 30 It is not until this point that the issues with the data found in the primary sources are mentioned. It is a good description, but some clarification is still needed at the beginning for an audience that will follow these methods using other migration cases.
[ ] For those unfamiliar with models of this type, I think that a potential source of confusion is the shift from the generalised model at para 63 to the specific one at para 65. It is explained earlier that µ is a population, but no hint is given at this stage about the specific purposes of the subscripts: that µij is the population interaction between origin i and destination j, i.e. the count of migrants. This is covered later on (j is described as referring to London, although it can be understood more generally than that as any modelled destination [edit - I think that the model here however is a regression for one destination only, and not a more complex doubly-constrained model that would work for a set of origins and destinations] ). Although it risks obvious repetition, I suspect that the relevance of the subscripts should be highlighted.
[ ] The section on setting beta values also covers some quite technical areas, and is largely presented as a black box. I think a couple of aspects need comment. The need for complete data is stressed in Para 80, whilst in Para 119 it is mentioned that we need to tell the glm library what to do in the event of there being no data. Is na.action a mandatory argument? If the sample data set does not have any missing data, then it might be worth noting to readers that we do not need to worry about no data. More generally, I think a little more explanation might help that in this section the beta parameters are being determined across the whole data set from known data, and we can then compare individual origin-specific observations with this general model. We can then examine these and identify over/under predicted flows, as has been done. These might elicit an interpretation back on knowledge of the phenomena (although it’s also possible that mis-predictions can identify cases where the input data is flawed).
[ ] I agree that it is a good idea to suggest that people use a calculator to work out a result by hand. It’s possible however to keep the next stage in R, by doing something like:

gravityModelData$pred = exp(-3.848 + (1.235 * log(gravityModelData$population)) + (-0.542 * log(gravityModelData$distance)) + (-0.024 * gravityModelData$wheat) + (-0.025 * gravityModelData$wages) + (-0.014 * gravityModelData$wageTrajectory) )

or, to keep the full precision of the parameters (and avoid having to re-type them):

gravityModelData$pred2 = exp(gravityModel$coefficients[1] + (gravityModel$coefficients[2] * log(gravityModelData$population)) + (gravityModel$coefficients[3] * log(gravityModelData$distance)) + (gravityModel$coefficients[4] * gravityModelData$wheat) + (gravityModel$coefficients[5] * gravityModelData$wages) + (gravityModel$coefficients[6] * gravityModelData$wageTrajectory) )

In this example, pred and pred2 have slightly different results, due to the rounding of the parameters in the original. Note that R’s log() will give the same answer that ln() will give on a calculator.

Some specific proofing notes:

[x] modelling - modeling : adopt a single spelling throughout the lesson for consistency
[x] p. 16 It is necessary to include the tutorial to install R.
[x] p.16 MASS : omit the hyperlink on the first instance of the word.
[ ] p.24 Vagrancy Act/ system: a bit of historical context will be welcomed here
[x] p.28 the dataset page identifies that there is also a newer version; does the link need updating?
[x] p.33 caption fig.4 sentence case instead of Title Case
[x] p.34 amend the list formatting
[x] p. 39 The equation needs correcting, as the parentheses do not balance; this is also true of the equation at para 46, which, I think, should be: y=β0+β1(x1)+β2(x2)+...+βp(xp) (or perhaps y=β0+(β1(x1)+β2(x2)+...+βp(xp)) ) i.e. some additional opening parentheses have crept in
[x] The para 39 equation definitely needs a closing bracket after the last term (so that exp() applies to the whole of the right hand side); the separate terms could be placed in their own parentheses – as the last three terms are – if this makes it more readable, but the first two terms have omitted their closing brackets. NB – the equation is restated at para 92 without the per-term parentheses; I’d be happy to use that as the canonical layout, and to suggest that the other uses of the model adopt the same layout.
[x] p. 40 typo, ‘Congen’ should be ‘Congdon’
[x] p.41 I'd suggest omitting this list of active links here (and the ref) and instead placing the links within each of the following sections.
[ ] p. 44 perhaps worth stating, simply because of familiarity, that linear regression can also be carried out in proprietary statistics software such as SPSS, and in tools such as Excel?
[x] p. 64 It will be good to include the link of the Economic History Review article.
[x] Table 3: there are some well-known formatting issues here. Perhaps the following reviewer's suggestion might solve the problem and is worth trying: the Dij distance measures are quoted to far too many decimal places and have spurious accuracy. 8DP (as shown) is suggesting sub millimetre level accuracy! No more than 1DP needs to be shown in the table. Similarly, the wage trajectory doesn’t need to be shown to such a large number of places.
[x] p.68-70 amend the list formatting
[ ] p. 72 I suggest providing an alternative solution in case one of these variables does not work.
[ ] p.77 After paragraph 76 it becomes confusing because the example turns to the coffee study. I suggest providing examples from the migration case, or announcing that you will now provide examples from another case.
[x] p.86 Wa = Wα // WaΤ = WαΤ : adopt a single notation for the variable throughout the lesson for consistency
[ ] p. 88 Not sure whether this link to the csv file is supposed to take you somewhere, but if it is, it doesn’t work.
[x] p. 89 – repeat of the word ‘final’
[ ] p.92 λ element should be defined in the main text
[x] p. 98 The 1 is not visible because of the paragraph symbol. [this is only visible on the submissions site -ac]
[ ] p. 103 It will be necessary to provide information on which calculator model will be good to consider.
[ ] p. 105 in the definition of ‘standard deviation’ I’d avoid the word ‘variance’ which has its own specific meaning, and use ‘variation’ or ‘dispersal’.
[ ] p. 112 The instructions for installing R are not provided until this point. That is good, but it would also be good to mention this from the beginning.
[ ] p. 113 Not sure if the link to the .csv file takes you somewhere; if it does, it is not working.
[ ] p.124 giving descriptive terms for the components makes the equation overflow the page
[ ] p. 138 It will be important to make the conclusion broader and not focus only on London, so that this tutorial can be considered for case studies of other countries (see the 2nd comment of the category 1)
[ ] there is an issue with the display of the wide formulas on p.125, 130, 131 . I need to ask our tech team if this could be solved with small amendments on the general CSS table margins/borders code .

amsichani commented 5 years ago

hey @acrymble , I was wondering whether you have a date for submitting the new version of the tutorial after taking into account the reviewers' comments.

acrymble commented 5 years ago

@amsichani Sorry, I didn't know you were waiting for me. I was waiting on your guidance about the length, since that decision will dramatically impact the changes, and I'd rather do everything all in one go.

amsichani commented 5 years ago

@acrymble given the complexity and the difficulty of the lesson, I'd suggest we accept this tutorial as it stands. Please go ahead with the above-mentioned suggestions, and if you are able to cut a paragraph or a dozen words at some point to make it more compact, don't hesitate ;-) Re the layout issue, we'll have a closer look when the lesson is in PR status.

acrymble commented 5 years ago

Ok. I likely won't be able to do this until the end of teaching term. But I will put it on my priority list immediately thereafter.

acrymble commented 5 years ago

I've managed to make a number of the changes above. Just to keep my own list of tasks clear, these ones are still to tackle:

References suggestions

[x] Ravenstein, E. G. (1885). The Laws of Migration. Journal of the Statistical Society of London, 48(2), 167-235.
[x] Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic geography, 46(sup1), 234-240.
[ ] Tobler, W. (1995). Migration: Ravenstein, Thornthwaite, and beyond. Urban Geography, 16(4), 327-343. https://doi.org/10.2747/0272-3638.16.4.327 [I don't have access to this article -ac]
[ ] At the end in the suggested articles, it will be also important to recommend projects that are working with migration data, such as Torn Apart / Separados. [while this is an interesting project, I don't see the relevance to this tutorial as it does not seem to use this or a similar method, and is instead a project of the reviewer. Unless you can help me understand why this should be added, I'm going to leave it off -ac]
[x] It is true that this tutorial is very specific to certain migration groups in Europe. That is, the data obtained from London is, in fact, reliable, so a certain area of London could be mapped with confidence. But it is important to consider that in some cases or countries migration and migrant data is not well documented. An example of this is the records of undocumented migrants in the United States or of people who go back and forth from one country to another. I recommend mentioning this in an endnote or in a few sentences, since it is important for others to be aware of this issue. A reference to Torn Apart / Separados will also be useful at this point.
[x] p. 14, the reader is advised to use a text editor. I’d add a note that they can also directly create a new script if they’re using R Studio. It’s the same thing, of course, but people may prefer the sense of keeping everything contained within one application.

In the example code:

[x] my setup required me to use: install.packages("MASS") rather than – as shown – install.packages(MASS) (i.e. quoting the package name)
[x] Depending on where they’ve placed files etc, some people will need to change the working directory in order for the read.csv() command to work. Again, this should be covered in basic R tutorials, but could be included here as well (eg in RStudio – Session > Set working directory > Choose directory)
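
A minimal sketch of the working-directory point, assuming nothing about where the lesson's files actually live: it writes a tiny invented CSV to a temporary folder (the file name toyData.csv and its values are hypothetical) and shows that read.csv() resolves a bare file name once setwd() points at that folder.

```r
# install.packages("MASS")  # run once if needed; note the quoted package name

# Write a tiny stand-in CSV to a temporary folder.
# The values are invented for illustration and are not the lesson's data.
dataDir <- tempdir()
write.csv(
  data.frame(origin = c("A", "B"), vagrants = c(3, 7)),
  file.path(dataDir, "toyData.csv"),
  row.names = FALSE
)

oldDir <- setwd(dataDir)            # setwd() returns the previous directory
getwd()                             # confirm where R is now looking
toyData <- read.csv("toyData.csv")  # a bare file name now resolves
setwd(oldDir)                       # restore the original working directory
```

In RStudio the same change can be made through Session > Set working directory > Choose directory, as noted above.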

--

[x] p. 30 It is not until this point that the lesson mentions the issues with the data found in the primary sources. It is a good description but still needs some clarification at the beginning for an audience that will follow these methods using other migration cases.
[x] For those unfamiliar with models of this type, I think that a potential source of confusion is the shift from the generalised model at para 63 to the specific one at para 65. It is explained earlier that µ is a population, but no hint is given at this stage about the specific purposes of the subscripts: that µij is the population interaction between origin i and destination j, i.e. the count of migrants. This is covered later on (j is described as referring to London, although it can be understood more generally than that as any modelled destination [edit - I think that the model here however is a regression for one destination only, and not a more complex doubly-constrained model that would work for a set of origins and destinations] ). Although it risks obvious repetition, I suspect that the relevance of the subscripts should be highlighted.

-- The section on setting beta values also covers some quite technical areas, and is largely presented as a black box. I think a couple of aspects need comment.

[x] The need for complete data is stressed in Para 80, whilst in Para 119 it is mentioned that we need to tell the glm library what to do in the event of there being no data. Is na.action a mandatory argument? If the sample data set does not have any missing data, then it might be worth noting to readers that we do not need to worry about no data. [this is not a strictly required argument, as far as the example problem is concerned. I've removed this to reduce confusion and to forego needing to add additional information to an already long tutorial -ac]
[x] More generally, I think a little more explanation might help that in this section the beta parameters are being determined across the whole data set from known data, and we can then compare individual origin-specific observations with this general model. We can then examine these and identify over/under predicted flows, as has been done. These might elicit an interpretation back on knowledge of the phenomena (although it’s also possible that mis-predictions can identify cases where the input data is flawed).
[x] I agree that it is a good idea to suggest that people use a calculator to work out a result by hand. It’s possible however to keep the next stage in R, by doing something like:

gravityModelData$pred = exp(-3.848 + (1.235 * log(gravityModelData$population)) + (-0.542 * log(gravityModelData$distance)) + (-0.024 * gravityModelData$wheat) + (-0.025 * gravityModelData$wages) + (-0.014 * gravityModelData$wageTrajectory) )

or, to keep the full precision of the parameters (and avoid having to re-type them):

gravityModelData$pred2 = exp(gravityModel$coefficients[1] + (gravityModel$coefficients[2] * log(gravityModelData$population)) + (gravityModel$coefficients[3] * log(gravityModelData$distance)) + (gravityModel$coefficients[4] * gravityModelData$wheat) + (gravityModel$coefficients[5] * gravityModelData$wages) + (gravityModel$coefficients[6] * gravityModelData$wageTrajectory) )

In this example, pred and pred2 have slightly different results, due to the rounding of the parameters in the original. Note that R’s log() will give the same answer that ln() will give on a calculator.
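
A further option, sketched here under assumptions rather than taken from the lesson: R's predict() with type = "response" should give the same fitted values as the coefficient-based expression without writing out each term. The block uses a small invented data set and a Poisson glm() as a stand-in for the lesson's model, so toy, toyModel, predManual and predAuto are all hypothetical names.

```r
# Toy data, invented purely for illustration (not the lesson's dataset).
set.seed(1)
toy <- data.frame(
  population = c(50, 120, 80, 200, 150, 60, 90, 170),
  distance   = c(30, 100, 250, 80, 150, 300, 60, 200)
)
toy$vagrants <- rpois(
  nrow(toy),
  lambda = exp(0.5 + 0.8 * log(toy$population) - 0.4 * log(toy$distance))
)

# A Poisson GLM with a log link stands in for the lesson's fitted model.
toyModel <- glm(vagrants ~ log(population) + log(distance),
                family = poisson, data = toy)

# Manual prediction from the stored coefficients, term by term.
b <- coef(toyModel)
toy$predManual <- exp(b[1] + b[2] * log(toy$population) + b[3] * log(toy$distance))

# The same prediction via predict(); type = "response" undoes the log link.
toy$predAuto <- predict(toyModel, type = "response")

# Compare the two sets of predictions.
all.equal(unname(toy$predManual), unname(toy$predAuto))
```

Writing the terms out by hand keeps the arithmetic visible for readers, while predict() avoids transcription slips once the model is understood.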

--

Some specific proofing notes:

[x] p.24 Vagrancy Act/ system: a bit of historical context will be welcomed here [I've added a few sentences here -ac]
[x] p. 44 perhaps worth stating, simply because of familiarity, that linear regression can also be carried out in proprietary statistics software such as SPSS, and in tools such as Excel?
[ ] p. 72 I suggest providing other solution in case one of these variables does not work. Provide an alternative solution will be good. [I'm afraid I don't understand this query -ac]
[x] p.77 After paragraph 76 the lesson becomes confusing because the example switches to the coffee study. I suggest providing examples from the migration case or announcing that you will now provide examples from another case.
[x] p. 88 Not sure if this link to the csv file is supposed to take you somewhere, but if it is, it doesn’t work.
[x] p.92 λ element should be defined in the main text [changed to mu for consistency. was a lambda in the original article. good catch. -ac]
[x] p. 103 It will be necessary to provide information on which calculator model will be good to consider.
[x] p. 105 in the definition of ‘standard deviation’ I’d avoid the word ‘variance’ which has its own specific meaning, and use ‘variation’ or ‘dispersal’.
[x] p. 112 The instructions for installing R are not provided until this point. That is good, but it would also be good to mention this from the beginning.
[x] p. 113 Not sure if the link to the .csv file takes you somewhere; if it does, it is not working. [this will work once the files are moved; this is an artefact of the submission system -ac]
[x] p.124 giving descriptive terms for the components makes the equation overflow the page
[x] p. 138 It will be important to make the conclusion broader and not focus only on London, so that this tutorial can be considered for case studies of other countries (see the 2nd comment of the category 1)
[ ] there is an issue with the display of the wide formulas on p.125, 130, 131 . I need to ask our tech team if this could be solved with small amendments on the general CSS table margins/borders code . [I've reduced the width on Table 3 as best I can but it still is JUST too wide. This will need a CSS fix I think -ac]
[x] update the downloadable CSV file to take into account decimal places comment
[x] update the downloadable .r file to take into account changes to the r code.
[x] Update figure showing model beta values in case changes in decimal places creates new numbers. Revise accordingly throughout.

acrymble commented 5 years ago

@amsichani I believe I have now completed the revisions. I was able to make all of the changes for which there is a tick mark above.

There were a few where I wasn't able to, or don't feel able to make the changes. They are (with my reasons):

Tobler, W. (1995). Migration: Ravenstein, Thornthwaite, and beyond. Urban Geography, 16(4), 327-343. https://doi.org/10.2747/0272-3638.16.4.327 [I don't have access to this article so I can't check if it's a relevant reference -ac]
At the end in the suggested articles, it will be also important to recommend projects that are working with migration data, such as Torn Apart / Separados. [while this is an interesting project, I don't see the relevance to this tutorial as it does not seem to use this or a similar method, and is instead a project of the reviewer. Unless you can help me understand why this should be added, I'm going to leave it off -ac]
p. 72 I suggest providing other solution in case one of these variables does not work. Provide an alternative solution will be good. [I'm afraid I don't understand this query -ac]
there is an issue with the display of the wide formulas on p.125, 130, 131 . I need to ask our tech team if this could be solved with small amendments on the general CSS table margins/borders code . [I've fixed most of it but even though I've reduced the width on Table 3 as best I can, it still is JUST too wide. This will need a CSS fix I think -ac]

amsichani commented 5 years ago

Thanks for the update @acrymble. All your corrections & points sound valid to me (I'll send you the article to have a look). I am going to go through the editorial checklist for acceptance and publication of the tutorial, while trying to fix the persistent small layout issue. I'll let you know as soon as we are ready to go live ;-)

amsichani commented 5 years ago

Getting there @acrymble! I've checked the lesson one more time, with the editorial checklist in hand. @mariajoafana thanks for acting as managing editor for this -- the files you need to move should all be in this repo:

ph-submissions/lessons/gravity-model.md
images/gravity-model/ - (all the stuff in here)
assets/gravity-model/ - (all the stuff in here)
gallery/gravity-model.png
gallery/originals/gravity-model-original.png

As for the author's bio...I think we are OK.

There is still a persistent minor layout issue here - I'd suggest we wait a bit before publishing so @mdlincoln can have a look at it.

Let me know how it goes and do let me know when you're ready to merge the pull request so that I can tweet and add it to the twitter bot.

mariajoafana commented 5 years ago

Thanks @amsichani I'll wait for @mdlincoln to take a look before I publish the lesson

amsichani commented 5 years ago

Hey @mariajoafana, @mdlincoln suggested he'll have a look at this once you move this lesson into a PR on the main repo.

mariajoafana commented 5 years ago

Ok

mariajoafana commented 5 years ago

The tutorial is published now! https://programminghistorian.org/en/lessons/gravity-model

mariajoafana commented 5 years ago

I'll promote it on Twitter tomorrow

amsichani commented 5 years ago

Many thanks to all for contributing to this lesson and ticket 🎊 🎉! I'm now going to close it.

programminghistorian / ph-submissions

Review Ticket: Gravity Models of Migration #204
