Given that it's mid-August and not much is expected to happen project-wise in the next couple of weeks, I just want to let @apjanco and @grunewas know that we are excited to have @gadolou as a reviewer for this lesson. Looking forward to her review! I will post an update soon with the name of the second reviewer, and I will also add my initial comments here. Cheers!
Thank you @amsichani for the opportunity to contribute a review to the preparation of this tutorial. And thank you @apjanco and @grunewas for this tutorial! It is quite interesting, and people dealing with historical texts and place names will find it helpful. The tutorial is built in two clearly defined parts, NER and building datasets for the World Historical Gazetteer, which can also stand alone. For the first part, readers are expected to be more familiar with programming languages, but the references provided to specific tutorials are helpful. Code excerpts and images are also available and address almost every step of the procedure described, which is good. Test data are provided (however, the link in paragraph §7 doesn’t load). As a general comment, and to address users not familiar with programming, I would add more detailed explanation to some of the paragraphs that present the code. Also, I would explain more about how WHG reconciliation works and what prerequisites are needed to use it successfully. In more detail, I have the following suggestions:
§1: “First, we produce…” Readers will expect to learn what the second step is. Also, a more clearly defined goal of the lesson is needed here.
§3: “These spatial steps…” I would rephrase this sentence, replacing “historical maps” with “maps depicting historical information or entities” or similar. The term “historical maps” refers to old maps. Also, in this paragraph, I would remove “spatial analysis”, since it is not implemented in any part of the procedure described in the tutorial.
§7: the link in this paragraph doesn’t load.
§11: perhaps a note for readers is needed here that WHG is more successful in matching old and modern names if the official datasets store this kind of information (I mean that some old place names are not supported, either because they are not included in WHG's basic datasets as matches to modern ones or because their locations are unknown). Also, could you add a note on whether WHG can perform location (coordinate) matching?
§12: either “currently” or “point of time” can be removed.
§20: Before loading the language in §21, an “import spacy” is needed, it would be helpful to add it after “pip install spacy”.
§25: a more detailed explanation of how the code is applied would help.
§37: “it will thie…” rephrase
§45: a reminder to readers of how and where “pip install…” should be run.
§46: “we now entites…”: we now have entities? Also, “DBpedia”.
§50: “By using the WHG…”: not always (see comment in §11).
§69: “Go ahead…” : Go ahead and finish?
§75: Figure 12 is not loaded (at least for me 😊)
§76: “…but it was”: “…but it will also…”? And “in th bottom”: “at the bottom”.
Thank you for proposing the tutorial!
Fantastic, @gadolou, thanks for the useful review. I'm happy to announce that the second reviewer for this lesson will be @rkhatib; looking forward to her review too! @apjanco and @grunewas, I'd suggest waiting for all the reviewers' comments before you start updating/revising your lesson ;-) Cheers!
A huge thanks @amsichani for inviting me to review this wonderful tutorial; it’s been lovely to see the PH review process in action! Thank you @apjanco and @grunewas for putting together this particularly useful tutorial that will surely help many researchers carry out spatial analysis by extracting, matching, enriching, and geovisualizing place names in texts.
The way the lesson is divided into two parts is useful and offers multiple points of entry depending on the data users have. Another strong positive for me is the clear articulation of a very interesting research topic that is tailored for spatial analysis and exploration. Overall, the way the lesson is structured is clear to follow, with many helpful explanations of the functions, and accompanying images and links throughout.
At the outset, a clearer explanation of the context and goals of this lesson, as well as a more detailed account of what is covered in the tutorial and how it can be useful in research would be beneficial (e.g. DBpedia and reconciliation). In the spirit of multilingual DH, it could be interesting to point to places where other sample spatial datasets in different languages may be found, such as Zenodo.
Here are some more detailed suggestions:
§1 and §3 could be combined to help define the goals of the lesson, with a broad explanation of what the steps for spatial analysis are and how they correspond to the two parts of the lesson.
§1: First, we produce… It would be useful at the outset here to provide more information about what TSV and HTML files are in this context and what types of interpretations they enable, especially for a beginner audience who is trying to understand what methods would work best for their research goals.
§2 Finally… Can you please elaborate on what the “Linked Places Format” and “reconciliation” and “geocoding” are, in layman’s terms?
§5 and §6 Very interesting research topic!
§7 Sample text corpus unavailable
§10 “…represent the ‘contemporary’ Soviet names”: Consider removing “contemporary”, since contemporary place names are the current Russified versions.
§11 In the opening sentence, “The gazetteer from this lesson is also an example…”: consider rephrasing the last sentence or separating it into two sentences for clarity.
§14-§24 I appreciate the clear explanation of text processing and NLP in Python.
§18 “…spaCy works extremely well for tasks”: Consider providing an acronym for part-of-speech tagging (POS) if giving one for NER, NLP, etc., for consistency.
§26 It might be helpful to provide some examples of how the “matcher” function can be used in research.
§36 “You will encounter machine errors, so it’s important to review the results and to correct errors.” How does one go about this?
§37 Not sure what is meant by “assess predictions”
§46 The DBpedia section is so interesting! “Note that I have replaced…”: please watch acronym capitalization in this paragraph.
§48 “This is a more involved process than we can detail here.” Fascinating to learn about the level of disambiguation that spaCy can support in predicting entities.
Uploading to the World Historical Gazetteer: I really enjoyed this section; it is very well written and useful for research.
§50 Might be helpful to define “geocoding” here if you don’t do so in the introduction (see §2).
§62 “…your place upload”: please reword for clarity.
§67 “…of the ‘Reconciliation Review’ screen.”
§70 “…taken to complete the review of…”
§76 “In the bottom…”
Thank you for putting together this useful tutorial!
Many thanks @rkhatib for the really detailed review! @apjanco and @grunewas, given that we now have two very detailed reviews in place, I'd suggest going ahead and addressing the issues raised by @rkhatib and @gadolou instead of waiting for me to synthesise a commentary. How does 30 November sound as a deadline for an updated version of the lesson? In the meantime, I will make sure there are no pending tech gremlins on our end, also based on the reviewers' points, e.g. with embedded images. My personal goal is to publish this lesson well before the Christmas holidays! Thanks!
Hi @apjanco and @grunewas, thanks for addressing the reviewers' comments. I think the lesson has improved and is now a really great resource; many thanks for this! I am now going to take another, closer final look at it and try to fix some persistent errors (e.g. Figure 12 still does not load on my end). Once everything is ready, I will start the publication process. Nearly there! Many thanks again for the collaboration!
Hi @anisa-hawes! We have a lesson ready for copyediting whenever you're available. (I know we're on PH break at the moment!)
Thank you, @svmelton! I will add this to my to-do list for the new year!
Hello @amsichani, Hello @apjanco and @grunewas,
My name is Anisa and I’m Programming Historian’s Publishing Assistant.
I’ve copyedited your lesson, and my comments/suggested revisions are now ready for your review.
I have applied my suggested revisions directly to the markdown file in our Submissions Repository. You can view the individual additions and subtractions I’ve made in the Commit History.
I’m going to paste a list of my comments and suggested revisions below, to ensure that the copyediting process is transparent and can invite discussion.
I've formatted my comments/suggested revisions as a list of tasks. You will notice that many of these tasks are ‘checked’, because I’ve already made the changes. A small number remain ‘unchecked’ because they ask questions.
I hope you'll find these comments and suggestions useful. Please let me know if you'd like to talk anything through. You're not obligated to take my suggestions on board. Read through and we can have a conversation!
--
[x] Suggested revisions: To begin, we will produce a tab-separated value (TSV) file with a row for each occurrence of the term and an HTML file of the text with the terms highlighted. [add 'will'. add link to define 'tab-separated value']
[ ] Q: each occurrence of each term?
[ ] Q: all terms highlighted?
[x] Suggested revision: A visualization can be used to interpret the results and to assess their usefulness for a given project. ['A' instead of 'this', because the visualisation form has not been established yet]
[x] The goal of the lesson is to systematically search a text corpus for place names and then to use a service to locate and map historic place names. [delete extra space between 'to' and 'systematically']
[x] Suggested revision: This lesson will be useful for anyone wishing to perform named entity recognition (NER) on a text corpus. [add link]
[x] Suggested revision: Other users may wish to skip the text extraction portion of this lesson and focus solely on the spatial elements of the lesson, that is gazetteer building and using the World Historical Gazetteer (WHG). [add link and abbreviation 'WHG' at first mention]
[x] Suggested revisions: We urge you to try both parts of the lesson together if you have time, as this will enable you to learn how text analysis and mapping can be combined in one project. Additionally, it will demonstrate how the results of these two activities can be ported into another form of digital analysis. [as this will enable you to learn / how text analysis and mapping can be combined / Additionally, it will demonstrate how the results of these two activities]
[x] Suggested revision: In this lesson, readers will first use the Python pathlib library to load a directory of text files. [add 'first'. add link to define 'pathlib']
[x] Suggested revision: For those with an existing gazetteer or list of terms, readers will then be able to create a list of term matches and their locations in a text. [add link to define 'gazetteer'. will then be able to create]
[x] Suggested revision: Those without a gazetteer, can use a statistical language model to identify places. [avoid repetition in 'users can use']
[x] Suggested revision: Finally, users will create a TSV file in the Linked Places Format, which can then be uploaded to the World Historical Gazetteer. [add link to define 'Linked Places Format'. remove link to World Historical Gazetteer here, use upon first mention above. Note that this is a https link, not http. Remove hyphen between world and historical]
[ ] Comment: 'Finally' reads strangely because it doesn’t follow a sequence. I have added 'first' and 'then be able to' to make this flow.
[x] Suggested revision: Linked Places Format is designed to standardize descriptions of places and is primarily intended to link gazetteer datasets.
[x] Add link: It is a form of Linked Open Data (LOD), which attempts to make datasets interoperable.
[x] Suggested revision: geocode, or ‘match’,
[x] Suggested revision: plots on a map.
[x] Suggested revision: multilingual digital humanities.
[x] Suggested revision: , while the list of place names represent [...] 1941-56.
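The first task in the list above, together with the pathlib and gazetteer suggestions, describes a small pipeline: read a directory of text files, find every occurrence of every gazetteer term, write one TSV row per occurrence, and produce an HTML copy of the text with the matches highlighted. The sketch below illustrates that pipeline with the Python standard library only; it is not the lesson's code (the lesson itself works with spaCy). The function names (`load_texts`, `find_matches`, `write_tsv`, `highlight_html`), the TSV columns (file, term, start, end) and the sample sentence are hypothetical. In this sketch each occurrence of each term gets its own row and every match is highlighted, which is one possible answer to the two open questions.

```python
import csv
import html
import re
from pathlib import Path


def load_texts(directory):
    """Read every .txt file in `directory` into a {filename: text} dict."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(directory).glob("*.txt"))}


def find_matches(text, gazetteer):
    """Return (term, start, end) for each occurrence of each gazetteer term,
    sorted by position; at the same start, the longer match comes first."""
    matches = []
    for term in gazetteer:
        # \b word boundaries stop "Omsk" from matching inside e.g. "Omskaya"
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((term, m.start(), m.end()))
    return sorted(matches, key=lambda match: (match[1], -match[2]))


def write_tsv(rows, path):
    """Write one row per occurrence: file, term, start, end."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["file", "term", "start", "end"])
        writer.writerows(rows)


def highlight_html(text, matches):
    """Escape the text and wrap each match in <mark>; a match that overlaps
    an earlier one (e.g. "Novgorod" inside "Nizhny Novgorod") is skipped.
    Expects matches sorted by start, as find_matches returns them."""
    parts, pos = [], 0
    for _term, start, end in matches:
        if start < pos:
            continue
        parts.append(html.escape(text[pos:start]))
        parts.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        pos = end
    parts.append(html.escape(text[pos:]))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    import tempfile

    gazetteer = ["Moscow", "Omsk", "Tomsk"]  # hypothetical gazetteer
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "memoir.txt").write_text(
            "From Moscow we went to Tomsk, then Omsk.", encoding="utf-8")
        rows = []
        for name, text in load_texts(tmp).items():
            matches = find_matches(text, gazetteer)
            rows.extend((name, *match) for match in matches)
            Path(tmp, name + ".html").write_text(
                highlight_html(text, matches), encoding="utf-8")
        write_tsv(rows, Path(tmp, "matches.tsv"))
        print(rows)
```

Exact string matching with word boundaries keeps the sketch short, but it misses spelling and transliteration variants of a place name; a statistical language model, as the lesson suggests for readers without a gazetteer, is one way around that.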
Natural language processing
Load the gazetteer
Matching Place Names
Loading Text Files
Term Frequency
Named Entity Recognition
DisplaCy
Named Entity Linking
Export Our Data
Thanks @anisa-hawes—I'd advise removing this sentence: "Rather, Grunewald argues that the term “Siberia” served as a decorative term that framed POWs as victims who had endured harsh conditions and cruelty in an exoticized Soviet East." It's an interesting argument but we don't really have time to explore it in this piece. I think the rest of the section stands without it.
Thank you, @svmelton. This is useful. Please let us know how you feel about that, @apjanco and @grunewas.
I have also noted two typing errors I introduced in section 2, “Historical example”. Sincere apologies, now corrected.
Just to reiterate that I'm very happy to talk through any of the suggested changes if you'd like to.
Very best wishes, Anisa
Thanks @anisa-hawes for your great copyediting and @svmelton for your input! @grunewas and @apjanco, I checked the PR and it looks fine. Here are just a couple of points I think you need to amend in this PR:
[ ] para 5 : I see the suggestion you made is trying to address the implications both @anisa-hawes and @svmelton are commenting on. I am in favour of accepting it.
[ ] para 10, trying to map the places mentioned in the memoirs is therefore is not a simple task as the names have changed both in the Soviet and post-Soviet era. --> delete the extra 'is'
[ ] Figure 12: hopefully it will load now after this small edit. Let's see, everything crossed!
[ ] Great catch on the figure numbering. As I noticed, you didn't change the file naming in /images. As far as I can see, there are no other implications, so it's fine. As a general rule, we would like the assets' file naming to mirror the figure/asset numbering in the lesson itself, for sustainability purposes.
[ ] Section numbering should be fine now.
Happy to clarify if there is anything unclear. Many thanks all for the fantastic work!
Thank you for your reply, @amsichani. The final step for me will be to replace all the live web links with Perma.cc archived links.
Within my copy edits I made some suggestions for additional links, so I await confirmation that you are all happy with these.
Everything seems good to @grunewas and me. I submitted a PR to fix the extra "is".
Great, @apjanco! I am approving the PRs now, but there is a conflict on para 5 between the two files, which prevents the PR from being deployed. Can you fix this, or do you want me to do it? Cheers!
It's saying "no conflicts" to me. If you can fix it, that would be great. Thank you!
I've closed PR #451 to avoid conflicts with the base branch. Meanwhile, I have removed additional word "is" from para.52 in your first PR #448.
I have also adjusted the syntax of the link you've inserted at para.41 so that it now reads:
Rather, Grunewald argues that the term "Siberia" served as a term to emphasize suffering in Soviet captivity.
I'll wait for you to merge the PR, @amsichani.
Hello @apjanco and @grunewas, If there are any further changes needed, you can make direct edits to lesson files. There's no need to use the Git Pull Request system in the Submissions Repository.
Let me know if I can help with anything else. When you’re happy with the final version of the lesson, I’ll go ahead and replace all external links with perma.cc archival links.
Very best wishes, Anisa
Looks good to us! Best, Andy
Hello @amsichani, @apjanco and @grunewas,
In my latest commit I have done the following tasks:
*I have taken the decision not to replace two links in the lesson with their perma.cc links
Many thanks for these final edits, @anisa-hawes!
I'm re-posting the list of all the relevant files for @svmelton's reference:
images/finding-places-world-historical-gazetteer - image files
assets/finding-places-world-historical-gazetteer - asset files
lessons/finding-places-world-historical-gazetteer.md - the lesson file
gallery/finding-places-world-historical-gazetteer.png - the modified avatar
gallery/originals/finding-places-world-historical-gazetteer-original.png - the original avatar
Authors bio:
- name: Susan Grunewald
  team: false
  orcid: 0000-0003-1275-4101
  bio:
    en: |
      Susan Grunewald is the Digital History Postdoctoral Associate at the University of Pittsburgh World History
- name: Andrew Janco
  team: false
  orcid: 0000-0002-8872-9474
  bio:
    en: |
      Andrew Janco is the Digital Scholarship Librarian at Haverford College.
The lesson is now ready. Many thanks to all for the collaboration!
Thank you @amsichani! I'll get this up in the next couple of days.
Hello all,
Please note that this lesson's .md file has been moved to a new location within our Submissions Repository. It is now found here: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/en/drafts/originals
A consequence is that this lesson's preview link has changed. It is now: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer
Please let me know if you encounter any difficulties or have any questions.
Very best, Anisa
Hello @apjanco and @grunewas,
I've just been doing a final check-through of your lesson.
There is one point I'm confused about, which I am afraid I somehow missed...
The information box at para. 399 includes the sentence "If you are building your own dataset, it is worth taking the time to add a country codes (ccodes) column into the file you upload as well as aat type with the corresponding type (e.g. settlement, state, country)."
The meaning of the second half of this sentence isn't clear to me, and I am not sure how it should read. Are you able to advise?
Did you intend "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)"?
I can implement the change on your behalf – the .md file has already been moved over to our other repo.
Thank you!
All best, Anisa
@anisa-hawes Good catch. I did indeed intend to write "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)." Could you please change that? Thanks!
Thank you for the clarification @grunewas! I've made this change. Much appreciated.
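The clarified sentence asks readers building their own dataset to add a country codes (ccodes) column and a type column to the file they upload. The sketch below shows one way to write such a file with Python's csv module and to catch records missing basic fields before upload. The column list is an assumption, loosely modelled on the Linked Places TSV format, and should be checked against the WHG upload documentation; the helper names (`missing_fields`, `write_places_tsv`) and the sample record are hypothetical.

```python
import csv
from pathlib import Path

# ASSUMPTION: column names loosely follow the Linked Places TSV (LP-TSV) format;
# verify them against the WHG upload documentation before uploading.
REQUIRED = ["id", "title", "title_source", "start"]
OPTIONAL = ["ccodes", "types"]  # country codes and place type (e.g. settlement)


def missing_fields(place):
    """Return the required columns that are absent or empty in one record."""
    return [name for name in REQUIRED if not place.get(name)]


def write_places_tsv(places, path):
    """Write place records (dicts) as a tab-separated file, one place per row.

    Optional columns left out of a record are written as empty cells;
    a record missing a required column raises ValueError.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED + OPTIONAL,
                                delimiter="\t", restval="")
        writer.writeheader()
        for place in places:
            missing = missing_fields(place)
            if missing:
                raise ValueError(f"{place.get('id', '?')}: missing {missing}")
            writer.writerow(place)


if __name__ == "__main__":
    import tempfile

    # Hypothetical record, for illustration only.
    omsk = {"id": "p1", "title": "Omsk", "title_source": "POW memoir",
            "start": "1945", "ccodes": "RU", "types": "settlement"}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp, "places.tsv")
        write_places_tsv([omsk], out)
        print(out.read_text(encoding="utf-8"))
```

Failing early on a missing required field is a design choice for the sketch: it is cheaper to find an incomplete row locally than after an upload has been rejected.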
The lesson is now live! Thanks to @apjanco and @grunewas for your piece, @amsichani for serving as the editor, and @gadolou and @rkhatib for peer reviewing the piece.
@amsichani, this should be ready to promote on the Twitter bot now.
Thanks to everyone for your work!
@anisa-hawes I'm sorry if this isn't the correct way to address this issue with an older, already published lesson. In the process of discussing a different lesson in the review process, I discovered that Wikipedia links were added to this lesson. I spoke with @apjanco and he was also unaware that these links to Wikipedia had been added to this lesson at some point in the publication process. I would like to request to have the Wikipedia links replaced with other ones that are more in line with publication expectations for my current institution and discipline of history. Andy supports these changes and this reasoning.
Could the following terms have the Wikipedia links swapped with the suggested ones:
[x] TSV - please switch to https://www.loc.gov/preservation/digital/formats/fdd/fdd000533.shtml
[x] Named Entity Recognition - https://www.ibm.com/topics/named-entity-recognition or https://docs.deeppavlov.ai/en/master/features/models/NER.html
[x] Pathlib - https://realpython.com/python-pathlib/
[x] Gazetteer - https://support.esri.com/en-us/gis-dictionary/gazetteer
[x] Natural Language Processing - https://www.nnlm.gov/guides/data-glossary/natural-language-processing
[x] Tokenization - https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
Thank you, @grunewas. I will prepare these edits and let you both know when they are complete.
Hello @grunewas and @apjanco,
These links have been replaced in https://github.com/programminghistorian/jekyll/pull/3214, and the changes are now reflected on our live site.
All best wishes, Anisa
Thanks @anisa!
The Programming Historian has received the following tutorial on 'Finding places in text with the World Historical Gazetteer' by @apjanco and @grunewas. This lesson is now under review and can be read at:
http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer
Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.
I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I will read through the lesson and provide initial feedback to the authors.
Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.
I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.
Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.
Anti-Harassment Policy
This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.
[Permission to Publish]
The editor must also ensure that the author or translator posts the following statement to the Submission ticket.