programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Finding places in text with the World Historical Gazetteer #383

Closed amsichani closed 2 years ago

amsichani commented 3 years ago

The Programming Historian has received the following tutorial on 'Finding places in text with the World Historical Gazetteer' by @apjanco and @grunewas. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I will read through the lesson and provide initial feedback to the authors.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is (Ian Milligan - http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.


[Permission to Publish]

The editor must also ensure that the author or translator post the following statement to the Submission ticket.

I the author|translator hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.

amsichani commented 2 years ago

Given that it's mid-August and not much is expected to happen in the next couple of weeks project-wise, I just want to update @apjanco and @grunewas that we are excited to have @gadolou as a reviewer for this lesson. Looking forward to her review! I will update soon on the name of the second reviewer and I will also add my initial comments here. Cheers!

gadolou commented 2 years ago

Thank you @amsichani for providing the opportunity to contribute with my review to the preparation of this tutorial. And thank you @apjanco and @grunewas for this tutorial! It is quite interesting and people dealing with historical texts and place names will find it helpful.

The tutorial is constructed in two clearly defined parts, NER and the World Historical Gazetteer datasets building, which can also act as stand-alones. For the first part, readers are expected to be more familiar with programming languages, but the provided references to specific tutorials are helpful. Code excerpts and images are also available and address almost every step of the procedure described, which is good. Test data are provided (however, the link in paragraph §7 doesn’t load).

As a general comment, and to address users not familiar with programming, I would add more detailed explanation in some paragraphs that present the code. Also, I would explain more about how reconciliation in WHG works and what prerequisites are needed for it to work successfully. In more detail, I have the following suggestions:

§1: “First, we produce…” A reader would expect to learn what the second thing would be. Also, a more clearly defined goal of the lesson is needed here.

§3: “These spatial steps…” I would rephrase this sentence, replacing historical maps with maps depicting historical information or entities etc. The term historical maps means the old maps. Also, in this paragraph, I would remove spatial analysis since it is not implemented in any part of the procedure described in the tutorial.

§7: the link in paragraph doesn’t load

§11: perhaps a note for the readers is needed here that the WHG is more successful in matching old and modern names if the official datasets store this kind of information (I mean there are old place names that are not supported either because they are not included in the basic datasets of WHG as matches to modern ones or their locations are unknown). Also, a note whether WHG can perform location (coordinates) matching?

§12: either “currently” or “point of time” can be removed.

§20: Before loading the language in §21, an “import spacy” is needed, it would be helpful to add it after “pip install spacy”.

§25: more detailed explanation about how coding is applied.

§37: “it will thie…” rephrase

§45: a reminder to the readers how/where “pip install…” should be applied.

§46: “we now entites…”: we now have entities? Also, “DBpedia”.

§50: “By using the WHG…”: not always (see comment in §11).

§69: “Go ahead…” : Go ahead and finish?

§75: Figure 12 is not loaded (at least for me 😊)

§76: ”…but it was”: “…but it will also…” And “in th bottom” – in the bottom.

Thank you for proposing the tutorial!

amsichani commented 2 years ago

Fantastic @gadolou, thanks for the useful review. Happy to inform you that the second reviewer for this lesson will be @rkhatib - looking forward to her review too! @apjanco and @grunewas, I'd suggest waiting for all the reviewers' comments before you start updating / revising your lesson ;-) Cheers!

rkhatib commented 2 years ago

A huge thanks @amsichani for inviting me to review this wonderful tutorial; it’s been lovely to see the PH review process in action! Thank you @apjanco and @grunewas for putting together this particularly useful tutorial that will surely help many researchers carry out spatial analysis by extracting, matching, enriching, and geovisualizing place names in texts.

The way the lesson is divided into two parts is useful and offers multiple points of entry depending on the data users have. Another strong positive for me is the clear articulation of a very interesting research topic that is tailored for spatial analysis and exploration. Overall, the way the lesson is structured is clear to follow, with many helpful explanations of the functions, and accompanying images and links throughout.

At the outset, a clearer explanation of the context and goals of this lesson, as well as a more detailed account of what is covered in the tutorial and how it can be useful in research would be beneficial (e.g. DBpedia and reconciliation). In the spirit of multilingual DH, it could be interesting to point to places where other sample spatial datasets in different languages may be found, such as Zenodo.

Here are some more detailed suggestions:

§1 and §3 could be combined to help define the goals of the workshop, with a broad explanation of what the steps for spatial analysis are and how they correspond to the two parts of the lesson.

§1: “First, we produce…” It would be useful at the outset here to provide more information about what TSV and HTML files are in this context and what types of interpretations they enable, especially for a beginner audience who is trying to understand what methods would work best for their research goals.

§2: “Finally…” Can you please elaborate on what the “Linked Places Format”, “reconciliation”, and “geocoding” are, in layman’s terms?

§5 and §6 Very interesting research topic!

§7 Sample text corpus unavailable

§10 “…represent the ‘contemporary’ Soviet names”: Consider removing “contemporary”, since contemporary place names are the current Russified versions.

§11 In the opening sentence, “The gazetteer from this lesson is also an example…”: Consider rephrasing the last sentence or separating it into two sentences for clarity.

§14-§24 Appreciate the clear explanation of text processing and NLP in Python

§18 “…spaCy works extremely well for tasks”: Consider providing the acronym for part-of-speech tagging (POS) if giving one for NER, NLP, etc., for consistency.

§26 It might be helpful to provide some examples for how the “matcher” function can be used in research

§36 “You will encounter machine errors, so it’s important to review the results and to correct errors.” How does one go about this?

§37 Not sure what is meant by “assess predictions”

§46 The DBpedia section is so interesting! “Note that I have replaced…”: please watch acronym capitalization in this paragraph.

§48 “This is a more involved process than we can detail here.” Fascinating to learn about the level of disambiguation that spaCy can support in predicting entities.

“Uploading to the World Historical Gazetteer”: I really enjoyed this section; it is very well written and useful for research.

§50 Might be helpful to define “geocoding” here if you don’t do so in the introduction (see §2)

§62 “…your place upload”: Please reword for clarity.

§67 of the Reconciliation Review” screen.

§70 ..taken to complete the review of..

§76 In the bottom..

Thank you for putting together this useful tutorial!

amsichani commented 2 years ago

Many thanks @rkhatib for the really detailed review! @apjanco and @grunewas, given that we now have two very detailed reviews in place, I'd suggest going ahead and addressing the issues raised by @rkhatib and @gadolou instead of waiting for me to synthesise a commentary. How does 30 November sound as a deadline for an updated version of the lesson? In the meantime, I will make sure there are no pending tech gremlins on our end, based also on the reviewers' points, e.g. with embedded images. My personal goal is to try and publish this lesson well before the Christmas holidays! Thanks!

amsichani commented 2 years ago

Hi @apjanco and @grunewas, thanks for addressing the reviewers' comments. I think the lesson has been improved and is really such a great resource now. Many thanks for this! I will now have another final, closer look at this and try to fix some persistent errors (e.g. Figure 12 is still not loading at my end), and once everything is ready, I will start the publication process. Nearly there! Many thanks again for the collaboration!

svmelton commented 2 years ago

Hi @anisa-hawes! We have a lesson ready for copyediting whenever you're available. (I know we're on PH break at the moment!)

anisa-hawes commented 2 years ago

Thank you, @svmelton! I will add this to my to-do list for the new year!

anisa-hawes commented 2 years ago

Hello @amsichani, Hello @apjanco and @grunewas,

My name is Anisa and I’m Programming Historian’s Publishing Assistant.

I’ve copyedited your lesson, and my comments/suggested revisions are now ready for your review.

I have applied my suggested revisions directly to the markdown file in our Submissions Repository. You can view the individual additions and subtractions I’ve made in the Commit History.

I’m going to paste a list of my comments and suggested revisions below, to ensure that the copyediting process is transparent and can invite discussion.

I've formatted my comments/suggested revisions as a list of tasks. You will notice that many of these tasks are ‘checked’, because I’ve already made the changes. A small number remain ‘unchecked’ because they ask questions.

I hope you'll find these comments and suggestions useful. Please let me know if you'd like to talk anything through. You're not obligated to take my suggestions on board. Read through and we can have a conversation!

--

  1. Lesson Overview
  2. Historical Example
  3. Building a corpus
  4. Building a gazetteer
  5. Finding Places in Text with Python
     - Natural language processing
     - Load the gazetteer
     - Matching Place Names
     - Loading Text Files
     - Term Frequency
     - Named Entity Recognition
     - DisplaCy
     - Named Entity Linking
     - Export Our Data
  6. Uploading to the World Historical Gazetteer
  7. Future Mapping Work and Suggested Further Lessons

svmelton commented 2 years ago

Thanks @anisa-hawes—I'd advise removing this sentence: "Rather, Grunewald argues that the term “Siberia” served as a decorative term that framed POWs as victims who had endured harsh conditions and cruelty in an exoticized Soviet East." It's an interesting argument but we don't really have time to explore it in this piece. I think the rest of the section stands without it.

anisa-hawes commented 2 years ago

Thank you, @svmelton. This is useful. Please let us know how you feel about that, @apjanco and @grunewas.

I have also noted two typing errors I introduced in 2. Historical example. Sincere apologies, now corrected.

Just to reiterate that I'm very happy to talk through any of the suggested changes if you'd like to.

Very best wishes, Anisa

amsichani commented 2 years ago

Thanks @anisa-hawes for your great copy editing and @svmelton for your input! @grunewas and @apjanco, I checked the PR and it looks fine. Here are just a couple of points I think you need to amend in this PR:

Happy to clarify if there is anything unclear. Many thanks all for the fantastic work!

anisa-hawes commented 2 years ago

Thank you for your reply, @amsichani. The final step for me will be to replace all the live web links with Perma.cc archived links.

Within my copy edits I made some suggestions for additional links, so I await confirmation that you are all happy with these.

apjanco commented 2 years ago

Everything seems good to @grunewas and me. I submitted a PR to fix the extra "is".

amsichani commented 2 years ago

Great @apjanco! I am approving the PRs now, but there is a conflict on para 5 between the two files, which is preventing the PR from being deployed. Can you fix this, or do you want me to do it? Cheers!

apjanco commented 2 years ago

It's saying "no conflicts" to me. If you can fix it, that would be great. Thank you!

anisa-hawes commented 2 years ago

I've closed PR #451 to avoid conflicts with the base branch. Meanwhile, I have removed the additional word "is" from para.52 in your first PR #448.

I have also adjusted the syntax of the link you've inserted at para.41 so that it now reads:

Rather, Grunewald argues that the term "Siberia" served as a term to emphasize suffering in Soviet captivity.

anisa-hawes commented 2 years ago

I'll wait for you to merge the PR, @amsichani.

Hello @apjanco and @grunewas, If there are any further changes needed, you can make direct edits to lesson files. There's no need to use the Git Pull Request system in the Submissions Repository.

Let me know if I can help with anything else. When you’re happy with the final version of the lesson, I’ll go ahead and replace all external links with perma.cc archival links.

Very best wishes, Anisa

apjanco commented 2 years ago

Looks good to us! Best, Andy

On Fri, Jan 28, 2022 at 2:42 PM Anisa Hawes @.***> wrote:

I'll wait for you to merge the PR, @amsichani https://github.com/amsichani.

Hello @apjanco https://github.com/apjanco and @grunewas https://github.com/grunewas, If there are any further changes needed, you can make direct edits to lesson files. There's no need to use the Git Pull Request system in the Submissions Repository.

Let me know if I can help with anything else. When you’re happy with the final version of the lesson, I’ll go ahead and replace all external links in the lesson with perma.cc archival links.

Very best wishes, Anisa


anisa-hawes commented 2 years ago

Hello @amsichani, @apjanco and @grunewas,

In my latest commit I have done the following tasks:

I have taken the decision not to replace two links in the lesson with their perma.cc links:

  1. In para.167 https://www.openstreetmap.org/search?query=Gryazovets#map=12/58.8695/40.2395
  2. In para.332 https://spacy.io/universe/project/video-spacy-irl-entity-linking

In these instances, I found that Perma.cc could not satisfactorily capture these pages (live street map, and YouTube embed).

amsichani commented 2 years ago

Many thanks for these final edits, @anisa-hawes !

I'm re-posting @svmelton the list of all the relevant files for reference:

images/finding-places-world-historical-gazetteer - images files
assets/finding-places-world-historical-gazetteer - asset files
lessons/finding-places-world-historical-gazetteer.md - the lesson file
gallery/finding-places-world-historical-gazetteer.png - the modified avatar
gallery/originals/finding-places-world-historical-gazetteer-original.png - the original avatar

Authors bio:

- name: Susan Grunewald
  team: false
  orcid: 0000-0003-1275-4101
  bio:
      en: |
          Susan Grunewald is the Digital History Postdoctoral Associate at the University of Pittsburgh World History 

- name: Andrew Janco
  team: false
  orcid: 0000-0002-8872-9474
  bio:
      en: |
          Andrew Janco is the Digital Scholarship Librarian at Haverford College.   

The lesson is now ready. Many thanks to all for the collaboration!

svmelton commented 2 years ago

Thank you @amsichani! I'll get this up in the next couple of days.

anisa-hawes commented 2 years ago

Hello all,

Please note that this lesson's .md file has been moved to a new location within our Submissions Repository. It is now found here: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/en/drafts/originals

A consequence is that this lesson's preview link has changed. It is now: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer

Please let me know if you encounter any difficulties or have any questions.

Very best, Anisa

anisa-hawes commented 2 years ago

Hello @apjanco and @grunewas,

I've just been doing a final check-through of your lesson.

There is one point I'm confused about, which I am afraid I somehow missed...

The information box at para. 399 includes the sentence "If you are building your own dataset, it is worth taking the time to add a country codes (ccodes) column into the file you upload as well as aat type with the corresponding type (e.g. settlement, state, country)."

The meaning of the second half of this sentence isn't clear to me, and I am not sure how it should read. Are you able to advise?

Did you intend "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)"?

I can implement the change on your behalf – the .md file has already been moved over to our other repo.

Thank you!

All best, Anisa

grunewas commented 2 years ago

@anisa-hawes Good catch. I did indeed intend to write "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)." Could you please change that? Thanks!

anisa-hawes commented 2 years ago

Thank you for the clarification @grunewas! I've made this change. Much appreciated.
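For readers building their own dataset, the corrected sentence can be illustrated with a minimal sketch of a tab-separated upload file that carries both a country codes column and a type column. Only `ccodes` is named in the sentence quoted above; the other column names (`id`, `title`, `type`) and the two rows are assumptions made for illustration, so the exact headers should be checked against the World Historical Gazetteer's own upload documentation.

```python
import csv
import io

# "ccodes" is the column named in the lesson sentence quoted above.
# "id", "title", and "type" are hypothetical names used for illustration.
FIELDNAMES = ["id", "title", "ccodes", "type"]

# Illustrative rows only; they are not taken from the lesson's dataset.
ROWS = [
    {"id": "1", "title": "Gryazovets", "ccodes": "RU", "type": "settlement"},
    {"id": "2", "title": "Russia", "ccodes": "RU", "type": "country"},
]


def build_tsv(rows):
    """Return the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = build_tsv(ROWS)
print(tsv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same `csv.DictWriter` call could target a file opened with `newline=""`.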

svmelton commented 2 years ago

The lesson is now live! Thanks to @apjanco and @grunewas for your piece, @amsichani for serving as the editor, and @gadolou and @rkhatib for peer reviewing the piece.

@amsichani, this should be ready to promote on the Twitter bot now.

Thanks to everyone for your work!

grunewas commented 4 months ago

@anisa-hawes I'm sorry if this isn't the correct way to address this issue with an older, already published lesson. In the process of discussing a different lesson in the review process, I discovered that Wikipedia links were added to this lesson. I spoke with @apjanco and he was also unaware that these links to Wikipedia had been added to this lesson at some point in the publication process. I would like to request to have the Wikipedia links replaced with other ones that are more in line with publication expectations for my current institution and discipline of history. Andy supports these changes and this reasoning.

Could the following terms have the Wikipedia links swapped with the suggested ones:

anisa-hawes commented 4 months ago

Thank you, @grunewas. I will prepare these edits and let you both know when they are complete.

anisa-hawes commented 4 months ago

Hello @grunewas and @apjanco,

These links have been replaced in https://github.com/programminghistorian/jekyll/pull/3214, and the changes are now reflected on our live site.

All best wishes, Anisa

grunewas commented 4 months ago

Thanks @anisa!

On Mon, Mar 18, 2024, 5:22 PM Anisa Hawes @.***> wrote:

Hello @grunewas https://github.com/grunewas and @apjanco https://github.com/apjanco,

These links have been replaced in programminghistorian/jekyll#3214 https://github.com/programminghistorian/jekyll/pull/3214, and the changes are now reflected on our live site.

All best wishes, Anisa
