Finding places in text with the World Historical Gazetteer

amsichani commented 3 years ago

The Programming Historian has received the following tutorial on 'Finding places in text with the World Historical Gazetteer' by @apjanco and @grunewas. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I will read through the lesson and provide initial feedback to the authors.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.

[Permission to Publish]

The editor must also ensure that the author or translator post the following statement to the Submission ticket.

I the author|translator hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.

amsichani commented 2 years ago

Given that it's mid-August and not much is expected to happen in the next couple of weeks project-wise, I just want to update @apjanco and @grunewas that we are excited to have @gadolou as a reviewer for this lesson -- looking forward to her review! I will update you soon on the name of the second reviewer, and I will also add my initial comments here. Cheers!

gadolou commented 2 years ago

Thank you @amsichani for providing the opportunity to contribute with my review to the preparation of this tutorial. And thank you @apjanco and @grunewas for this tutorial! It is quite interesting, and people dealing with historical texts and place names will find it helpful. The tutorial is constructed in two clearly defined parts, NER and the World Historical Gazetteer dataset building, that can also act as stand-alones. For the first part, readers are expected to be more familiar with programming languages, but the provided references to specific tutorials are helpful. Code excerpts and images are also available and address almost every step of the procedure described, which is good. Test data are provided (however, the link in paragraph §7 doesn’t load). As a general comment, and to address users not familiar with programming, I would add more detailed explanation in some paragraphs that present the code. Also, I would explain more about how reconciliation in the WHG works and what prerequisites are needed for it to work successfully. In more detail, I have the following suggestions:

§1: “First, we produce…” A reader would expect to learn what the second step is. Also, a more clearly defined goal of the lesson is needed here.

§3: “These spatial steps…” I would rephrase this sentence, replacing historical maps with maps depicting historical information or entities etc. The term historical maps means the old maps. Also, in this paragraph, I would remove spatial analysis since it is not implemented in any part of the procedure described in the tutorial.

§7: the link in paragraph doesn’t load

§11: perhaps a note for the readers is needed here that the WHG is more successful in matching old and modern names if the official datasets store this kind of information (I mean there are old place names that are not supported either because they are not included in the basic datasets of WHG as matches to modern ones or their locations are unknown). Also, a note whether WHG can perform location (coordinates) matching?

§12: either “currently” or “point of time” can be removed.

§20: Before loading the language in §21, an “import spacy” is needed, it would be helpful to add it after “pip install spacy”.

§25: more detailed explanation about how coding is applied.

§37: “it will thie…” rephrase

§45: a reminder to the readers how/where “pip install…” should be applied.

§46: “we now entites…”: we now have entities? Also, “DBpedia”.

§50: “By using the WHG…”: not always (see comment in §11).

§69: “Go ahead…” : Go ahead and finish?

§75: Figure 12 is not loaded (at least for me 😊)

§76: ”…but it was”: “…but it will also…” And “in th bottom” – in the bottom.

Thank you for proposing the tutorial!

amsichani commented 2 years ago

Fantastic @gadolou, thanks for the useful review. Happy to inform you that the second reviewer for this lesson will be @rkhatib - looking forward to her review too! @apjanco and @grunewas, I'd suggest waiting for all the reviewers' comments before you start updating / reviewing your lesson ;-) Cheers!

rkhatib commented 2 years ago

A huge thanks @amsichani for inviting me to review this wonderful tutorial; it’s been lovely to see the PH review process in action! Thank you @apjanco and @grunewas for putting together this particularly useful tutorial that will surely help many researchers carry out spatial analysis by extracting, matching, enriching, and geovisualizing place names in texts.

The way the lesson is divided into two parts is useful and offers multiple points of entry depending on the data users have. Another strong positive for me is the clear articulation of a very interesting research topic that is tailored for spatial analysis and exploration. Overall, the way the lesson is structured is clear to follow, with many helpful explanations of the functions, and accompanying images and links throughout.

At the outset, a clearer explanation of the context and goals of this lesson, as well as a more detailed account of what is covered in the tutorial and how it can be useful in research would be beneficial (e.g. DBpedia and reconciliation). In the spirit of multilingual DH, it could be interesting to point to places where other sample spatial datasets in different languages may be found, such as Zenodo.

Here are some more detailed suggestions:

§1 and §3 could be combined to help define the goals of the workshop, with a broad explanation of what the steps for spatial analysis are and how they correspond to the two parts of the lesson.

§1: First, we produce… It would be useful at the outset here to provide more information about what TSV and HTML files are in this context and what types of interpretations they enable, especially for a beginner audience who is trying to understand what methods would work best for their research goals.

§2 Finally… Can you please elaborate on what the “Linked Places Format” and “reconciliation” and “geocoding” are, in layman’s terms?

§5 and §6 Very interesting research topic!

§7 Sample text corpus unavailable

§10 …represent the “contemporary” Soviet names Consider removing “contemporary” since contemporary place names are the current Russified versions

§11 In the opening sentence: The gazetteer from this lesson is also an example.. Consider rephrasing the last sentence or separating it into two sentences for clarity

§14-§24 Appreciate the clear explanation of text processing and NLP in Python

§18 ..spaCy works extremely well for tasks Consider providing acronym for part of speech tagging (POS) if giving one for NER, NLP, etc for consistency

§26 It might be helpful to provide some examples for how the “matcher” function can be used in research

§36 You will encounter machine errors, so it’s important to review the results and to correct errors. How does one go about this?

§37 Not sure what is meant by “assess predictions”

§46 DBpedia section is so interesting! Note that I have replaced… please watch acronym capitalization in this paragraph.

§48 This is a more involved process than we can detail here. Fascinating to learn about the level of disambiguation that spaCy can support in predicting entities

Uploading to the World Historical Gazetteer I really enjoyed this section, very well written and useful for research

§50 Might be helpful to define “geocoding” here if you don’t do so in the introduction (see §2)

§62 ..your place upload Please reword for clarity.

§67 of the ”Reconciliation Review” screen.

§70 ..taken to complete the review of..

§76 In the bottom..

Thank you for putting together this useful tutorial!

amsichani commented 2 years ago

Many thanks @rkhatib for the really detailed review! @apjanco and @grunewas, given that we now have two very detailed reviews in place, I'd suggest going ahead and addressing the issues raised by @rkhatib and @gadolou instead of waiting for me to synthesise a commentary. How does 30 November sound as a deadline for an updated version of the lesson? In the meantime, I will make sure there are no pending tech gremlins on our end, based also on the reviewers' points, e.g. with embedded images. My personal goal is to try and publish this lesson well before the Christmas holidays! Thanks!

amsichani commented 2 years ago

Hi @apjanco and @grunewas, thanks for addressing the reviewers' comments. I think the lesson has been improved and is really such a great resource now -- many thanks for this! I am now going to have a final, closer look at it and try to fix some persistent errors (e.g. Figure 12 is still not loading at my end), and once everything is ready, I will start the publication process. Nearly there! Many thanks again for the collaboration!

svmelton commented 2 years ago

Hi @anisa-hawes! We have a lesson ready for copyediting whenever you're available. (I know we're on PH break at the moment!)

anisa-hawes commented 2 years ago

Thank you, @svmelton! I will add this to my to-do list for the new year!

anisa-hawes commented 2 years ago

Hello @amsichani, Hello @apjanco and @grunewas,

My name is Anisa and I’m Programming Historian’s Publishing Assistant.

I’ve copyedited your lesson, and my comments/suggested revisions are now ready for your review.

I have applied my suggested revisions directly to the markdown file in our Submissions Repository. You can view the individual additions and subtractions I’ve made in the Commit History.

I’m going to paste a list of my comments and suggested revisions below, to ensure that the copyediting process is transparent and can invite discussion.

I've formatted my comments/suggested revisions as a list of tasks. You will notice that many of these tasks are ‘checked’, because I’ve already made the changes. A small number remain ‘unchecked’ because they ask questions.

I hope you'll find these comments and suggestions useful. Please let me know if you'd like to talk anything through. You're not obligated to take my suggestions on board. Read through and we can have a conversation!

--

Lesson Overview

[x] Suggested revisions: To begin, we will produce a tab-separated value (TSV) file with a row for each occurrence of the term and an HTML file of the text with the terms highlighted. [add 'will'. add link to define 'tab-separated value']
[ ] Q: each occurrence of each term?
[ ] Q: all terms highlighted?
[x] Suggested revision: A visualization can be used to interpret the results and to assess their usefulness for a given project. ['A' instead of 'this', because the visualisation form has not been established yet]
[x] The goal of the lesson is to systematically search a text corpus for place names and then to use a service to locate and map historic place names. [delete extra space between 'to' and 'systematically']
[x] Suggested revision: This lesson will be useful for anyone wishing to perform named entity recognition (NER) on a text corpus. [add link]
[x] Suggested revision: Other users may wish to skip the text extraction portion of this lesson and focus solely on the spatial elements of the lesson, that is gazetteer building and using the World Historical Gazetteer (WHG). [add link and abbreviation 'WHG' at first mention]
[x] Suggested revisions: We urge you to try both parts of the lesson together if you have time, as this will enable you to learn how text analysis and mapping can be combined in one project. Additionally, it will demonstrate how the results of these two activities can be ported into another form of digital analysis. [as this will enable you to learn / how text analysis and mapping can be combined / Additionally, it will demonstrate how the results of these two activities]
[x] Suggested revision: In this lesson, readers will first use the Python pathlib library to load a directory of text files. [add 'first'. add link to define 'pathlib']
[x] Suggested revision: For those with an existing gazetteer or list of terms, readers will then be able to create a list of term matches and their locations in a text. [add link to define 'gazetteer'. will then be able to create]
[x] Suggested revision: Those without a gazetteer, can use a statistical language model to identify places. [avoid repetition in 'users can use']
[x] Suggested revision: Finally, users will create a TSV file in the Linked Places Format, which can then be uploaded to the World Historical Gazetteer. [add link to define 'Linked Places Format'. remove link to World Historical Gazetteer here, use upon first mention above. Note that this is a https link, not http. Remove hyphen between world and historical]
[ ] Comment: 'Finally' reads strangely because it doesn’t follow a sequence. I have added in 'first' and 'then be able to' to make this flow.
[x] Suggested revision: Linked Places Format is designed to standardize descriptions of places and is primarily intended to link gazetteer datasets.
[x] Add link: It is a form of Linked Open Data (LOD), which attempts to make datasets interoperable.
[x] Suggested revision: geocode, or ‘match’,
[x] Suggested revision: plots on a map.
[x] Suggested revision: multilingual digital humanities.
[x] Suggested revision: , while the list of place names represent [...] 1941-56.

Historical Example

[x] Suggested revision: This lesson applies co-author Susan Grunewald’s research and serves to demonstrate how her practical methods can be beneficial to historians.
[x] Suggested revision: Grunewald mapped forced labor camps where German Prisoners of War (POWs) were held in the former Soviet Union during and after the Second World War.
[x] Her maps have been used to argue that contrary to popular memory, German POWs were more commonly sent to industrial and reconstruction projects in Western Russia than to Siberia.
[x] Grunewald’s research went on to investigate whether POW memoirs gave a misrepresentation
[x] In this lesson, we will use a list of POW camp names to identify mentions of particular camps and their locations in POW memoirs.
[x] This data can then be mapped to demonstrate Grunewald’s thesis that not all direct mentions of places featured in POW memoirs were in Siberia.
[x] Rather, Grunewald argues that the term “Siberia” served as a
[ ] Comment: I feel uncomfortable about the end of this sentence, because I wonder if it might offend some of our readers. I would like to ask @svmelton for advice here. While I think this paragraph is interesting because we learn how the methodology has been applied in the authors' research and helped them to develop particular hypotheses, I also feel it's important to be clear about the distinction between a hypothesis and an accepted history. I have made adjustments (above) to clarify this, but I find the following section particularly difficult: "decorative term that framed POWs as victims who had endured harsh conditions and cruelty in an exoticized Soviet East".

Building a corpus

[x] Suggested revision: To facilitate this lesson, we have compiled a sample dataset of selections from digitized POW memoirs to search for place names.
[x] Suggested revisions: Due to copyright restrictions, the dataset does not include the full text of the memoirs but rather represents snippets from roughly 35 sources.
[x] Suggested revision: Alternatively, to build your own corpus for this exercise, all you need are text files saved in .txt format.
[x] Suggested revision: If you need help building a corpus of .txt files, you may find this Programming Historian tutorial by Andrew Akhlaghi useful — it provides instructions for transforming PDF files into machine readable text files.

Building a gazetteer

[x] Suggested revision: A gazetteer is an index or directory of place names.
[x] Suggested revision: For our example, we are using information from an encyclopedia of German prisoner of war camps catalogued in central Soviet government documents.
[x] Suggested revision: comma after e.g.,
[x] Suggested revision: involved when studying Soviet history.
[x] Suggested revisions: This means that the places listed may not be named as we know them today. Many are Russified versions of place names that are now more commonly known by different names due to post-Soviet identity politics.
[x] Suggested revisions: Some of the places do still have the same name, but an added complexity is that the German transliteration we read here, is from a Russian transliteration of a local language, such as Armenian.
[x] Suggested revision: New sentence. As we proceed with the mapping process, it is important we remain aware of these extra layers of possible distortion.
[x] Suggested revision: The directory of place names we use in this lesson is an example of a historical gazetteer.
[x] Suggested revision: Trying to map the places mentioned in the memoirs is therefore not a simple task as the names have changed both in the Soviet and post-Soviet era.
[x] Suggested revision: it is possible to use the
[x] Suggest: delete 'serve'
[x] Suggested revision: This gives us the ability to map the locations on a modern map.
[x] This process is not straightforward if you are working with larger, more common geographic information systems such as Google Maps or ArcGIS.
[x] Suggested revision: The WHG is not perfect, though. It works well for matching historical place names with their modern equivalent only when those links already exist in the gazetteer, which is continuously growing due to active user contributions.
[x] Suggested revision: However, certain place names are not currently supported – sometimes because the links between the historical and contemporary names are not yet established, or because their locations are unknown.
[x] Suggested revision: Note that for this lesson, and whenever working with the World Historical Gazetteer, it is best to focus on the names of settlements (i.e., towns and cities). This is because the World Historical Gazetteer does not currently support mapping and geolocating of states or countries.
[x] Suggest delete: 'the rest of the steps of'

Finding Places in Text with Python

[x] Suggest delete: 'over'
[x] Suggest replace 'see' with 'notice'
[x] Suggest replace 'actually' with 'specifically'
[x] Suggested revision: -1, which means that the sequence “Rivers” (with an uppercase R) could not be found.
[x] You can also inadvertently match characters that are part of the sequence, but don’t represent a whole word.
[x] You retrieve 15 matches because that is part of the “y riv” sequence. So while it is present in the text, this isn’t a term that you’d normally set out to find.
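
The two pitfalls named in these revisions (a case-sensitive search returning -1, and a substring match that is not a whole word) can be illustrated with a minimal sketch. The sample sentence and variable names below are invented for illustration and are not taken from the lesson's corpus.

```python
import re

# Hypothetical sample text; not from the lesson's corpus.
text = "Many rivers run east; every river freezes in winter."

# str.find returns the index of the first occurrence, or -1 if absent.
lower_index = text.find("rivers")
upper_index = text.find("Rivers")  # uppercase R does not occur in the text

# A substring search also matches character runs that cross word boundaries.
partial_count = text.count("y riv")

# Word boundaries (\b) restrict the search to whole words only.
whole_words = re.findall(r"\briver\b", text)

print(lower_index, upper_index, partial_count, whole_words)
```

Using `\b` is one possible way to avoid partial-word hits; token-based matching, as discussed later in the lesson, is another.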

Natural language processing

[ ] Section numbering stops suddenly here... Should this be section 6? Below, we find section numbering again from 6. Uploading to the World Historical Gazetteer
[x] add link to define NLP
[x] Suggested revision: For example, it can identify if a word is a noun or a verb using ‘part of speech’ tagging.
[x] Suggested revision: We can also use NLP to identify the direct object of a verb, to determine who is speaking and locate the subject of that speech.
[x] NLP provides your programs with additional information which can open up new forms of analysis.
[x] Suggest using 'we' instead of 'I'. As historians, we can also appreciate how NLP prompts us to consider the linguistic aspects of our sources in ways that we might otherwise not.
[x] Our first NLP task is tokenization. This is where our text is split into meaningful parts, known as word tokens.
[x] closing punctuation
[x] For English and other languages that use spaces between words, you often achieve good results simply by splitting the tokens where there are spaces.
[x] However, more specific rules are needed to separate punctuation from a token, to split and normalize elided words (for example, “Let’s” > Let us) as well as other exceptions that don’t follow regular patterns.
[x] fast and simple so that it works well on a basic laptop.
[x] Suggested revision + link: As a library, spaCy is opinionated and its simplicity comes at the cost of choices being made on your behalf.
[x] Suggested revision: For those who are new to NLP, the spaCy documentation is a good place to learn about how their design choices influence the way the software operates, and will help you to assess whether spaCy is the best solution for your particular project.
[x] Suggest remove acronym NER here
[x] Suggested revision: you will be able to import the ‘object’ for your language so that the tokenization rules are specific to your language.
[x] Suggested revision: To load a new language, you need to import it. For example, from spacy.lang.de import German or from spacy.lang.en import English. In Python, this command navigates to spaCy’s directory,
[x] Suggested revision: its own index number.
[x] Suggested revision: remove ‘stop words’ (words to be filtered out before processing) and punctuation
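
The difference between splitting on spaces and applying additional rules for punctuation can be sketched roughly as follows. This uses only the Python standard library and is not spaCy's tokenizer; the sample sentence and the regular expression are assumptions chosen for illustration.

```python
import re

# Hypothetical sample sentence.
sentence = "Let's go to Karaganda, then home."

# Splitting on whitespace leaves punctuation attached to the words.
whitespace_tokens = sentence.split()

# A simple rule: keep words (with an optional apostrophe suffix) together,
# and treat every other non-space character as its own token.
rule_tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

print(whitespace_tokens)
print(rule_tokens)
```

A real tokenizer applies many more language-specific rules and exceptions than this single pattern does.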

Load the gazetteer

[x] file containing our list of names.
[x] we will import
[x] file is structured with a new line for each place name.

Matching Place Names

[x] There’s one place name in this sentence,
[x] which is a powerful tool for searching the tokenized text
[x] The _M_atcher will find
[x] to be written in all lower case letters so that the search will be case-insensitive.
[x] if you want to perform a case-sensitive search.
[x] we retrieve a list including exact matches as well as the start and end indexes of the matched spans or tokens.
[x] some of this may feel familiar.
[x] , here we’re matching token patterns that can also include parts of speech and other linguistic attributes of a text.
[x] that will identify a match whenever

Loading Text Files

[x] .txt files
[x] an easy way to iterate over all the files in the directory using iterdir()
[x] will generate a list of the
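
Iterating over a directory with `iterdir()` might look something like the sketch below. The helper name `load_texts`, the file names, and the temporary directory are hypothetical; the lesson's own code may be organised differently.

```python
from pathlib import Path
import tempfile


def load_texts(directory):
    """Return a {filename: text} dict for every .txt file in a directory."""
    texts = {}
    for path in sorted(Path(directory).iterdir()):
        # Skip sub-directories and files that are not plain text.
        if path.is_file() and path.suffix == ".txt":
            texts[path.name] = path.read_text(encoding="utf-8")
    return texts


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "memoir_a.txt").write_text("Karaganda was cold.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("not part of the corpus", encoding="utf-8")
    corpus = load_texts(tmp)

print(corpus)
```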

Term Frequency

[x] To count frequencies,
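
One common way to count frequencies in Python is `collections.Counter`. The sketch below assumes the matched place names have already been collected into a list; the names in it are invented sample data, not results from the lesson.

```python
from collections import Counter

# Hypothetical list of place names found by an earlier matching step.
matched_places = [
    "Karaganda", "Minsk", "Karaganda", "Stalingrad", "Minsk", "Karaganda",
]

# Counter tallies how often each name occurs.
place_counts = Counter(matched_places)

# most_common(n) returns the n most frequent items with their counts.
top_places = place_counts.most_common(2)

print(top_places)
```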

Named Entity Recognition

[x] _M_atcher
[x] It will find any of the places in our list that occur in the text.
[x] However, what if we want to find places including those not in our list? Can we retrieve all the place names that appear
[x] For this task, there are pre-trained models available in many languages for identifying place names. These are statistical models that have learned the general “look and feel” of place names and can make predictions.
[x] This enables the model to identify place names even if they were not included in its training data. But it also means that the model can make mistakes.
[x] However, it is also possible to fine-tune a model using your materials to improve accuracy.
[x] However, you are likely to encounter machine errors, so it’s important to review the results and to correct errors. With Matcher, you won’t encounter these mistakes, but you also won’t find place names that are not featured in the gazetteer.

DisplaCy

[x] Capitalise the C in displaCy
[x] Add link: a useful tool called displaCy.
[x] displaCy will display an image of the text alongside any predictions made, which can be very useful when assessing whether the results are going to be helpful to your research or introduce too many machine errors to be practical.
[x] that enables you to assess predictions quickly. Visualizations can be created either using a Python script or by running a
[ ] Figure numbering is wrong here: It reads "Figure 0"
[x] With statistical models, you can also use displaCy to create a useful visualization of the relationships between words in the text. Just use style='dep' to generate this form of visualization.
[x] DisplaCy visualizations can also be saved to a file for use elsewhere (spaCy's documentation provides further guidance)

Named Entity Linking

[x] to know who Karl-Heinz Quade is.
[x] For example, DBpedia records for places often contain data including the latitude and longitude, region, country, time zone, and population.
[x] There is a useful Python library for spaCy known as the DBpedia Spotlight which can attempt to match predicted entities with records in DBpedia.
[x] This relationship will then be visible as part of the ‘entity span’ annotation. To install the DBpedia Spotlight library,
[x] Note that we now have entities in the document
[x] Karl-Heinz Quade does not feature in
[x] To access the associated data, you can send a request to the DBpedia server. Note that we have replaced the human-readable page “resource” with the machine-readable operator “data”.
[x] We then add “.json” to the record name, which will return the data as JSON. We can use the requests library to parse the JSON data and make it ready for use in our Python script.
[x] You can explore this data using print(data) or data.keys().
[x] For more on JSON, see Matthew Lincoln’s lesson for the Programming Historian.
[x] Here is an example of how to access the latitude and longitude for this particular result:
[x] is similar to the Matcher.
[x] It can identify a match,
[x] Whereas, spaCy has the capacity
[x] This is a more involved process than we can detail within the scope of this lesson.

Export Our Data

[x] The final step in this section is to export our _M_atches
[x] then you will already be familiar with tabular data.
[x] This is information structured in rows and columns.
[x] These are simple text files including symbols that split the text into rows and columns. Rows are separated by the new line character \n, and then split into columns using tabs \t.
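
The row and column structure described here can be sketched with Python's `csv` module. The column names and sample rows are hypothetical and do not reproduce the Linked Places Format; the output is written to an in-memory buffer rather than a file so the example stays self-contained.

```python
import csv
import io

# Hypothetical match records; column names are illustrative only.
matches = [
    {"file": "memoir_a.txt", "place": "Karaganda", "start": 0, "end": 9},
    {"file": "memoir_b.txt", "place": "Minsk", "start": 14, "end": 19},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["file", "place", "start", "end"],
    delimiter="\t",       # columns are separated by tabs
    lineterminator="\n",  # rows are separated by new lines
)
writer.writeheader()
writer.writerows(matches)

tsv_text = buffer.getvalue()
print(tsv_text)
```

To write to disk instead, the same writer could be pointed at a file opened with `open("matches.tsv", "w", newline="", encoding="utf-8")`, where the file name is again only an example.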

Uploading to the World Historical Gazetteer

[ ] Section numbering is suddenly re-introduced here...
[x] Link to WHG not required here. It is linked upon first mention in the lesson.
[x] The WHG is a fully web-based application. It indexes place names drawn from historical sources, adding temporal depth to a corpus of approximately 1.8 million modern records.
[x] This is especially useful for places which have changed names over time.
[x] (provided that the places are in the WHG index and have coordinates)
[x] services including Google Maps, as well as those behind a paywall barrier such as ArcGIS,
[x] historical place name information
[x] and add a brief description.
[x] to anyone but your own user account.
[x] If you want to contribute to the WHG, you can upload your historic place name information and adjust the privacy settings in the future.
[x] Next, browse your computer to locate your Linked Places TSV file and upload it. Do not change the formatting selection – Delimited/Spreadsheet must remain selected. The image below shows a properly formatted upload dialogue box.
[x] Back on the “Data” tab, click on the TSV file you’ve uploaded.
[ ] Q: “Data” tab? This is new — no mention of tabs before.
[x] This will take you to a new screen to preview the dataset’s metadata.
[ ] Comment: I note that instructions are much more interface specific in this section than elsewhere.
[x] Reconciliation is the process of linking entries in your TSV file to the database of place names and their additional relations in the WHG.
[x] you can create your own user area to fine-tune the results retrieved in the reconciliation process.
[x] When the reconciliation is complete, you will be advised how many results need to be reviewed for each ‘Pass’ of the process.
[x] In this case we reconciled 133 records, of which 126 were ‘hits’. We now have to do a manual review of the ‘hits’ to see if they are correct Matches. Press the “Review” button next to “Pass 1” to begin.
[x] You will be taken to a new screen that asks you to match the place names in your TSV file with records in the WHG.
[x] Your options are “closeMatch” or “no match”.
[x] It is also possible that none of the suggested matches are correct, and in that case select “no match”.
[x] Given the bare-bones nature of this upload, it will be a little harder to reconcile these matches.
[x] All of the results should originate from the countries that made up the former Soviet Union
[x] which we note is a variant of
[x] because the suggested match is a place in Italy. As our data only concerns places in the former Soviet Union,
[x] realize you’ve made a mistake,
[x] As a hint, of the 25 places for
[ ] Q: Why is [Belarus] in square brackets?
[x] If you wish, you can continue going through the 101 locations for Pass 2. Should you complete the review process, you will produce a map that looks like the one below.
[ ] Q: Figure 12 does not show in the preview

Future Mapping Work and Suggested Further Lessons

[ ] Review section numbering throughout the lesson
[x] there is a box labelled “Downloads”
[x] We highly recommend you look at the additional Programming Historian mapping lessons, specifically Installing QGIS and Adding Layers as well as Creating Vector Layers in QGIS to see how you can use the results of this lesson to carry out further analysis.

svmelton commented 2 years ago

Thanks @anisa-hawes—I'd advise removing this sentence: "Rather, Grunewald argues that the term “Siberia” served as a decorative term that framed POWs as victims who had endured harsh conditions and cruelty in an exoticized Soviet East." It's an interesting argument but we don't really have time to explore it in this piece. I think the rest of the section stands without it.

anisa-hawes commented 2 years ago

Thank you, @svmelton. This is useful. Please let us know how you feel about that, @apjanco and @grunewas.

I have also noted two typing errors I introduced in 2. Historical example. Sincere apologies, now corrected.

Just to reiterate that I'm very happy to talk through any of the suggested changes if you'd like to.

Very best wishes, Anisa

amsichani commented 2 years ago

Thanks @anisa-hawes for your great copy editing and @svmelton for your input! @grunewas and @apjanco, I checked the PR , it looks fine - here are just a couple of points I think you need to amend in this PR:

[ ] para 5 : I see the suggestion you made is trying to address the implications both @anisa-hawes and @svmelton are commenting on. I am in favour of accepting it.
[ ] para 10, trying to map the places mentioned in the memoirs is therefore is not a simple task as the names have changed both in the Soviet and post-Soviet era. --> delete the extra 'is'
[ ] Figure 12: hopefully it will load now after this small edit. Let's see -- everything crossed!
[ ] great catch on the figure numbering - as I noticed, you didn't change the file naming in /images. As far as I can see, there is no other implication, so it's fine. As a general rule, we would like the assets' file naming to mirror the figure/assets numbering in the lesson itself for sustainability purposes.
[ ] Section numbering should be fine now.

Happy to clarify if there is anything unclear. Many thanks all for the fantastic work!

anisa-hawes commented 2 years ago

Thank you for your reply, @amsichani. The final step for me will be to replace all the live web links with Perma.cc archived links.

Within my copy edits I made some suggestions for additional links, so I await confirmation that you are all happy with these.

apjanco commented 2 years ago

Everything seems good to @grunewas and me. I submitted a PR to fix the extra "is".

amsichani commented 2 years ago

Great @apjanco! I am approving the PRs now, but there is a conflict on para 5 between the two files that is preventing the PR from being deployed. Can you fix this, or do you want me to do it? Cheers!

apjanco commented 2 years ago

It's saying "no conflicts" to me. If you can fix it, that would be great. Thank you!

anisa-hawes commented 2 years ago

I've closed PR #451 to avoid conflicts with the base branch. Meanwhile, I have removed the additional word "is" from para.52 in your first PR #448.

I have also adjusted the syntax of the link you've inserted at para.41 so that it now reads:

Rather, Grunewald argues that the term "Siberia" served as a term to emphasize suffering in Soviet captivity.

anisa-hawes commented 2 years ago

I'll wait for you to merge the PR, @amsichani.

Hello @apjanco and @grunewas, If there are any further changes needed, you can make direct edits to lesson files. There's no need to use the Git Pull Request system in the Submissions Repository.

Let me know if I can help with anything else. When you’re happy with the final version of the lesson, I’ll go ahead and replace all external links with perma.cc archival links.

Very best wishes, Anisa

apjanco commented 2 years ago

Looks good to us! Best, Andy



anisa-hawes commented 2 years ago

Hello @amsichani, @apjanco and @grunewas,

In my latest commit I have done the following tasks:

replaced external links with perma.cc archived links*
suggested deletion of final sentence of para.41 and adjusted placement of link to article
corrected link in para.44 </assets/finding-places-world-historical-gazetteer/place_texts.txt>
removed second link to https://spacy.io/ para.112
adjusted ph lesson link in para.114 to doi https://doi.org/10.46430/phen0029
added a note in brackets to say that Ines Montani's spaCy course is (available in English and six other languages), para.138
adjusted ph lesson link at para.191 to doi https://doi.org/10.46430/phen0033
adjusted ph lesson link at para.262 to doi https://doi.org/10.46430/phen0087
adjusted ph lesson links at para.432 to dois https://doi.org/10.46430/phen0031 and https://doi.org/10.46430/phen0034

*I have taken the decision not to replace two links in the lesson with their perma.cc links:

In para.167 https://www.openstreetmap.org/search?query=Gryazovets#map=12/58.8695/40.2395
In para.332 https://spacy.io/universe/project/video-spacy-irl-entity-linking

In these instances, I found that Perma.cc could not satisfactorily capture these pages (live street map, and YouTube embed).

amsichani commented 2 years ago

Many thanks for these final edits, @anisa-hawes!

@svmelton, I'm re-posting the list of all the relevant files for reference:

images/finding-places-world-historical-gazetteer - image files
assets/finding-places-world-historical-gazetteer - asset files
lessons/finding-places-world-historical-gazetteer.md - the lesson file
gallery/finding-places-world-historical-gazetteer.png - the modified avatar
gallery/originals/finding-places-world-historical-gazetteer-original.png - the original avatar

Authors bio:

- name: Susan Grunewald
  team: false
  orcid: 0000-0003-1275-4101
  bio:
      en: |
          Susan Grunewald is the Digital History Postdoctoral Associate at the University of Pittsburgh World History 

- name: Andrew Janco
  team: false
  orcid: 0000-0002-8872-9474
  bio:
      en: |
          Andrew Janco is the Digital Scholarship Librarian at Haverford College.

The lesson is now ready. Many thanks to all for the collaboration!

svmelton commented 2 years ago

Thank you @amsichani! I'll get this up in the next couple of days.

anisa-hawes commented 2 years ago

Hello all,

Please note that this lesson's .md file has been moved to a new location within our Submissions Repository. It is now found here: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/en/drafts/originals

A consequence is that this lesson's preview link has changed. It is now: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/finding-places-world-historical-gazetteer

Please let me know if you encounter any difficulties or have any questions.

Very best, Anisa

anisa-hawes commented 2 years ago

Hello @apjanco and @grunewas,

I've just been doing a final check-through of your lesson.

There is one point I'm confused about, which I am afraid I somehow missed...

The information box at para. 399 includes the sentence "If you are building your own dataset, it is worth taking the time to add a country codes (ccodes) column into the file you upload as well as aat type with the corresponding type (e.g. settlement, state, country)."

The meaning of the second half of this sentence isn't clear to me, and I am not sure how it should read. Are you able to advise?

Did you intend "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)"?

I can implement the change on your behalf – the .md file has already been moved over to our other repo.

Thank you!

All best, Anisa

grunewas commented 2 years ago

@anisa-hawes Good catch. I did indeed intend to write "[...] as well as a type column with the corresponding type (e.g. settlement, state, country)." Could you please change that? Thanks!

anisa-hawes commented 2 years ago

Thank you for the clarification @grunewas! I've made this change. Much appreciated.

svmelton commented 2 years ago

The lesson is now live! Thanks to @apjanco and @grunewas for your piece, @amsichani for serving as the editor, and @gadolou and @rkhatib for peer reviewing the piece.

@amsichani, this should be ready to promote on the Twitter bot now.

Thanks to everyone for your work!

grunewas commented 4 months ago

@anisa-hawes I'm sorry if this isn't the correct way to address this issue with an older, already published lesson. In the process of discussing a different lesson in the review process, I discovered that Wikipedia links were added to this lesson. I spoke with @apjanco and he was also unaware that these links to Wikipedia had been added to this lesson at some point in the publication process. I would like to request to have the Wikipedia links replaced with other ones that are more in line with publication expectations for my current institution and discipline of history. Andy supports these changes and this reasoning.

Could the following terms have the Wikipedia links swapped with the suggested ones:

[x] TSV - please switch to https://www.loc.gov/preservation/digital/formats/fdd/fdd000533.shtml
[x] Named Entity Recognition - https://www.ibm.com/topics/named-entity-recognition or https://docs.deeppavlov.ai/en/master/features/models/NER.html
[x] Pathlib - https://realpython.com/python-pathlib/
[x] Gazetteer - https://support.esri.com/en-us/gis-dictionary/gazetteer
[x] Natural Language Processing - https://www.nnlm.gov/guides/data-glossary/natural-language-processing
[x] Tokenization - https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html

anisa-hawes commented 4 months ago

Thank you, @grunewas. I will prepare these edits and let you both know when they are complete.

anisa-hawes commented 4 months ago

Hello @grunewas and @apjanco,

These links have been replaced in https://github.com/programminghistorian/jekyll/pull/3214, and the changes are now reflected on our live site.

All best wishes, Anisa

grunewas commented 4 months ago

Thanks @anisa!



Finding places in text with the World Historical Gazeteer #383