programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions
138 stars 114 forks source link

Review Ticket for Geoparsing Text with the Edinburgh Geoparser Lesson #26

Closed ianmilligan1 closed 7 years ago

ianmilligan1 commented 8 years ago

The Programming Historian has received the following tutorial on 'Geoparsing Text with the Edinburgh Geoparser' by @bea-alex. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/geoparser-lesson

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback which should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @acrymble if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to requests for clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudsperson (Ian Milligan - http://programminghistorian.org/project-team). Thank you for helping us to create a safe space.

ianmilligan1 commented 8 years ago

Update: two peer reviews have been contacted and are interested in doing the reviews! They'll paste reviews here.

At this time, any other community comments, etc. are greatly appreciated.

wcaleb commented 8 years ago

The figures for this lesson will need to follow the new syntax outlined here.

ianmilligan1 commented 8 years ago

sg @wcaleb – we'll make sure the figure syntax gets updated during the revisions process.

ianmilligan1 commented 8 years ago

Hi @bea-alex – I've received some initial feedback from one of the reviewers. I'll paraphrase below, but while the reviewer is enthusiastic about the goals of the lesson they had trouble getting things installed on a Mac.

So I think we'll have to think.. more explicit linking to our existing command line lesson? Maybe a quick and dirty walkthrough from a basic vanilla shell to getting it up and running. My vote is probably for the latter, with link to the former, but would love to hear your thoughts.

A quote:

If providing more instruction on using the directory and command line would bog down this lesson too much, then maybe a separate page/lesson should be created for those aspects, and this Geoparsing page could link people to that 'sub-lesson' to get what they need to know.

Anyways, what are your thoughts?

We might want to have a broader conversation at PH about the role of more advanced lessons, if you think it should be cast that way.

Thanks and looking forward to chatting!

acrymble commented 8 years ago

This looks like a very useful skill. It also looks like it covers some of the same goals as a lesson I wrote a couple of years ago: http://programminghistorian.org/lessons/extracting-keywords

I'm sure readers would benefit from understanding how these two lessons fit together so that they can make the right decision for their needs.

ianmilligan1 commented 8 years ago

@bea-alex Hi Bea – just pinging you. Any thoughts on the above comments?

aelang commented 8 years ago

Hi everyone. Bea, hope Finn is letting you sleep!

This is an excellent addition to the PH - very clear and with a lot of detail - and I'm glad to have the chance to review it. It is my first time reviewing for PH so forgive me if any of my comments are obvious or far from the mark. (And also if I screw up on Github, to which I'm relatively new.)

General comments

Need to signal at the start of the lesson that users are assumed to be familiar with the command line, and if they are not they should take the "Introduction to the Bash Command Line" tutorial first.

This might be an editorial matter but is it a problem that there is no guide for PC users?

I noticed that while all the way through there were instructions for Mac users, the options for Linux were only given some of the time. Is it worth signalling more explicitly that the tutorial is geared for Mac users, so those with Linux machines will need to [follow x, y, & z steps] to make the tutorial useable on a Linux machine? Sorry I can't make any concrete suggestions but I am not a Linux user!

Comments on specific paragraphs

Paras 1-3: These introductory paragraphs feel like they don't quite match the house style of PH, and are rather more in the register of a research paper than a tutorial. I'd suggest changing from "We optimised it for different domains as part of a series of research projects ... [then listing the research projects]" to something like, "People have used the Geoparser to identify the place names in a large set of literary texts set in Edinburgh and plot them on a map interface", etc. The terms geo-parse, geo-tag and geo-reference are terms of art and thus potentially unfamiliar for those new to this sort of work, so it might be worth either defining them, or using them in a context where their meaning can be inferred. Eg. at the first use of "geo-tag", add a parenthesis such as " -- supply a code which specifies a latitude and longitude --". Also, the idea of optimising a set of scripts for different domains is a concept easier to grasp with examples than in the abstract, and this would be a key place to signpost what is ahead in the tutorial. Eg, "In this tutorial, you will see how the Geoparser can be optimised for specific uses, for instance giving more weight to placenames in a particular area of the world." Although I'd be loathe to lose the references to the various research projects showing how the Geoparser has been used, eg. Trading Consequences & Hestia, they do get in the way of getting down to the brass tacks of the tutorial, so perhaps it is worth moving them to the end of the tutorial. Finally, I would suggest adding at the start of the introduction a few sentences in plain-ish English with examples of what users can use the Geoparser to do. Something as simple as "If you want to get a sense of the placenames mentioned in a historical document, the Geoparser will identify all the terms in the document that may refer to placenames, and provide its best guess as to where those places are." (That's clunky, but you get the idea - the tutorial needs to be pitched to people who may not - at least when they start reading the tutorial - intuitively understand the idea of geoparsing.)

para 6: New paragraph at 'Some machines will ...' . I also suggest changing this

Some machines will automatically decompress the .gz file and create the directory geoparser-v1.1. If not, then follow the next 3 steps. Otherwise move the geoparser-v1.1 directory to where you want it to be installed. Version 1.1 is the current release, this number will change in future.

to this:

Some machines will automatically decompress the .gz file and create the directory geoparser-v1.1. If this happens, and you see the geoparser-v1.1 directory appear, move this new directory to wherever you want it to be installed and go to step 10. If this does not happen, and your machine does not decompress the .gz file and create a new directory, follow steps 7, 8 and 9. (Note that version 1.1 is the current release, but this number will change in the future.)

Also, it may be worth giving the command line syntax for how to move the geoparser-v1.1 directory to a new directory.

Switch paras 7 and 8 (& then do the requisite editing) so that users create a directory first and then move the file into it.

Para 9: I wouldn't offer the option of clicking and dragging within the GUI here. I would instead set out the command one would type to simultaneously create the new directory and move the .gz file to it.

Para 10: With an eye to consistency with other PH tutorials, I would start by explaining why users need to change to the new directory. Eg. "We now need to change to the new directory so that the commands we type will apply to the files inside that directory. To do this, use the cd command as follows:" and give the command.

Para 11: tar -xvf geoparser-march2016.tar should be tar -xvf geoparser-march2016.tar.gz

Para 13: I wondered whether this screenshot was necessary, as PH is so (laudably) committed to moving people away from the GUI to the command line. This is perhaps best decided by an editor, but at the least I'd suggest swapping paras 13 and 14 so that the user is asked to check the text output of the bash shell, and then given the option of checking this in a Finder window.

Para 14: "binaries" is a term of art that needs to be explained or reworded. It also appears in para 67.

Para 15: again with an eye to PH house style, what about "Congratulations! You have successfully downloaded and set up the Geoparser, and you can now begin [etc]".

[To be continued: I'll split my review between several comment boxes]

aelang commented 8 years ago

Review continued: Comments on the Geoparsing A Text File section.

Again, this is all admirably clear. The register discrepancy is still present, but should be easily fixed by copy-editing. Eg. change the text of para 16 to this:

In this section, you'll learn how to geo-parse a simple text file. Use the cd command to go to the geoparser's script directory:

Para 17: Worth adding a quick explanation of what the 172172.txt file is, in part because step 27 then becomes clearer.

Paras 18-23: Glossing the command line syntax is very helpful in these paragraphs. I would be explicit that this is what you are doing, ie. start off with "Let's look at what each bit of the syntax is doing" (though word it less patronisingly than that ...). [Most PH-users who had gotten this far on this tutorial wouldn't need this to be made explicit, of course, but I'm thinking here of the folks I've spoken to who had no idea such a thing as the Geoparser existed, and were tremendously excited to hear about it, but who did not have sufficient technical knowledge to be comfortable on the command line.]

Para 24: Great paragraph for un-black-boxing the Geoparser. The terms tokenised, part-of-speech tagged and lemmatised need to be explained. Ditto named entity recognition. (Editors: is it equally acceptable to link to explanations of these on, say, Wikipedia?).

Para 26: I would reword to "Now you can see all the files that the Geoparser has produced from the original 172172.txt file. Use the ls command and the * wildcard operator to see all the files beginning with the prefix 172172, like so:" [and then give the command]

Para 27: "in standoff" and "surface form" are not terms I'm familiar with: could you explain/reword? (NB "standoff" also occurs in para 48).

I would also just delete "In some cases, it might even be better to reduce this parameter to less than 20 (e.g. see Alex et al., 2015). This is not a setting which can be changed on the command line. If there is sufficient interest, we can prepare a more advanced lesson on using the Geoparser which will explain how to change this and other more internal settings and features." You could also link to the URL for Alex et al., 2015 in the sentence before the ones just deleted (eg the words "does not increase performance considerably").

Para 30: I would push "You can also specify the option -top on the command line" into the following paragraph.

Para 31. Replace this

so only the green geo-coordinate pairs and pins (see Figure 3).

with this

so only the green geo-coordinate pairs and pins are displayed (see Figure 3).

aelang commented 8 years ago

Review continued: Other Useful Options for running the Geoparser section

Reviewers are asked to consider the length of the tutorial. This does feel like a longer PH tutorial than usual, and I began to wonder whether it would be better split into two, calling the first 32 sections 'Using the Edinburgh Geoparser: Basic functionality' and section 33 onwards 'The Edinburgh Geoparser: Additional options' or similar. The difficulty level of the two sections is about the same, so it would really only be for reasons for length. This is one I would leave up to the editors. (I've not been through every single one of the PH tutorials myself, so perhaps there are some longer ones, in which case this Geoparser tutorial could probably stay as is.)

General comments

In the bash shell tutorial, the authors use the nomenclature of 'flags', whereas this tutorial uses 'parameters' (and occasionally does not use any referring term). Using 'flag' might be better, just for consistency. This comment applies to the earlier sections as well.

There are a few inconsistencies with tense to be aware of Eg. in Para 41, "The score was set to 2" would be better as "Here, the score has been set to 2".

Para 33: Similar to my comments on the opening paragraph, I'd suggest a brief example here to make it clear why a user might want to use a bounding circle or a bounding box. Perhaps this is a good place to bring in a project such as Palimpsest (which I suggested that you either cut from the introduction or move to the end). The way Palimpsest makes use of the Edinburgh Geoparser makes it clear both why places near Edinburgh would be given greater weight, and the dangers in so doing (places that share a name with some location in Edinburgh but on closer inspection of the context turn out to be nothing to do with Edinburgh).

Para 44 "Apart from identifying locations" should be "As well as identifying locations". Also, the phrase "and normalises them" needs a bit more explanation. This could be done quite simply with a phrase like 'changing them from a phrase such as 'last Tuesday' or 'next Friday' to a specific date, month and year. (I realise the full explanation of temporal normalisation is given below, but I still think a quick gloss in this sentence would make the sentence more readily comprehensible.)

For the section headed Geoparsing Multiple Text Files (paras 51-64), I am very appreciative of the time you've taken to provide two ways in which users could quickly run multiple text files through the Geoparser. I think, though, that as useful a copy-and-paste resource as this is, it is somewhat at odds with the PH ethic of ensuring a user understands every single step. So I think the tutorial might be better off omitting it. So as not to waste the work, perhaps a version of it could be put on the Geoparser's own site and a line inserted into this PH tutorial to say 'If you want help processing multiple files, follow this link to find some scripts to help you do this.'

para 66: The text to this paragraph that introduces the Extracting Geo-Resolution Output to TSV section could be reframed a little. Instead of saying that some users might be unfamiliar with parsing XML files, say instead something like 'Rather than dealing with an XML file, you might find it easier to work with your georeferenced text in a form such as tab-separated values (TSV) in order to inspect it in a spreadsheet programme or use it with an application such as QGIS or Google Maps/Google Earth [and then link to the existing PH tutorials on QGIS/Google Maps/Google Earth].'

para 67: To add a little clarity, I would make the first sentence "The Geoparser is distributed with a very useful set of XML processing tools called LT-XML2, authored by Richard Tobin, which can extract all the location entities in a Geoparser XML output file and present them in TSV format." [the latter half of this sentence is taken from para 72, so you'll want to reword or perhaps even remove that from para 72.]

para 68: You could reference Melodee Beals's tutorial on Transforming Data for Reuse and Re-publication with XML and XSL here. Eg. 'Going in detail over Xpath is beyond the scope of this lesson, so I will give clear examples to show how things work. If you are interested you will find further detail in the Transforming Data for Reuse and Re-publication with XML and XSL tutorial.' Then move "The best tool for printing content of XML in a different format is lxprintf" to the start of para 69.

para 72: These are not as clear to non-programmers as they no doubt are to programmers. The quickest thing to do is to give you an example of how I'd rewrite them.

The lxprint command reads through a Geoparser XML output file, extracts all the location entities and presents them in TSV format. In the example above, the input file (from which the location entities are being extracted) is ./out/burtons.out.xml, and the tab-separated value file to which the location entities are written is ./out/burtons.out.tsv. The < symbol signifies "standard in" (or stdin), which tells the script to read in the file that follows it, while the > symbol signifies "standard out" (or stdout), which tells the script to send the output to the file that follows it.

That's rather clunky but will I hope give the general idea. Para 73 could also do with a little rewriting in the same vein.

para 74: 'argument' is used here and also in para 23, and needs to be glossed. (Actually, I see the term does not appear at all in the introduction to bash tutorial, so perhaps it's worth leaving out altogether, and substituting in something like 'the next part of the command'.)

Also in need of glossing is "if resolved". Something like "if this information exists in the file".

paras 77 & 78: It's brilliant having the script right there to be downloaded and used quickly. I wondered, again, if this was a little counter to the PH spirit of encouraging users to understand each incremental step, and thought it would probably be up to the editors to decide whether to keep these two paragraphs. If they are retained, the phrase "make it executable" and the chmod command that follows need glossing.

paras 79-81 (the online demo): This only works for me in Firefox and Safari and not in Chrome (OSX, using burtons.txt and geonames as the gazetteer). I wondered whether it needed to be included, given that it usually falls over when the text is anything other than very short. If so, it might be worth moving to the very top of the tutorial in order to give a quick and visually appealing demonstration of what the geoparser can do. Then, it's made clear that going under the hood is the way to use it on long files/multiple files, and that it can be configured in specific ways, and the remainder of the tutorial takes the user under the hood.

paras 82-91 (the FAQ): I very much like the addition of these frequently asked questions, and think they add a significant element to the user's understanding of the tool. I don't know how well they gel with the PH house style - I don't remember having seen an FAQ on any of the other tutorials I've been through over the years - but even if they do clash with the house style, I'd suggest keeping them.

Right, that's my review done. Thank you for a very detailed, very in-depth tutorial to a wonderfully useful tool. I hope these comments are helpful and look forward to seeing the final version on the site!

ianmilligan1 commented 7 years ago

Hi all! Bea has pushed some updates in 5e38377feef1205b242e7086fcf2dbdab44eefd8.

As an update to those observers, Bea was on maternity leave, then I went (well, still am) on paternity leave. But we are back on track. 😄

We will probably send to one additional reviewer to make sure it all works (in particular the installation process).

ianmilligan1 commented 7 years ago

Just a quick note that we have a second reviewer lined up, who will have a review in the next four or five weeks. 😄

Thanks to all! And @aelang thanks of course both for the original comments, as well as the commit-level ones you've made.

ghost commented 7 years ago

Thanks so much for the opportunity to try out the Edinburgh Geoparser! It was exciting to open the sample file and to see the mapped locations from the text.

I see that @bea-alex has already incorporated several comments from @aelang, so I won't repeat those. I have a few general comments on the writing of the lesson itself, and one technical issue to report on for now. I'll be back with further comments if I can resolve the error messages.

General comments

I used a late 2012 iMac running Sierra to go through this lesson.

The lesson sometimes uses "we" to refer to the Geoparser team (understandable, as the author is on the team), but I'm not certain if this is consistent with other PH lessons. My recommendation is to switch the lesson itself to the third person and to leave the "Credits and Citation" portion in the first person.

Most of my suggestions here are for consistency's sake. I'd recommend running a "Find and Replace" on words like "geo-" and then deciding on a convention (e.g. geo-parsing or geoparsing).

Para 1. Consider splitting the second sentence in two for flow. For example: "The Geoparser allows you to process a piece of text and extract and geotag the locations contained within it. Among other uses, geotagging the locations makes it possible to map the data."

Para 3. I would replace "our publications" with the more general "publications" (see note above).

Para 7. Apple is now stylizing Mac OS X as macOS for whatever reason. Will the Geoparser work on any version of macOS 10?

Para 11. Replace "follow steps 1 first" with "follow step 1 first".

Para 20. Replace "Geoparsing A Text File" with "Geoparsing a Text File".

Para 39. Replace "Other Useful Options for running the Geoparser" with "Other Useful Options for Running the Geoparser".

Para 62. Replace "manipulication" with "manipulation".

Para 84. Replace "suggestion" with "suggestions".

Technical issues

I tried running the script in paragraph 21 with the default directory structure and got the following output:

Systems-iMac:Software ssimpkin$ cd ./geoparser-v1.1/scripts
Systems-iMac:scripts ssimpkin$ cat ../in/172172.txt | ./run -t plain -g geonames -o ../out 172172
unrecognised platform Darwin 16.4.0 x86_64
edit scripts/setup, or set LXPATH to appropriate path

I'm not a bash expert, and am not sure what to modify in the setup script. The initial command cat ../in/172172.txt worked, and printed the file to my screen, but I can't get both to work together.

The output files in ../out that match 172172* were all dated 16 March 2016. They don't appear to have been updated when I tried running the command.

Trying other commands (for example, paragraph 47) also results in the same error.

If there's an easy fix, I'd be happy to give it a try and go through the steps again.

bea-alex commented 7 years ago

Hi Sarah,

thanks very much for your review. I made most of the changes you listed already. I sent you an email with a fix for the Geoparser and will work in parallel on creating a new release to fix it for everyone else.

We haven't been very consistent on the use of the hyphen after geo. I'll try to make it as consistent as possible in this lesson.

Do you know how you'd refer to all OS X versions together in one expression? So 10-16? The Geoparser should work for all of them (after I applied the fix)... at the moment it works for 10-15.

ghost commented 7 years ago

Thanks for your quick reply, @bea-alex! I'll give your suggested fix a try this week and report back.

10-16 makes sense to me!

ghost commented 7 years ago

Hi Bea,

Thanks again for the fix. I was able to make the change easily and the scripts ran without a hitch.

Is there a particular reason why the TSV example switches to a different dataset? I'd recommend using the 172172 file again for this example if possible, since readers have an idea of what to expect for those results already. I'd also suggest adding a more explicit step about opening the TSV file (showing a screenshot of the results in either Notepad/Wordpad or spreadsheet software), making it clear how the resulting dataset is structured.

I think that's all for me! Please let me know if you have any additional questions and thanks again for contributing this lesson.

ianmilligan1 commented 7 years ago

Wonderful, thanks so much @ssimpkin – @bea-alex, at this stage, I'll go through this week and provide the next editorial pathway forward. Again, really appreciate all of the engagement on this.

ianmilligan1 commented 7 years ago

Hi @bea-alex – I started going through this and ran into the same error as Sarah.

unrecognised platform Darwin 16.7.0 x86_64
edit scripts/setup, or set LXPATH to appropriate path

What do you think.. should we hold off until the new release is ready, or should we provide the patch changes for the first draft and remove once it works?

Otherwise, I made a few tiny edits which I pushed up up until that point (I stopped on the first test parsing run). My other two thoughts are:

Anyways, I stopped at this point as I'll wait to hear your thoughts on the unrecognised platform Darwin 16.7.0 x86_64 fix for the lesson. 😄

bea-alex commented 7 years ago

Hi Ian,

For the fix, if you open the setup file in the scripts directory and replace the following line:

Darwin?1[012345]*)

with

Darwin?1[0123456]*)

So, just add the 6. That should fix it for your version of MacOS. We are a bit behind the Apple releases. We wanted to wait for the new version of Apple to come out and test the Geoparser on that too and then make changes for High Sierra and Sierra at the same time before we push out the next release. High Sierra has a new file system but hopefully everything will still work but it’s probably best for me to add a note about the fix temporarily. Can we update a lesson once it’s published?

I’ll make the small changes Sarah suggested, thanks again for your review.

Re CLI instructions I’m torn… if it was me I’d prefer them as a the main installation instructions but then I’m probably not representative of your readership… what do you think Ian?

Bea

On 18 Sep 2017, at 21:03, Ian Milligan notifications@github.com wrote:

Hi @bea-alex https://github.com/bea-alex – I started going through this and ran into the same error as Sarah.

unrecognised platform Darwin 16.7.0 x86_64 edit scripts/setup, or set LXPATH to appropriate path What do you think.. should we hold off until the new release is ready, or should we provide the patch changes for the first draft and remove once it works?

Otherwise, I made a few tiny edits which I pushed up up until that point (I stopped on the first test parsing run). My other two thoughts are:

I wonder if we could add a paragraph in the introduction relating this to Adam's Gazeteer lesson https://programminghistorian.org/lessons/extracting-keywords. To me the difference is that in the former you provide a Gazeteer, whereas here you're relying on Edinburgh geoparser. But thoughts appreciated. For install instructions, I wonder if the CLI ones can be more of an addendum rather than the main route.. for GUI users it might be easiest to just download it, double-click the package, and then visit in their terminal. Anyways, I stopped at this point as I'll wait to hear your thoughts on the unrecognised platform Darwin 16.7.0 x86_64 fix for the lesson. 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/26#issuecomment-330339817, or mute the thread https://github.com/notifications/unsubscribe-auth/AGK8HzOOn2J9AwBH2oyqTBk7moqM0yOHks5sjsyYgaJpZM4JFPdf.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

bea-alex commented 7 years ago

Oh and will add the link to the gazetteer lesson as well.

Bea

On 20 Sep 2017, at 10:58, Beatrice Alex balex@staffmail.ed.ac.uk wrote:

Hi Ian,

For the fix, if you open the setup file in the scripts directory and replace the following line:

Darwin?1[012345]*)

with

Darwin?1[0123456]*)

So, just add the 6. That should fix it for your version of MacOS. We are a bit behind the Apple releases. We wanted to wait for the new version of Apple to come out and test the Geoparser on that too and then make changes for High Sierra and Sierra at the same time before we push out the next release. High Sierra has a new file system but hopefully everything will still work but it’s probably best for me to add a note about the fix temporarily. Can we update a lesson once it’s published?

I’ll make the small changes Sarah suggested, thanks again for your review.

Re CLI instructions I’m torn… if it was me I’d prefer them as a the main installation instructions but then I’m probably not representative of your readership… what do you think Ian?

Bea

On 18 Sep 2017, at 21:03, Ian Milligan <notifications@github.com mailto:notifications@github.com> wrote:

Hi @bea-alex https://github.com/bea-alex – I started going through this and ran into the same error as Sarah.

unrecognised platform Darwin 16.7.0 x86_64 edit scripts/setup, or set LXPATH to appropriate path What do you think.. should we hold off until the new release is ready, or should we provide the patch changes for the first draft and remove once it works?

Otherwise, I made a few tiny edits which I pushed up up until that point (I stopped on the first test parsing run). My other two thoughts are:

I wonder if we could add a paragraph in the introduction relating this to Adam's Gazeteer lesson https://programminghistorian.org/lessons/extracting-keywords. To me the difference is that in the former you provide a Gazeteer, whereas here you're relying on Edinburgh geoparser. But thoughts appreciated. For install instructions, I wonder if the CLI ones can be more of an addendum rather than the main route.. for GUI users it might be easiest to just download it, double-click the package, and then visit in their terminal. Anyways, I stopped at this point as I'll wait to hear your thoughts on the unrecognised platform Darwin 16.7.0 x86_64 fix for the lesson. 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/26#issuecomment-330339817, or mute the thread https://github.com/notifications/unsubscribe-auth/AGK8HzOOn2J9AwBH2oyqTBk7moqM0yOHks5sjsyYgaJpZM4JFPdf.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

bea-alex commented 7 years ago

I made all the changes and updated the repository.

Cheers,

Bea

On 20 Sep 2017, at 10:59, Beatrice Alex balex@staffmail.ed.ac.uk wrote:

Oh and will add the link to the gazetteer lesson as well.

Bea

On 20 Sep 2017, at 10:58, Beatrice Alex <balex@staffmail.ed.ac.uk mailto:balex@staffmail.ed.ac.uk> wrote:

Hi Ian,

For the fix, if you open the setup file in the scripts directory and replace the following line:

Darwin?1[012345]*)

with

Darwin?1[0123456]*)

So, just add the 6. That should fix it for your version of MacOS. We are a bit behind the Apple releases. We wanted to wait for the new version of Apple to come out and test the Geoparser on that too and then make changes for High Sierra and Sierra at the same time before we push out the next release. High Sierra has a new file system but hopefully everything will still work but it’s probably best for me to add a note about the fix temporarily. Can we update a lesson once it’s published?

I’ll make the small changes Sarah suggested, thanks again for your review.

Re CLI instructions I’m torn… if it was me I’d prefer them as a the main installation instructions but then I’m probably not representative of your readership… what do you think Ian?

Bea

On 18 Sep 2017, at 21:03, Ian Milligan <notifications@github.com mailto:notifications@github.com> wrote:

Hi @bea-alex https://github.com/bea-alex – I started going through this and ran into the same error as Sarah.

unrecognised platform Darwin 16.7.0 x86_64 edit scripts/setup, or set LXPATH to appropriate path What do you think.. should we hold off until the new release is ready, or should we provide the patch changes for the first draft and remove once it works?

Otherwise, I made a few tiny edits which I pushed up up until that point (I stopped on the first test parsing run). My other two thoughts are:

I wonder if we could add a paragraph in the introduction relating this to Adam's Gazeteer lesson https://programminghistorian.org/lessons/extracting-keywords. To me the difference is that in the former you provide a Gazeteer, whereas here you're relying on Edinburgh geoparser. But thoughts appreciated. For install instructions, I wonder if the CLI ones can be more of an addendum rather than the main route.. for GUI users it might be easiest to just download it, double-click the package, and then visit in their terminal. Anyways, I stopped at this point as I'll wait to hear your thoughts on the unrecognised platform Darwin 16.7.0 x86_64 fix for the lesson. 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/26#issuecomment-330339817, or mute the thread https://github.com/notifications/unsubscribe-auth/AGK8HzOOn2J9AwBH2oyqTBk7moqM0yOHks5sjsyYgaJpZM4JFPdf.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ianmilligan1 commented 7 years ago

Wonderful, thanks @bea-alex – I will continue my way through the lesson! Be in touch soon. 😄

ianmilligan1 commented 7 years ago

Thanks again @bea-alex. A few more thoughts on the lesson as we move into the final stages.

My first overall thought is that on our scale of beginner, to intermediate, to advanced lessons, this is probably an advanced one. But I think that's a good thing, as we need to cater to practitioners as well as those getting started. And yet another reason for people to learn how to use the command line!

A few quick thoughts:

My other substantive section is to try to incorporate the FAQ section into the text. I think our readers are pretty linear, so I'd worry that they wouldn't realize the patch fixes and other suggestions until after they've gone through – and that they might abandon the lesson early. Could it be incorporated into the text? For example, when the Darwin error might arise, mention the solution there; same with the historical place names, etc. At the very least, we'd need some internal links to anchor readers. Does that make sense? What do you think?

Finally, in response to your suggestions, yes we do lesson updates – and I think in this case, with a new MacOS release right around the corner, we'd want to make sure it works for that. It lands next week right? I usually update day of (living dangerously, right?) so we could always run a test on my machine first thing.

So at this stage I'll leave you to implementing Sarah's suggested revisions (I think you've already incorporated Anouk's), and responding to these comments as well. Thanks as always for your time and effort, this is great stuff. As I was playing with it, I was impressed at how nice and quick it all came up (see screenshot below).

screen shot 2017-09-20 at 3 12 50 pm

Talk again soon.

bea-alex commented 7 years ago

Hi Ian,

see comments inline.

My first overall thought is that on our scale of beginner, to intermediate, to advanced lessons, this is probably an advanced one. But I think that's a good thing, as we need to cater to practitioners as well as those getting started. And yet another reason for people to learn how to use the command line!

Fine by me.

A few quick thoughts:

I also use CLI installations, but I wonder if it's worth just noting that you can download and double-click the tar to open it (on MacOS anyways). I added another subsection and a couple of screenshots. For bounding boxes, like the one you used around Canada: is there a good web tool where people can get the lat longs for such tools? i.e. if somebody wanted to grab the bounding box around say Prince Edward Island, how would they do so? And, possibly stupid question, are bounding boxes limited to just being boxes or can they be complex polygons?

Yes. BoundingBox. http://boundingbox.klokantech.com http://boundingbox.klokantech.com/ I added a sentence/reference to it to the lesson.

The Geoparser doesn’t allow for polygons to be specified, just a circle or a box.

"If the document date is not specified all temporal expressions will be interpreted relative to the date when the Geoparser is run, so that is today’s date." I was a bit unclear in this section. What does this actually mean for a researcher? Could a use case example be provided?

Ok, so I removed “so that is today’s date” from that sentence… hopefully that makes it clearer. What I mean is that if you don’t specify a date then the date of when the Geoparser is run is used as the document date by default. So the assumption here is that the document was written/published on that date which of course will be wrong in many cases. The date doesn’t actually affect the performance of the geo-processing of the geoparser at the moment, it just affects the temporal normalisation. In practice it’s always better if you specify the actual document if it’s available.

Maybe a bit more on XML and TSV. Maybe when first introducing XML output, a few sentences on why XML is good. And then a few sentences later on about why TSV is useful as well. I added a new paragraph at the beginning of that section. Hopefully that explains the advantages/disadvantages of XML etc. Maybe walk them through this: "%s\t%s\t%s\t%s\t%s\n" command in the TSV section. That’s already in the lesson. Let me know if it’s not clear enough. I know that this bit isn’t that easy to process. My other substantive section is to try to incorporate the FAQ section into the text. I think our readers are pretty linear, so I'd worry that they wouldn't realize the patch fixes and other suggestions until after they've gone through – and that they might abandon the lesson early. Could it be incorporated into the text? For example, when the Darwin error might arise, mention the solution there; same with the historical place names, etc. At the very least, we'd need some internal links to anchor readers. Does that make sense? What do you think?

No problem. I deleted some of the information and put other things into the lesson. I still kept the patch fix separate as I feel it disrupts the text too much… but I added another link where it may be needed. Hope that all makes sense. Finally, in response to your suggestions, yes we do lesson updates – and I think in this case, with a new MacOS release right around the corner, we'd want to make sure it works for that. It lands next week right? I usually update day of (living dangerously, right?) so we could always run a test on my machine first thing.

Ok that would be great. Let me know how you get on. So at this stage I'll leave you to implementing Sarah's suggested revisions (I think you've already incorporated Anouk's), and responding to these comments as well.

So I thought I’d addressed all of Sarah’s revisions. Is there something that I missed?

That’s all for now.

Bea

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ianmilligan1 commented 7 years ago

Hi Bea – this looks great. I think the patch fix is linked better, the bounding box is helpful, and if you're happy with this version I think we're almost there (just wanted to make sure you were happy with addressing Sarah's comments, I think you've done great work there).

I'm going to update my laptop to MacOS High Sierra shortly (after Auden wakes up – so maybe it'll be ready to test during afternoon naptime), so I'll do a last check on that to make sure the Patch Fix works. And then I think we're ready to begin porting over to the main site. 🎉 🎉

So what's left on your end?

1) Can you provide me with a bio? You can just paste it in here. Two or three sentences is perfect. 2) We need an image to represent the lesson. As it's a map-based lesson, why don't we draw on the Library of Congress Map Collection? Let me know if any images pop out to you there and I can generate the thumbnail from it.

Once we have that, I'll do a pull request over on the main site, which'll do all the fun automated checking for broken links and the stuff. I think we could be ready to launch in a few days – we usually do a few days of a soft launch so you can proof it on the live site.

Talk soon!

bea-alex commented 7 years ago

Hi Ian,

just quickly as got to dash home but High Sierra will need at least one other change … so I have to alter the patch fix slightly. But it depends on what the result of the command: uname -srm

On Sierra the result for that is:

Darwin 16.7.0 x86_64

If the numbering just increases by one then presumably for High Sierra it will say something like: Darwin 17...

If that’s the case then you just need to add a 7 after the 6 in the setup script. If not then it would be best to let me know what the result says and I’ll adapt the patch fix for you.

Cheers,

Bea


Dr. Beatrice Alex Research Fellow at the School of Informatics, University of Edinburgh.

Tel.: +44 131 6502684 http://homepages.inf.ed.ac.uk/balex http://homepages.inf.ed.ac.uk/balex http://www.linkedin.com/in/beatricealex http://www.linkedin.com/in/beatricealex http://twitter.com/bea_alex http://twitter.com/bea_alex

On 26 Sep 2017, at 15:31, Ian Milligan notifications@github.com wrote:

Hi Bea – this looks great. I think the patch fix is linked better, the bounding box is helpful, and if you're happy with this version I think we're almost there (just wanted to make sure you were happy with addressing Sarah's comments, I think you've done great work there).

I'm going to update my laptop to MacOS High Sierra shortly (after Auden wakes up – so maybe it'll be ready to test during afternoon naptime), so I'll do a last check on that to make sure the Patch Fix works. And then I think we're ready to begin porting over to the main site. 🎉 🎉

So what's left on your end?

Can you provide me with a bio? You can just paste it in here. Two or three sentences is perfect. We need an image to represent the lesson. As it's a map-based lesson, why don't we draw on the Library of Congress Map Collection http://www.loc.gov/maps/collections? Let me know if any images pop out to you there and I can generate the thumbnail from it. Once we have that, I'll do a pull request over on the main site, which'll do all the fun automated checking for broken links and the stuff. I think we could be ready to launch in a few days – we usually do a few days of a soft launch so you can proof it on the live site.

Talk soon!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/26#issuecomment-332217498, or mute the thread https://github.com/notifications/unsubscribe-auth/AGK8HxTG0i1deIEn5_3Dr5oNJ2MZjkm_ks5smQq3gaJpZM4JFPdf.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ianmilligan1 commented 7 years ago

Yep, when running I got our old friend:

ianmilligan1@Ians-MacBook-Pro-3:~/documents/software/geoparser-v1.1/scripts$ cat ../in/172172.txt | ./run -t plain -g geonames -o ../out 172172
unrecognised platform Darwin 17.0.0 x86_64
edit scripts/setup, or set LXPATH to appropriate path

Adding the 7 as in

  Darwin?1[01234567]*)

worked.

Do you want to edit the text? Any ETA on the High Sierra patch, is it worth waiting or will it be more than a week or two?

ianmilligan1 commented 7 years ago

Just a note that while we're basically ready to go, we're going to wait for a High Sierra patch.

ianmilligan1 commented 7 years ago

... and with 547d157, we're ready to go. Just waiting on Bea's biography and thoughts on images, and I'll begin transitioning over to the main PH repo. 🎉

ianmilligan1 commented 7 years ago

OK, let's continue the conversation over on the main pull request! Thanks everybody (and especially @ssimpkin & @aelang for their review, and of course @bea-alex for writing the great lesson).