acrymble commented 8 years ago

The Programming Historian has received the following tutorial on 'Intro to Linked Data' by @jonathanblaney. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/intro-to-linked-data

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. Before that process, I will read through the lesson and provide feedback, to which the author will respond.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @ianmilligan1 or @amandavisconti if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudspeople (Ian Milligan and Amanda Visconti - http://programminghistorian.org/project-team). Thank you for helping us to create a safe space.

acrymble commented 8 years ago

@jonathanblaney thanks for your contribution. I've had a chance to read through this (I have not tried it yet) and I've got a list of comments that I'd ask you to respond to before we seek formal reviews. I've highlighted them by line number to make them easier to find. Most are very minor things.

Generic:

What you've given us is an outline of what linked data is, and how to query it. You don't tell us how to contribute our own linked data and share it with the world. Just make sure this is clear in the description so readers know what they're in for.
I'd ask you also to sprinkle in LOTS of links to Wikipedia. We do this for all key terms on their first use - and by key terms I mean anything you wouldn't expect a random person on the street to know. ANYTHING technical.
Make sure you use a full term before an acronym: Uniform Resource Locator (URL).
This piece would hold together a lot better if you chose a historical theme for your examples and stuck to it. A lot of your examples are popular culture (or colleagues).
Avoid 'click here' and generic link words. Some people use screen readers and it's easier if the linked words give clear idea of what will be found at the link location.

L 3-5 - you say 'we' but you are the sole author.
L 6 - you might mention the term 'interoperability' here.
L 10-11 - I wasn't immediately clear that Anne Hathaway was 2 people, so this confused me.
L12 - what is 'correctly marked up'? What does that mean? Jargon.
L13 - authority file is jargon.
L17 - why are you using 'arbitrary numbers'? That sounds fishy.
L18 - this sentence would work well at the start of the section, since linked data is about creating machine-readable info.
Review Questions - what is VIAF? Most people won't be familiar with this and it's key to your point.
L34 - this example isn't sustainable. Google changes things all the time.
L35 - if using, we need a screenshot.
L51 - not clear what you mean. Why did you do this?
L52 - if I don't have to worry, why are you telling me?
L53 - NIN numbers are very British. Can you internationalise the example?
L54 - is that a barcode or an ISBN?
L54 - shelfmark is jargon.
L57 - don't worry? Do I need to know at all then?
L59 - data model and syntax are both jargon.
L63 - I don't remember you inventing this. Also, are these going to be long-term sustainable (the link already seems to be broken)?
L64 - what file? Am I meant to create something?
L66 - where do you find all of these authority files? I am new so I have no idea and if I don't know I might think I don't know enough to proceed.
L68 - What does combine triples mean?
L69 - remind me what a subject and predicate is again?
L77 - how do I make an informed decision about which format to use?
L82 - any dataset?
L83 - 'one' what?
L84 - screenshot please.
L88 - does spacing or caps matter? It does in some languages so I might be confused.
L91 - link not formatted properly.
L92 - I got lost here completely.
L97 - a screenshot please.
L98 - Hemingway and Twain? I don't remember you mentioning them.
L100 - proofread this sentence.

Can you take a look through these and let me know if you have any questions? It looks long, but most things are just matters of clarifying the language and explaining some jargon.

In order to keep things moving along, I'd ask you to do these within 30 days so we can proceed to the review stage. That's 4 December 2016.

jonathanblaney commented 8 years ago

@acrymble thanks for reading the submission so thoroughly. These look like very useful comments and I'm keen to act on them. Will certainly try to do so before 4 Dec.

jonathanblaney commented 7 years ago

@acrymble I've just pushed a revised version of the .md file and a new folder with four screenshots in it.

I've tried to take account of your suggestions. Your general point about historical examples throughout makes complete sense for this audience but I haven't been able to do it in every case, because I don't know of examples like the BBC Olympics coverage or the mathematical genealogy project in history. If anyone can suggest examples that would be great, or I can try to take those out if they seem incongruous. Hopefully overall there is more cohesion now.

One thing I haven't acted on is your point about L34 being unsustainable. I agree but am stuck for a sustainable one to put in its place. Shall we just drop this bit? It's not vital.

I've also added a nod at the end of the intro to the original project for which the original version of the course was written. I hope that reads OK and is fair. I don't have a URL for the other version of the course yet but could add that when it's available if it's not simply confusing.

acrymble commented 7 years ago

Thanks @jonathanblaney I have reached out to a couple of formal reviewers. Once I have two committed reviewers in place I will confirm the deadline at which point we'll hope to hear from them!

acrymble commented 7 years ago

Two reviewers have agreed to conduct formal reviews. They have agreed to submit their reviews by 24 January 2017. We await their contributions!

terhinurmikko commented 7 years ago

Sharing as I do a love of LOD and the humanities, this has been a pleasure to read. There are some clever (if not entirely new) tricks utilised: the use of a historical figure and a contemporary politician for disambiguation, and the mention of projects outside of academia that use LOD (Facebook, BBC, Google). I also enjoyed the sense of familiarity with the examples, and that was nice.

There are a few things that need to be addressed to help this lesson realise its full potential. Writing for beginners (as I believe this lesson is) can be quite tricky, as you need to define things very clearly, consistently, repetitively even, and without the luxury of taking any short-cuts at all to get to the exciting bits. Some of the sections in the tutorial ought to be revised with that in mind.

A few stylistic points that will be easy to fix:

sec 1, where the final numbered point (6) runs on to line 5.
'linked open data' is repeated several times: define & use LOD acronym (more on this issue later).
secs 4 and 8, perhaps the formatting could match that of sec 1?
Punctuation and capitalisation: "Linked data: what is it?"/ "Linked data. What is it?". Google's Knowledge Graph capitalised inconsistently. ODNB acronym used but not defined with the first mention of the project. Typos, missing words and full stops at the end of paragraphs, e.g. "Seralisation", "inconsisitencies", missing full stop at the end of sec 8.
sec 17 the triple illustrating s-p-o is not displaying properly.
sec 22 cite Berners-Lee's Five Star Criteria https://www.w3.org/DesignIssues/LinkedData.html
Only one named author, but references to "we" which can only refer to multiple authors, not the reader and the author together (as I have interpreted in most of the later instances).
Usually see SPARQL described as a "recursive" acronym (not "reflexive").

I’ve divided the following comments and recommendations into ‘must-haves’ and ‘suggestions’. I've also included a list of some 25 jargon terms that I think ought to link to e.g. a Wikipedia article.

Must-haves

From the onset: Discuss either LOD or LD, but the terms shouldn't be used interchangeably. It might seem trivial, but there is a genuine difference between the two. You can have a closed system or even an offline one, which utilises RDF and contains genuine and proper linked data, but it is not Open. You can also have publicly accessible projects that combine Open and restricted data in the same knowledge graph, but only some (Open) triples are accessible to the public (someone with permissions could access all the triples). Data can be Linked, Open, or Linked and Open. All parts of the terminology are meaningful, and the terms are not truly synonymous. I'm worried that inconsistent use of the terms can get confusing for beginners.
In the examples used in "Linked data: what is it?", I'm not sure what the benefit is for using arbitrary numbers, or, if these are numbers derived from e.g. VIAF identifiers, what the benefit is from claiming they are arbitrary when in fact they are not? I think using numbers in these examples will be confusing for beginners, since here they are neither numbers nor URIs, they're strings of characters, and there are fundamental differences between using URIs and strings, since the latter is always the end of the chain. I also think using the abbreviated identifiers and then exclaiming that they are horrible to read (whilst true!) is not ideal at this stage of the lesson. Showing the full on URIs and then exclaiming those are hard to read: fair enough! Perhaps show full URIs, exclaim how awful they are, then show the abbreviated ones, and explain how prefixes work (I see you do that in later sections e.g. "Serialisation").
Why do the examples of triples have commas? I think this can confuse learners between what is valid syntax and what isn't, particularly since you give examples of .ttl later (which is great, we all love .ttl!), where commas are part of valid syntax (but not as shown in this example).
sec 26 "linked data consists of triple stores which are simply files containing millions of triples". Triplestores are purpose-built databases where data is stored as RDF, and retrieved (& edited, deleted, added, etc.) using SPARQL. Not all triplestores contain "millions of triples", and having millions of triples is not a prerequisite for a triplestore to function or to exist.
sec 27 "If you get a lot of triples together then they form a sort of web of knowledge, because of the way that the triples interlink." Yes, triples interlink, but technically you wouldn’t need more than two triples (a total of 5 URIs) to create the simplest of knowledge graphs, since the object of one triple can be the subject of another. You could also have (admittedly a terrible situation!) millions of unconnected triples. It is not the number of triples that is important. And it is not "a sort of web of knowledge", it is a knowledge graph of interlinking triples, and that can exist offline, and/or without pointing to any web resources.
Since there is another lesson on Programming Historian dedicated to RDF and SPARQL, I think these queries need to be broken down more extensively for beginners. Also, point to that lesson.
This lesson ends rather abruptly. Perhaps add a concluding summary, or a final list of key terms?
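
To make the point about sec 27 concrete, here is a minimal sketch in plain Python (no RDF library) of the smallest interlinked graph: two triples in which the object of the first is the subject of the second. All of the `example.org` URIs and the helper names are hypothetical and invented for illustration; none of them come from the lesson.

```python
# Hypothetical example.org URIs, invented for illustration only.
EX = "http://example.org/"

# Each triple is a (subject, predicate, object) tuple.
triples = [
    (EX + "person/1", EX + "vocab/memberOf", EX + "org/1"),
    (EX + "org/1", EX + "vocab/locatedIn", EX + "place/1"),
]


def distinct_uris(triples):
    """Collect every distinct URI used in any position of any triple."""
    return {term for triple in triples for term in triple}


def linked_pairs(triples):
    """Return pairs of triples where the object of one is the subject of the other."""
    return [(a, b) for a in triples for b in triples if a is not b and a[2] == b[0]]


print(len(distinct_uris(triples)))  # 5
print(len(linked_pairs(triples)))   # 1
```

Two triples and five URIs are enough to produce one link; adding a million triples that share no terms would produce none.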

Suggestions

I would prefer to see tutorials avoid words such as "clearly", "obviously", etc. Especially those aimed at beginners, for whom none of this is necessarily obvious or immediately clear.
The examples and test exercises (which, I completely appreciate, are always a challenging and time-consuming thing to put together) come across as a bit inconsistent. For example after sec 22, there are four questions, three of which refer to Jack Straw (who has/have been the example(s) throughout the section, and that's great), but the fourth one is on Albert Einstein.
You do see it everywhere, so perhaps you avoided it for fear of using a cliche, but I think the triple diagram of s-p-o should be included in all (especially beginner) tutorials.
sec 30 would be improved by the addition of an example, perhaps from the Music Ontology? I enjoy working with ontologies, so would have liked to have seen them discussed more.
sec 33 "[LOD] can answer questions other datasets cannot", i.e. that navigating through the triples can help us answer questions and access information in ways that flat files, spreadsheets, and relational databases might not? Sure, but in many ways the full potential of that is realised through the type of inference that you have specified as not part of this tutorial. Also, a rich relational database can tell you things a poor RDF knowledge graph can't, and whilst I love LOD, it’s not a silver bullet, it has limitations, and relational databases are very good for some things!
I think learners would appreciate it if you could point to other sources and tutorials where information on how to generate and store RDF can be found.

Jargon

linked open data
semantic web
semantic reasoning
datasets
linked data cloud
structured data
data format
authority file
DBpedia (when first mentioned)
SPARQL (when first mentioned and not later as it is now)
Google's Knowledge Graph
data modelling
domain
dereferenceable
quads (mentioned in passing with no explanation in the "RDF and data formats" section)
schema
prefix (specifically in the context of talking about .ttl)
encode
metadata
library metadata standard
SKOS should be defined at first instance, not later as it is currently
taxonomy
ontology
FOAF
SPARQL endpoint

I hope you find these suggestions helpful. I look forward to seeing the final lesson published, and I'm always happy to discuss and to explain my comments!

acrymble commented 7 years ago

Thanks for this review. We've got one more on the way so I'll suggest @jonathanblaney wait for that to come in and for me to summarise a way forward.

jonathanblaney commented 7 years ago

Sure, I'll wait for the second review and a summary from @acrymble but, in the meantime, I'd like to thank @terhinurmikko for a really thorough and extremely helpful review!

terhinurmikko commented 7 years ago

My pleasure. Always happy to see people share my love of combining LOD with the humanities!

mdlincoln commented 7 years ago

Thank you so much for submitting this lesson, @jonathanblaney! I read it with great interest. I think this could be a particularly useful addition to the Programming Historian, particularly with its detailed, intro-level walkthrough of the concepts behind RDF and linked data, and the many different examples of real-world uses of such data models.

I think it would be very beneficial to consider how this lesson could be modified to mesh better with the existing lesson on RDF/Linked Data which is already published on PH: "Using SPARQL to access Linked Open Data". (A lesson I'm partial to, obviously!)

With some revisions, this lesson could work very well in concert with the other in order to avoid too much duplication, while also taking the opportunity to address several aspects of RDF/LOD that my SPARQL lesson doesn't talk about at all. Therefore, my comments will be largely addressing the structure and organization of this lesson, based around a few key questions:

Who is this lesson for?
What will the reader be able to do by the end?
Is a SPARQL intro needed here?
Other Organizational thoughts
- Pairing concepts and realizations
- Continuity

Who is this lesson for?

I appreciate that the introduction carefully specifies precisely what this lesson will not do

it will not teach readers to do semantic reasoning
it will not teach readers to produce RDF usable by others

These exclusions beg the question: just what will I get as a reader by spending two to three hours completing this lesson? Communicating why LOD/RDF might be relevant to an historian, and foregrounding that in the very start of the lesson would help structure the remainder. A reader looking for Linked Open Data lessons on PH would see both this as well as a lesson explicitly about SPARQL. How will that reader know which one to consult? The SPARQL lesson gives a conceptual overview of RDF from the perspective of querying it, with the bulk of the lesson devoted to constructing queries.

Is this lesson for someone who wants to produce RDF?
Is it for a researcher who wants to query and process RDF produced by someone else?
Is this a lesson for someone who has heard the terms RDF, Linked Data, or Linked Open Data, and just wants to get their bearings?

What will the reader be able to do by the end?

This leads to the next need: clarifying what skills the reader should be able to come away with. Most of the information in this lesson is conceptual, rather than directly practical - and this is OK! Wrapping one's head around LOD/RDF is quite difficult. However, it does beg the question: what is this a lesson for, exactly?

Introduction to terms

The first could be grounding readers in LOD vocabulary. If they can come away confident in understanding what the terms RDF, URI, ontology, and serialization mean, then that would make a fantastic contribution, and a good foundation for someone then continuing into the more in-depth SPARQL lesson.

One specific term that you begin to discuss here (and which, I regret to say, isn't covered at all in the SPARQL lesson) is ontology. Is it worth discussing ontologies in a more systematic way? They come up implicitly in L30, and explicitly in L84. Perhaps this discussion could be unified and expanded a bit, even under its own section within the lesson?

Distinguishing data models

Articulating the difference between the graph model of RDF and the tabular model of relational databases (a data format discussed elsewhere on PH) would also be a valuable contribution. Understanding the power of RDF could be helped by giving a more concrete example of what a non-graph data model looks like. For instance, you might try showing the example of tabular data (e.g. a CSV - a format that will be familiar to readers who have worked through other PH lessons).
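
As a rough illustration of that contrast, the sketch below uses only Python's standard library to turn each cell of a small CSV into a (subject, predicate, object) statement. The data, the column names, and the `rows_to_triples` helper are all invented for this example, and real RDF would use URIs for subjects and predicates rather than the bare strings used here.

```python
import csv
import io

# Hypothetical tabular data; column names and values are invented for illustration.
CSV_TEXT = """id,name,birthplace
p1,Person One,Town A
p2,Person Two,Town B
"""


def rows_to_triples(csv_text, key_column="id"):
    """Turn each non-key cell into a (row id, column name, cell value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column:
                triples.append((subject, column, value))
    return triples


for triple in rows_to_triples(CSV_TEXT):
    print(triple)
```

In the table, the meaning of a value depends on its row and column position; in the triple form, each statement carries its own subject and predicate, which is what allows statements from different sources to be combined.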

Recognizing RDF when you see it

The section on RDF serializations, both as Turtle and RDF/XML, is also particularly useful, and something missing from the existing lesson on SPARQL. Being able to recognize RDF serializations in the wild would be a great skill for a novice reader to come away with. As with the other sections of this lesson, it would also be very useful to explain to the reader why they need to know about serializations before starting to explain how these serializations work. For example, explaining that many sources of LOD offer serialized versions of their databases for bulk download would illustrate why an historian would want to be familiar enough with serializations to at least recognize the file format when she sees it.
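
To show what "recognizing a serialization" might mean in practice, here is a hedged sketch that writes the same single triple in two forms: an N-Triples style line with full URIs in angle brackets, and a Turtle style form with an `@prefix` declaration. The namespace, the terms, and the helper functions are hypothetical. The code handles only URIs (no literals or blank nodes), so it is an illustration of the surface syntax, not a substitute for an RDF library.

```python
# Hypothetical namespace and terms, invented for illustration only.
PREFIXES = {"ex": "http://example.org/"}
TRIPLE = (
    "http://example.org/person1",
    "http://example.org/knows",
    "http://example.org/person2",
)


def to_ntriples(triple):
    """Write one triple with every URI spelled out in full inside angle brackets."""
    return " ".join(f"<{term}>" for term in triple) + " ."


def abbreviate(uri, prefixes):
    """Shorten a URI to prefix:local form if a declared namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def to_turtle(triple, prefixes):
    """Write prefix declarations, a blank line, then the abbreviated triple."""
    header = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    body = " ".join(abbreviate(term, prefixes) for term in triple) + " ."
    return "\n".join(header + ["", body])


print(to_ntriples(TRIPLE))
print(to_turtle(TRIPLE, PREFIXES))
```

The first form is verbose but each line stands alone; the second is easier to read once the reader knows that `ex:` stands for the declared namespace.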

Note on presenting syntax

At several points in the lesson (e.g. L16-L18) you present a kind of pseudo-RDF syntax in formatted code blocks. I like the idea of introducing RDF triples, but I found the syntax confusing - the lesson doesn't make clear that the syntax you present is only conceptual, and not actually a real serialization. I found this confusion was reinforced by asking readers to compose a statement using this pseudo-syntax - it wasn't clear that readers should not be trying to learn the syntax used in this lesson.

Is a SPARQL intro needed here?

If you choose to focus this lesson on an overview of LOD concepts, then I think there's a case to be made to not go in to any depth in SPARQL at all, beyond mentioning it as a query language that is covered in the dedicated SPARQL lesson.

Other Organizational thoughts

Pairing concepts and realizations

I found that the first two sections of this lesson seemed to cover all the core concepts of RDF/LOD first, but then separated out the actual concrete realizations of key concepts (the URI, the ontology, and serialization) to later segments of the lesson. I might suggest that the lesson could be made more immediately concrete - and therefore address many of the questions I raised above - by instead pairing discussions of abstract concept and concrete implementation. For example:

This would be particularly helpful when discussing URIs. Instead of introducing arbitrary numbers for the two Jack Straws, why not present the problem (non-unique names) alongside the actual solution (the use of URIs)?
Similarly, the example of finding pianists connected to Liszt would be a good place to move from abstract concept (relationships) to implementation in the real LOD world (ontologies). This would also be a great place to introduce the lists of existing ontologies that are now listed on L84.

Continuity

It was very interesting to learn about all of these different LOD resources and examples. However I found the sheer variety of examples to be a bit overwhelming --- it was difficult to follow the connections between these different concepts of RDF/LOD when the examples used also changed frequently. It might be helpful to isolate one or two resources on which to focus, like DBpedia or VIAF, and then re-use those resources in each example. That way, readers have a better sense of continuity during the lesson. They'll also come away having gained some real familiarity with whichever resource you focus on, which is a good thing.

Thanks again for submitting this lesson. It addresses some important gaps in PH's coverage of linked data, and, with some substantive reorganization and clarification, will really shine.

acrymble commented 7 years ago

Thanks to @mdlincoln and @terhinurmikko.

There are lots of good suggestions here. I'll need to take a pause and stand back for a few days and think about how we can formulate a clear plan for @jonathanblaney to move forward productively. I'll do my best to have a synopsis by next weekend, and then @jonathanblaney maybe we can discuss any outstanding questions.

The review and revision process is the most important step and it's important to keep momentum, so I'll try to get this to you as soon as possible.

jonathanblaney commented 7 years ago

Thanks @mdlincoln for your helpful and encouraging review! There are lots of good ideas here and your key questions are a very good way to help think more critically about the course. The course is repurposed from a standalone one and @acrymble and I had some initial discussion about whether any substantive SPARQL was needed. I'm very happy to go with the majority view/the PH policy about overlap. My general learning style is that I like to read several accounts of the same thing, but, on the other hand, if readers don't know what material is covered in both this course and yours they might find it confusing or irritating to be directed to your course at the end of this one, only to find that they're going over some of the same ground again. So I'm agnostic but with a slight leaning towards cutting the SPARQL stuff from here.

acrymble commented 7 years ago

Ok, thanks again to everyone for their contributions. I've re-read through all of the contents, and I think that our two reviewers have really given @jonathanblaney some focused and practical suggestions for moving forward. Thanks also for being so collegial.

I think the easiest way forward is actually for me to invite @jonathanblaney to respond to each of the points made above as necessary, perhaps by cutting and pasting the two responses and letting us know what, if anything, he'd like to clarify or otherwise push back against. Responses can be brief unless you think more is needed. I don't want to suggest we debate back and forth repeatedly, but I think that will give me a clear sense of where I may need to mediate (if at all) before @jonathanblaney works towards the final version.

The one area we might want to discuss together is the notion of how this 'fits' with @mdlincoln's previous tutorial. If we think of this piece as a contribution to the historiography of digital humanities pedagogy, then we'd probably say @jonathanblaney has to demonstrate his contribution, but that it's OK to have a certain degree of overlap to ensure that the reader can take away meaningful knowledge from the tutorial as a standalone exercise. Our challenge is hitting that balance between the new contribution and slotting in with the existing tutorial. In other words, I don't think @jonathanblaney has to write this as a perfect segue into @mdlincoln 's tutorial, but it should be clear that he's aware of it and has attempted to build productively upon it.

@jonathanblaney can I turn it over to you to respond? I'd like to have this live before people start thinking about their courses for next year, so maybe 25 February 2017 for responses? Let me know if that will be a problem and we can discuss.

jonathanblaney commented 7 years ago

I think the feedback from both reviewers is high quality and I will try to incorporate it where I can. So I'll just respond to things I either disagree with, don't understand, or cannot see how best to incorporate.

On the overlap with @mdlincoln are you saying, essentially, make it mesh better with the existing course but keep roughly the same length? Or do you want me to cut it back in order to make it more of a prelude to the dedicated SPARQL course? I'm really happy to do either, whatever you think works best for the site.

Yes, I'll try to make changes where I can, and respond where I'm not making them, by 25 Feb. Thanks @acrymble for all your input so far.

acrymble commented 7 years ago

Why don't you start with the easy bits first, @jonathanblaney . We might get a better sense of how to integrate it with @mdlincoln once you've answered some of his queries about audience and what you hope people will be able to take away.

jonathanblaney commented 7 years ago

I'm going to confine myself to answering @mdlincoln's overall questions, to try to keep things moving. But I should say at the outset that I'm happy to take the views of all three of you into account and modify my conception of the course, if that improves it.

The lesson is aimed at a historian who is interested in linked open data but doesn't know much about it, let alone how they might use it or even produce it themselves, what its pros and cons are, and what some real-world examples of usage are. It's not going to teach anyone how to produce RDF and it isn't sufficient by itself for someone who wants to query and process someone else's RDF; I do hope that the course would be a useful first step towards learning those more practical skills, for those who can see a benefit in their work.

I want the reader to come away with a (hopefully correct) understanding of the core concepts of LOD and a little practical insight into how LOD can be queried with SPARQL. As I said, I'm happy to cut the practical element and refer to @mdlincoln's course if that's the consensus. I wonder if the course would then be too abstract, in that there's no practical take-away at all, but that could be addressed by describing the kinds of queries you can do without actually getting into practicalities.

The question of how the reader knows which lesson to take is a good one and maybe @acrymble is best placed to judge. When I look at the Data Management topic, for example, it seems intuitive that I would do Getting Started With Markdown before taking Sustainable Authorship in Plain Text Using Markdown and Pandoc. For the moment there would only be two lessons in the Linked Data section and the lesson titles might be enough to guide the user on the basis of their level of knowledge.

The other structural suggestion that @mdlincoln makes is to pair concepts and their realisations. I think I can see how that would work and am happy to give it a try if that's the consensus view.

Finally both peer reviewers asked for more on ontologies. My reluctance here is based on my not knowing much about ontologies, but since it's clearly an omission I will try to add something more substantial - this might take a bit of extra time.

@acrymble, does that give you the kinds of answers you want? If so, can you lay out my next steps and deadlines?

acrymble commented 7 years ago

@jonathanblaney I've re-read both of these tutorials just to get myself clear again on how they might fit together.

Regarding SPARQL, I'm of the opinion that what you have is fine, but there should be very clear signposting to @mdlincoln's tutorial for anyone who wants to learn more. Your description is, as you say, for people who want to know what this is all about. I think they get that from what you have. But if they actually want to get a novice-level of aptitude with the practices of SPARQL querying, they need @mdlincoln. Similarly, once this is finished, I think it would be helpful for readers to link back to @jonathanblaney 's lesson fairly early in @mdlincoln 's tutorial just as a signpost for readers who haven't quite yet figured out LOD/LD.

Regarding Ontologies, I think you could mention the term and give a link to Wikipedia and a hint that this is something you'll need to learn, but it could be a tutorial on its own, so let's try to keep focused. Especially if you aren't confident in that area.

Does that help?

jonathanblaney commented 7 years ago

@acrymble thanks, this is very helpful. I'll proceed on this basis. So my next step will be to try to take in all of the excellent suggestions from the peer reviewers and let you know when done.

jonathanblaney commented 7 years ago

I've made changes to the lesson in response to the peer reviewers' comments. I think that in almost every case I've addressed, or addressed as far as I could, the suggestions made.

The main thing that I haven't been able to do is to take in something that was mentioned in different ways by both reviewers: that it might be better to concentrate on a couple of resources (such as VIAF and DBpedia) and to make the review questions more consistent. I have deleted the Einstein question which, as @terhinurmikko pointed out, was quite incongruous, but I struggled with doing more than that. I agree that the resources covered do jump around a bit; that was my original intention in the course, to try to show that there are various resources. If the questions, in particular, still need more work or seem too scattershot, I wonder if it's just best to delete them completely.

I found the peer review extremely useful and I hope the course is much improved as a result. If I can do more to improve it further I'm happy to try.

acrymble commented 7 years ago

Thanks @jonathanblaney. I will try to read this through as soon as I can.

mdlincoln commented 7 years ago

@jonathanblaney Thank you for putting so much hard work into revising this lesson. I am very happy to see how well you have managed some major changes, including:

A clearer introduction about goals for the lesson
A much richer and more concentrated discussion of ontologies - this is incredibly useful!
A great, elaborated review of RDF serializations

As I noted in my review, I still believe you need to more tightly pair the presentation of problems with the presentation of the solutions. You've developed each of these sections beautifully - I think now it's just a matter of moving them about into a good configuration. The section on The URI ought to be placed right after the initial discussion of LOD and triples, followed by discussion of ontologies; presenting the abstract concept, followed quickly by a realization of a solution. What's more, the ontology section is now so rich that it really ought to (starting at line 37) be its own named section like the URI does, so readers can quickly find it.

In pursuit of the goal of making the lesson offerings clear to the reader, you may also want to review the subheading names. Bear in mind that most readers will not be reading this entire lesson through, but will use your table of contents to skip directly to a section of interest. In that vein, it's probably best to change the "Why is it useful" header, as that section really talks about ontologies, not "why is it useful". Similarly, "Why haven’t I heard of it, then?" is an odd section title - they're reading an in-depth intro to LOD, so they very much have heard of it. The work this section is really doing is giving examples of how LOD powers many systems that the reader is already using - so maybe just title it as such. Really, your first three "questions" might best be answered in a snappy introductory section, with the big content pieces (Triples, URIs, Ontologies, Serializations, SPARQL) broken out as the other main headings. You've already done most of the rewriting necessary. Now you just need to update the headers and their placement to match.

Also, now that the core contributions of this lesson are so nicely defined, I think it is also a good chance to begin to make the prose a bit more terse and to-the-point. I really appreciate the conversational tone of the tutorial - but there are times that the informality leads to repetition or even confusion:

For example, you give a fantastic description of a URI in line 61. But then, in line 62: "That means we’re all talking about the same thing, and only that thing. It’s less complicated than it sounds; in real life we do this all the time without worrying about it." In fact, we very rarely manage to speak in reliably unique and precise ways in real life, and we also worry about it a lot - hence the general mission of Linked Open Data to begin with! I'd strike that out completely.
Restating something "in other words" is a great technique, but try not to overuse it. In your discussion of URIs, paragraphs 71 & 72 go on a long tangent about other unique identifiers and locators, without significant payoff as to why the historian - your audience - just had to read about all of that. If reading several paragraphs about the difference between URIs and URLs is important for an historian trying to learn about LOD, then you need to make clear why it would ever matter for them. Otherwise, keep it much more brief.

In general, I'd search for verbose phrasing like this to trim back, so that people can more easily mine knowledge from your text.

Finally, regarding your most recent comment: with a cohesive structure to the overall lesson, I'm less concerned about picking just one LOD resource as an example to stick to. I agree, though, that the Q&A bits in the lesson are a bit scattershot - I would simply remove them.

acrymble commented 7 years ago

I should note, I asked @mdlincoln to re-read this, since the changes you made were so substantial and I wanted to make sure we were moving in the right direction, which it seems we are.

I've asked him not to move any goalposts in his re-reading, and I think he's been careful to do that. @jonathanblaney there are really only a couple of things here that he's identified:

1) Rethinking some of the section headers
2) Reordering some of the information & pruning the text
3) Deciding if the Q&A are needed.

I do agree that we need to do some copyediting of the text to cut away a little. I can help you more directly with that, but it might be worth you taking a first pass before we go down that road.

terhinurmikko commented 7 years ago

Hi all,

My two cents, if they are still of interest - I realise I'm late to the table, apologies for that, and if they can't be incorporated, I completely understand.

Overall, much improved! A few things I noticed:

Section 37 - Ontologies are often defined as per Gruber, 1993 ("...specification of a conceptualization"...), but there are other definitions that could be cited here as well.

Section 40 - if you genuinely want to find a tutor-student relationship specific to musicology, the Linked Jazz project incorporates mentorOf (https://linkedjazz.org/), not that I'd expect you to know every property in every possible ontology, but since you mention it, and I happen to know about this one...

Section 50 - I would argue that if you don't know how a company does something, then they are not producing LOD - they might well be consuming it - but if you're differentiating between the two, then you should describe these projects and companies as using Linked Data...but not Linked Open Data (LOD). I'm pretty sure my original review described how data can be Open, Linked, or Open and Linked, so I won't repeat that here. If not, happy to discuss further.

Section 69 - creating unique identifiers which also happen to be HTTP URIs, surely, and not "strings"? Again, the choice of words matters, because in terms of RDF, a string is specifically something which is not an HTTP URI...I don't think this is a technology-based error, I think it's just an unfortunate, somewhat sub-optimal word choice, that's all.

Sections around 81 - I find that there is some jumping between Turtle and RDF/XML, and then JSON is suddenly mentioned but not JSON-LD?

Section 93 - I would have thought that sticking to the same example that you used for Turtle would help highlight that you don't lose any information when converting between these different formats, and also show how the same information looks in Turtle vs RDF/XML. I'm not sure what the benefit is of bringing in a new example at this point? Happy to be convinced though.

Section 91 - could do with an illustrative example I think, or something like W3C's turtle tutorial section 2.3 (https://www.w3.org/TeamSubmission/turtle/#sec-tutorial)

Section 99 - sounds like there's no validator for Turtle, but there are...easyRDF.org (http://www.easyrdf.org/converter) even lets you just copy-paste in your Turtle...
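
On the Section 69 point, a minimal Python sketch of the distinction may help. It is only an illustration: the `URIRef` and `Literal` classes are hypothetical stand-ins written for this example (not a real library's API), and the name "Example Person" is invented. The person URI is the Tobias example used elsewhere in this thread, and the predicate is FOAF's `name` property. In N-Triples, a URI is written in angle brackets and a string in quotes, so the two are never interchangeable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """Hypothetical stand-in for an RDF URI term."""
    value: str

    def n3(self) -> str:
        # URIs are serialized inside angle brackets.
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a plain RDF string literal."""
    value: str

    def n3(self) -> str:
        # Simplification: only backslashes and double quotes are escaped.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


def to_ntriples(subject, predicate, obj) -> str:
    """Serialize one triple as a single N-Triples line."""
    return f"{subject.n3()} {predicate.n3()} {obj.n3()} ."


person = URIRef("http://data.history.ac.uk/tobias-project/person/15601")
name = URIRef("http://xmlns.com/foaf/0.1/name")
label = Literal("Example Person")  # invented value, for illustration only

if __name__ == "__main__":
    print(to_ntriples(person, name, label))
```

Even when a literal happens to contain the text of a URI, it remains a string and does not identify the resource.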

Very minor changes all in all! Apologies again for getting back to you all so late.

All the best,

-- Dr Terhi Nurmikko-Fuller Lecturer Digital Humanities Centre for Digital Humanities Research School of Archaeology and Anthropology Australian National University 120 McCoy Circuit 2601 ACT, Australia

acrymble commented 7 years ago

Thanks @terhinurmikko for re-checking the text. Those all sound like very simple fixes. @jonathanblaney if there are any that are problematic, let me know.

I would suggest addressing these fixes by @terhinurmikko first as they are all minor. Then working on the section headers and text pruning. I can then do a copy edit of the text with you, and at that point I would expect we'd be ready to publish. You don't have to accept every suggestion (you can defend the need for the Q&A if you think they're important).

If you'd like to discuss anything @jonathanblaney let me know. I think we're in the home stretch here with this lesson. So thanks, everyone, for your perseverance.

acrymble commented 7 years ago

@terhinurmikko just for a clarification and to save some digging, for section 37 you note there are other definitions of ontologies worth mentioning than Gruber 1993. Can you tell us what you had in mind here? @jonathanblaney noted ontologies weren't his strength so a pointer might be helpful.

terhinurmikko commented 7 years ago

Hi,

I work with ontologies all the time (just love them, for my sins I suppose) but finding appropriate citations for a general summary of an ontology has been surprisingly tricky - It might be a good idea to read what the W3C provides (e.g. https://www.w3.org/standards/semanticweb/ontology), but ultimately it seems that most articles sooner or later do point to Gruber...

A list of recommended readings created for a talk I gave on Linked Data and ontologies in January 2017:

Allemang, D. and Hendler, J. (2008) Semantic Web for the Working Ontologist. Morgan Kaufmann.

Bekiari, C., Doerr, M., Le Bœuf, P., and Riva, P. (eds). FRBR object-oriented definition and mapping from FRBRER, FRAD and FRSAD (version 2.4). International Working Group on FRBR and CIDOC CRM Harmonisation. Available at http://cidoc-crm.org/docs/frbr_oo/frbr_docs/FRBRoo_V2.4.pdf.

Berners-Lee, T., Hendler, J., and Lassila, O. (2001) “The Semantic Web”. In Scientific American, May 2001.

DuCharme, B. (2013) Learning SPARQL: Querying and Updating with SPARQL 1.1. O’Reilly.

Jewell, M., Lawrence, F., and Tuffield, M. (2005) OntoMedia: An Ontology for the Representation of Heterogeneous Media. Available at http://eprints.soton.ac.uk/261009/1/OntoMedia.pdf.

Le Boeuf, P., Doerr, M., Ore, C.E., and Stead, S. (2015) Definition of the CIDOC Conceptual Reference Model. ICOM/CIDOC Documentation Standards Group. Available at http://cidoc-crm.org/docs/cidoc_crm_version_6.2.2%20(WorkingDoc).pdf.

Lincoln, M. (2015) “Using SPARQL to access Linked Open Data”. In Programming Historian. Available at http://programminghistorian.org/lessons/graph-databases-and-SPARQL.

Oldman, D and Norton, B. (2014) “A new approach to Digital Editions of Ancient Manuscripts using CIDOC-CRM, FRBRoo and RDFa”. In Digital Classicist. King’s College London. Available at http://www.digitalclassicist.org/wip/wip2014-10do.html.

Van Hooland, S. and Verborgh, R. (2014) Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata. Facet Publishing.

Wood, D., Zaidman, M., Ruth, L., and Hausenblas, M. (2014) Linked Data: Structured Data on the Web. Manning Publications.

Not ontology specific necessarily I'm afraid, but perhaps it could serve as a starting point? Having a cursory skim over my (unpublished) PhD thesis, the texts I've pointed to in my intro section that are not scope notes and definitions of specific ontological models (which is mostly what I end up doing) or indeed Gruber or Allemang&Hendler seem to be these:

Alani, H., Kim, S., Millard, D., Weal, M., Hall, W., Lewis, P., Shadbolt, N. (2002) “Automatic Ontology-based Knowledge Extraction and Tailored Biography Generation from the Web”. In IEEE Intelligent Systems, 18, (1): 14-21.
Brewster, C. and O’Hara, K. (2004) “Knowledge Representation with Ontologies: The Present and Future”. In IEEE Intelligent Systems (January/February 2004).
Segaran, T., Evans, C., Taylor, J. (2009) Programming the Semantic Web, O’Reilly.
Wilks, Y. and Brewster, C. (2006) “Natural Language Processing as a Foundation of the Semantic Web”. In Foundations and Trends in Web Science, vol. 1, numbers 3-4.

The only thing I can think of where I might be able to toot my own horn is here: http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/nurmikko-fuller/ which has a (very!) brief summary of ontologies at the start. I have some things in the pipeline but they've all just gone in for review.

Perhaps the Programming Historian needs a lesson specifically on ontologies that we can all point to as the quintessential source for defining the term (particularly in a context that is meaningful for LOD practitioners outside the sphere of hardcore computer science)? :)

All the best, Terhi

acrymble commented 7 years ago

It sounds like the ontology issue is best dealt with in another lesson then. That's quite complex given the request.

jonathanblaney commented 7 years ago

Thanks very much to all three of you.

@mdlincoln thanks for rereading and your suggestions (and I'm glad you think the text is improved!). I can incorporate these, I'm sure. It's a good point about the headings: I mentally skipped over them in revision and they do need rethinking. And thanks for your point about repetition: I really went against my own instincts to repeat key points and I'd be very happy to tighten this up. The text certainly needs copy editing and proofing, but I'll gladly take @acrymble's advice on when repeating a point is pedagogically useful or not.

@terhinurmikko thank you for these additional suggestions and corrections. I'll incorporate these but perhaps include some form of words about other ontology definitions being available. I'm sure @acrymble is right that a lesson on ontologies on PH would be useful, but I'm definitely not the person for that!

I'll try to get this done in the next week.

jonathanblaney commented 7 years ago

I've made some revisions, particularly in the order and the subheadings, and cut some material (including the Q&As). If @acrymble agrees that we're now at copy-editing stage then I think I can make it more concise still (I'd also like to check again that everything flows in its new configuration).

acrymble commented 7 years ago

Thanks @jonathanblaney I will take a look. For ease's sake, can you confirm if there were any of @terhinurmikko's or @mdlincoln's comments that you do not think you addressed? Apart from the ontology query.

jonathanblaney commented 7 years ago

Apart from the ontologies, I tried to address everything. I repeated a bit of Turtle with only one triple and then showed that as RDF/XML, which arguably doesn't fully address the suggestion @terhinurmikko made: "Section 91 - could do with an illustrative example I think, or something like W3C's turtle tutorial section 2.3"

acrymble commented 7 years ago

@jonathanblaney I have spent some time with the lesson and tried to work on the language and the order of some of the information. In particular, I wanted to make sure concepts were introduced before their shortcomings were discussed. I tried to do so from the perspective of a potential reader and someone not familiar with the intricacies of LOD (which I'm not). As such, I'm sure I've caused problems or otherwise countered the advice you got from the expert reviewers (hopefully not very often).

Can you please take a look at the revised text and let me know where you're unhappy, or where I've introduced problems? Those are probably the sticking points where another explanation was needed because I didn't quite get it.

There were also a couple of specifics I wasn't sure on:

paragraph 25 - you switch at this stage from VIAF ids to your Tobias ones. But you don't really explain this to the users, so it's confusing.

paragraph 69 - I didn't really get SKOS. This example is really abstract.

paragraph 72-3 - I think it's easier for readers if the verbose example precedes the truncated one.

Also I never really got my head around your Tobias URIs. Where do I actually find these if they aren't really files on a server? Do I have to phone you and ask? If it's not a file on the server, how do you know someone won't put something there at some point and make the unique id no longer unique?

--

I hope you think we're almost there, because I definitely do (unless I've screwed up your text).

jonathanblaney commented 7 years ago

@acrymble thanks for working on this again. You must be getting sick of it.

I'll have a read through your revisions and answer your three para questions then.

A quick response on the URIs: I can't have explained this very well, which is telling. You don't need to find them anywhere because their only function is to be unique descriptions of something. You want to have control over the name to stop anyone creating a URL with that name, but I can't guarantee that if I leave the IHR someone won't create one in five or ten years' time with exactly the same name, although the design of the URI was intended to make that highly unlikely. If there were files there, so they were dereferenceable, those files could still be replaced by someone in the future too, though.
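
A minimal sketch of that idea in Python, assuming nothing beyond the standard library: a URI works as a name because it is compared as a string, and no request is ever made to a server. The person URI is the Tobias example from this thread; the helper names `same_resource` and `is_http_uri` are hypothetical.

```python
from urllib.parse import urlparse

# Tobias example URI from this thread; nothing needs to exist at this address.
PERSON = "http://data.history.ac.uk/tobias-project/person/15601"


def same_resource(uri_a: str, uri_b: str) -> bool:
    """Treat two URIs as the same name only if the strings match exactly."""
    return uri_a == uri_b


def is_http_uri(uri: str) -> bool:
    """Check the form of the identifier without fetching anything."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(is_http_uri(PERSON))                  # well-formed HTTP URI
    print(same_resource(PERSON, PERSON + "/"))  # a trailing slash makes a different name
```

Because identity rests on exact string equality, a stray trailing slash produces a different identifier, which is why small inconsistencies between otherwise identical URIs matter.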

jonathanblaney commented 7 years ago

@acrymble I've read through the changes you've made and made notes. It's much snappier now and I think overall the order makes sense. There are some typos but there are also a couple of things that don't quite make sense, I think, in the current order. I've also noticed a couple of mistakes which I'm pretty sure are original to me. With all of these, do you want me to make changes or to give you a list? I don't think they will take me very long to implement.

On your questions:

para 25: absolutely, this is weird; I think it must have been caused by moving the text around; I can try to fix this.
para 69: I think here and the next para I can unpack what's going on with the XML, how each part relates to the triple model.
para 72-3: agreed; happy to switch these around and slightly rewrite.
the URIs: I talked about this in para 31 so I could say a little more there on the principle, or come back to it when I talk about the URIs I created (whichever you think better).

acrymble commented 7 years ago

@jonathanblaney you can make the changes.

acrymble commented 7 years ago

Suggestion for lesson icon: https://www.flickr.com/photos/britishlibrary/11290944036/

plays on 'triple' and 'linked' idea.

terhinurmikko commented 7 years ago

I quite like it, makes a change from the usual images of LOD clouds and iron chains...

jonathanblaney commented 7 years ago

Sorry for the delay. I've made those changes.

I like the image! It's a bit unexpected but it makes sense.

acrymble commented 7 years ago

Final read through (I will try to fix these myself to save you another go):

[x] p27 code example: one URI ends in / and the other doesn't
[x] p30: "try pasting one of them" - you don't actually give any Tobias URIs anymore. Either try pasting 'it' or just note there are no files to view at data.history.ac.uk/tobias-project/ - added http://data.history.ac.uk/tobias-project/person/15601 to this section to act as the example
[x] p33: " in the examples above from the Tobias project. You can’t find them anywhere; they are a convention." - again, you don't give us these, so this sentence needs to be changed.
[x] p39: "Linnean" should be 'Linnaean'
[x] p42: mentorOf should be italicised or in quotes or something to differentiate it.
[x] p54: you say this URI was invented in the previous section, but you've removed those. Language just needs to be adjusted slightly.
[x] p61: is that represented the way you wanted? All on one line as a codeblock? (double-checked, and yes it clearly is).
[x] p71: subjectelement should have a space.

acrymble commented 7 years ago

This lesson includes the following files:

/lessons/intro-to-linked-data.md
/lessons/intro-to-linked-data/intro-to-linked-data-fig1.png
/lessons/intro-to-linked-data/intro-to-linked-data-fig2.png
/lessons/intro-to-linked-data/intro-to-linked-data-fig3.png
/lessons/intro-to-linked-data/intro-to-linked-data-fig4.png
/lessons/intro-to-linked-data/intro-to-linked-data-fig5.png
/lessons/intro-to-linked-data/intro-to-linked-data-fig6.png

acrymble commented 7 years ago

I have moved all of these files over to the live site. This lesson is now published as of 7 May 2017, and is available at the following URL:

http://programminghistorian.org/lessons/intro-to-linked-data

Thanks to @jonathanblaney for your hard work on this. And to @terhinurmikko and @mdlincoln for your constructive reviews.

I'll tweet about this from the Programming Historian account this week and would appreciate your retweets and tweets of your own. Any other suggestions for how we might disseminate this lesson to people who can benefit from it would be very gratefully received.

Now that the hard work has been done, we want to make sure people use it!

mdlincoln commented 7 years ago

Congratulations @jonathanblaney and @acrymble ! It's great to see this live.

terhinurmikko commented 7 years ago

Well done everyone, I've just seen the tweet go round (have retweeted accordingly)!

Adam, could you please correct the typo in my name at the top of the page (the reviewers section).

Thank you! Terhi

acrymble commented 7 years ago

Sorry, fixed.

terhinurmikko commented 7 years ago

That was quick! :D

Thanks, and well done, everyone!

jonathanblaney commented 7 years ago

Thanks to all three of you for the time you've spent making this course better. I've really appreciated your effort and your insights.

programminghistorian / ph-submissions

Review Ticket for 'Intro to Linked Data' #33

