Closed: @jerielizabeth closed this issue 6 years ago
@whanley Thank you for this lesson! I have a couple of comments and suggestions that I will add here, hopefully by the end of the day today, and then we will move on to the "formal" peer review phase of the journey.
@whanley Thank you again for this lesson. I have read it through a couple of times now, and already feel like I have a better handle on the goals of RDF and how it can be used.
A couple of questions and thoughts for you before I solicit external peer reviewers:
Overall, I think the scope of this tutorial works really well - it guides the reader through the process of structuring their data for RDF and then working with that data to clean and analyze it. At the end, it would be helpful to provide an outline of some next steps for the reader, but I think the stopping point is right.
One thing we like to do to structure lessons is to include a short overview at the beginning to give readers a sense of what they will create by the end of the tutorial and what software is required to complete the lesson. If you wouldn't mind adding a couple sentences along those lines, that would be great.
And finally, there were a couple of places where I ran into some trouble when working through the lesson.
In particular, the use of the made-up namespace URL (http://mydb.org#) and the referencing of other schema URLs. If it isn't opening too large a can of worms, it would be helpful to have a little more context about what is going on here and why a made-up URL works.

Happy to talk through any of these!
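For reference, here is my rough understanding of the pattern in question, sketched in Turtle (the entity and property names are invented for illustration; only the mydb URL comes from the lesson):

```turtle
# A made-up namespace: the URL does not need to resolve to anything.
# It only needs to be a unique string that keeps locally coined terms
# distinct from terms defined elsewhere.
@prefix mydb: <http://mydb.org#> .

# A published schema, by contrast, is referenced by its real URL.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# A local term (mydb:person1) combined with a term from a
# published vocabulary (foaf:name).
mydb:person1 foaf:name "Mirzan Marie" .
```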
Hi @whanley! I hope the new semester is off to a decent start for you. I want to check in on the status of the lesson and see if you are ready for me to solicit peer reviewers. I think this lesson will be a very useful addition, so I hope that we can keep moving forward on it.
Best!
Greetings @whanley! I am checking to see if you have had time to work on edits for this lesson and if you are still interested in moving forward.
While we can always reopen it in the future, I will close this ticket on June 15 if I have not heard from you.
Thanks @jerielizabeth for your input and your patience! I've uploaded a new version of the lesson, as well as some auxiliary files.
Here are answers to your specific questions:
Yes, .ttl files can be hosted with the lesson. Where should I put these--with the images? I understand the concern about problems with referring to an external SPARQL endpoint, especially considering what happened with the SPARQL lesson last year. Also, the endpoint I host was down for a couple of months last fall, when you looked for it. That said, I think that having a chance to experiment a bit with the data via an endpoint is a useful way to extend the lesson. It may be hard to find any externally supported endpoint that would do this reliably, and even harder for PH to contrive to host one themselves. So perhaps we should leave the link in, with the understanding that it is not essential to the lesson, and I should try harder to make sure that the endpoint is always running?
Yes, intermediate sounds about right--but reviewers might be better judges. I think the lesson follows on well from the two existing LOD lessons. I'm going to start work on an advanced lesson on ontologies.
I've added an overview and next steps.
I've made a good number of other changes as well. I hope it all makes more sense now.
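On the endpoint question, the sort of experimenting I have in mind is running simple queries against the sample data. A minimal example might look like this (the mydb prefix comes from the lesson, but the property name here is an invented placeholder):

```sparql
# List up to ten subjects and their names, assuming a
# hypothetical mydb:name property in the sample data.
PREFIX mydb: <http://mydb.org#>

SELECT ?person ?name
WHERE {
  ?person mydb:name ?name .
}
LIMIT 10
```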
@whanley Great to hear from you! I'm a bit swamped until the end of next week, but at that point I will look things over again and, I anticipate, start recruiting reviewers.
Hi @whanley this is looking great! Thank you for making the changes. For the hosting of the files on Programming Historian, the best place would be in the assets folder, in a folder named 'making-a-small-rdf-database'. And I'll ask the team for additional ideas on strategies for the SPARQL endpoint and how best to proceed for sustainability purposes.
The last concern I have before recruiting reviewers is with the very last section (paragraphs 55-57). This might be easily resolved with wording, but it's rather disappointing to get to the end and find that the very useful feature of being able to associate fields (rather than standardize them) is not something the reader can do with the information in the lesson. Perhaps a "next steps" section where we can entice people to go through the effort of installing one of the more robust engines with the promise of this type of linking?
Thank you for all your work on this! I think this is a great lesson and I'm excited to see it nearing completion!
Hi again @whanley. I conferred with our resident sustainability and SPARQL experts, and the consensus is that it would be best to avoid the hosted endpoint server and focus instead on making sure that the reader has their own server running locally and is exploring their RDF data that way. That reduces the maintenance burden on everyone involved and puts the lesson on the strongest ground for long-term use.
I think to keep the lesson at intermediate, you should probably keep the focus on the data structure and the simpler interactions that users can accomplish with the Fuseki server. If the reader installed the Fuseki server earlier, would they be able to use the files you provide to experiment with the data manipulations? (Moving in this direction might make the lesson a bit longer, but I think it will strengthen it overall.)
Hi @jerielizabeth. I decided to change the server program I recommend in the lesson. I've substituted GraphDB for Fuseki. It's easier to install, and it offers quite a few more features, most importantly inference support. I've changed the set of sample queries as a result, and the last query works now. I'm working on a next lesson, as well, which will extend the work on inferencing.
Hope it's ready for review.
Hi @whanley. Thank you for making those changes!
I plan to look over the lesson next week and will be in touch!
Hi @whanley! Thank you so much for these changes. I know it's been a bit of a moving target, but I think the lesson is in great shape as a result.
I think we are indeed ready to send the lesson out for review! I'll send out a few inquiries for potential reviewers, who will have a month to complete the review. I'll let you know once the reviewers are confirmed. Once both reviews are completed, I'll look them over, summarize the feedback, and make some final suggestions. You'll then have four weeks to respond, get clarification on suggestions, and make any necessary changes.
Thank you again for all your work on this!
Update on the review for the lesson: James Smith and Bronwen Masemann have agreed to review the lesson, due February 5. They will post their feedback here.
Overall, I think this is a great lesson. It's definitely aimed at someone who isn't familiar with all of the technologies, but also isn't afraid of a text editor. The choice of GraphDB seems reasonable. I'll probably point my students to this sequence of tutorials as extra material.
Here are some quick notes and reactions as I read through the lesson. I've grouped them by section heading. Keep in mind that they are from someone who knows a bit more than the expected audience. Just as advanced study of a subject draws out things that are glossed over in less advanced courses, some of the things I point out can probably be left aside now and returned to in future lessons. The main thing is to make sure readers can't poke holes in what is here.
Overview
Why RDF?
An example
Step 3: Translate into Machine-readable Language
@context
…property). RDF/XML doesn't start with any more declaration than a regular XML document, namely, the namespace prefix mappings. But those can appear anywhere in the document, or nowhere if they aren't used.

`@prefix mydb: <...>` looks like a triple, but only by accident. I would shy away from forcing a similarity into something more. In RDF/XML, it would be `xmlns:mydb="..."`, which doesn't look like a triple. The prefix declaration isn't part of RDF, but of the serialization format that the RDF is being poured into.

Conclusion and next steps
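To make the serialization contrast around the prefix declaration concrete, here is the same declaration in Turtle with its RDF/XML counterpart shown in a comment (the triple itself is an invented example, not from the lesson):

```turtle
# In Turtle, the prefix declaration is serialization syntax, not a triple:
@prefix mydb: <http://mydb.org#> .

# The RDF/XML equivalent is an attribute on the root element, which
# looks nothing like a triple:
#   <rdf:RDF xmlns:mydb="http://mydb.org#" ...>

# Only statements like the following are actual triples:
mydb:person1 mydb:occupation "merchant" .
```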
Thank you, @jgsmith! These are really helpful comments and suggestions.
@whanley I will wait to hear from Bronwen before summarizing the reviews and offering guidance. You are welcome to chat with @jgsmith about his suggestions, but please don't make any changes to the lesson until the second review is in!
Thanks!
I found this lesson overall very clear and interesting. I think the example of transcribing and storing primary source data will be intriguing to a variety of potential users. In contrast to the other reviewer, I came to this process as a person who had less knowledge of specific tools, but a fair bit of experience of teaching (information school) students about data models, metadata, and RDF. So my comments have more to do with presentation and communication.
Here they are:
para. 1 - To whet the appetite of the reader, I would suggest clarifying that this process enables not just recording of data but also manipulation.
para. 2 - I would suggest using a term other than “serial record.” What are the characteristics of the records that lead you to call them “serial records”? To librarians, “serial” generally means “issued periodically” (and a “serial record” means “the catalog record for a serial publication”). So I would avoid using the term “serial”. My understanding is that the characteristic of these records that make them appropriate for this kind of treatment is that they contain structured data. Therefore I think it would be more clear if you said “I often come across documents that contain information that is structured” or perhaps “documents that contain information that is structured and repetitive.”
para. 4. This paragraph introduces terminology and concepts, only to state that they will be skipped over. I think it would be more helpful to readers if you eliminated or moved much of this paragraph, and just included the sentences beginning “This tutorial . . .” and “It employs. . . "
The heading “an example”: I would provide a stronger heading here to explain the goal of this section. Perhaps “The problem of transcribing and storing structured data in documents.”
para. 8 - Instead of stating that this record was “already a database” I would suggest making the weaker claim that the data was already structured. I think that some readers would argue that whatever structured storage and retrieval system was being used (register? card file?) cannot be properly called a database.
para. 14 - I agree with the other reviewer’s comment that the use of XML would not necessarily require an “elaborate customized schema” and I as well encourage my students to consider what is available and then tweak it rather than reinventing the wheel. I am not sure if this is intentional but the structure you have set up here, of examining the three options, is very similar to that used in Hooland, S. van, & Verborgh, R. (2014). Chapter 2: Modelling. In Linked data for libraries, archives and museums: how to clean, link and publish your metadata. Chicago: ALA Editions. My students consistently tell me that this chapter is extremely clear and useful, and it may be helpful to you in sorting out how to express the distinctions between the options you present.
para. 20 and ff. - Overall I think that what would most strengthen your already excellent walk-through of your example is to include visual representations of the relationships between the entities. I’d recommend using the same format as the visuals in the Linked Open Data tutorial: https://programminghistorian.org/images/intro-to-linked-data/intro-to-linked-data-fig5.png.
para. 22 - I like your idea of thinking about what you are doing to enable a machine to read your data. However I would argue that starting right with step 1, the data is already machine readable - a machine could for example count the number of characters in the file. I would clarify here that what you are making machine readable is the semantic structure of the data - the specific identity of each entity, and the nature of the relationships between them.
para. 28 - Just to remind your reader, I would state at the end of this paragraph that person 1 is Mirzan Marie.
para. 35 - “to try it out.” What process will the reader be trying out?
para. 38 - I would link here to a tutorial on regular expressions.
para. 43 - I think a less confident reader would find the inclusion of links to other tools in this paragraph distracting. I would shift the intro to other options to the end of the tutorial. Similarly, I don't think it's necessary to qualify GraphDB here as "far from the last word", as this distracts from what it is actually able to do.
Thank you, @BronwenMasemann! I am glad to see the feedback covering both the technical and the presentation aspects of the project!
I will read through the reviews in the next few days and get a summary to you, @whanley, by the end of the week!
Alright! Thank you, @jgsmith and @BronwenMasemann, for these excellent reviews!
This is a good combination of specific ideas and some general patterns to consider. As the author, @whanley you of course have the final say on if and how the suggestions are incorporated.
In terms of general patterns, both reviews express concern about the balance between customized and standard schemas, encouraging a stronger emphasis on standardized vocabularies. I think that is a good point to consider, though I don't think it would require restructuring, just increasing the emphasis in places. Both reviewers expressed concern about overwhelming readers with concepts in the opening section. I am going to pull in @alsalin to confirm about using external links to Wikipedia in terms of sustainability practices, but if she agrees, I think that makes sense (fingers crossed that those don't change often).
@BronwenMasemann's suggestion about linking to other Programming Historian lessons is, of course, strongly encouraged, and I like her suggestion of using a similar format and visual strategies. For next steps, I think @jgsmith's suggestion about encouraging file distribution in places like Github is good, and gives a next step that does not require another lesson.
Overall, though, these reviews are very positive and offer great suggestions for refining the lesson and making it that much stronger.
The next step is revision, with a preferred timeline of 4 weeks, making the due date March 15. @whanley, feel free to ask for clarification from myself or the reviewers as you edit, and let me know if you need additional time for the final changes.
Thank you again to everyone!
Thank you very much @jerielizabeth @jgsmith @BronwenMasemann. Your suggestions really improve this piece, and I think I'll be able to integrate almost all of them. I hope to get to this next week. Much appreciation.
@jerielizabeth apologies for the delay, but this notification got buried in my email. As for Wikipedia links: using them for definitions of common terms is still considered good sustainable practice. We should encourage authors to use the permalink for an article (a snapshot of the wiki article at a given time) instead of the general URL (Wikipedia's preference for citation).
Greetings @whanley! I am checking in to see how the revisions are going and to see whether you need more time.
When you do push the changes, please include the issue number (#31) in the commit message so that I can track it back easily. (https://blog.github.com/2011-10-12-introducing-issue-mentions/)
Hi @whanley! I hope the semester has wrapped up (or is wrapping up) smoothly! Any updates from you as to when you'll be able to push up your revisions for this lesson? I would love to see this published in the next few weeks, as I am starting the process of rotating off the editorial board. Thanks!
@whanley checking in one last time before I head off the editorial team. I would really like to see your lesson through to publication, so I hope you can check back in and let me know where you're at within the next week. Thanks!
I propose closing this submission. @jerielizabeth you have attempted many times to reach out to the author.
Thank you again to the reviewers on this lesson - @BronwenMasemann and @jgsmith - for your excellent feedback. I am going to close this issue, as I have not heard from @whanley. The lesson can be revived by the author in the future, but will be published under a new editor.
Best,
Jeri
The Programming Historian has received the following tutorial on 'Making a Small RDF Database' by @whanley. This lesson is now under review and can be read at:
http://programminghistorian.github.io/ph-submissions/lessons/making-a-small-rdf-database
I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum.
Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.
I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @ianmilligan1 or @amandavisconti if you feel there's a need for an ombudsperson to step in.
Anti-Harassment Policy
This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.