@maxodsbjerg, could I ask you to post the following on this thread, when you get a chance?
I the author|translator hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.
Yes of course!
I the author hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.
Hi @tiagosousagarcia! Thanks again for setting up the ticket. I noticed that the preview seems a bit off—the lesson should be displaying like this one. Let me know if you would like any help troubleshooting!
Thanks @svmelton for the note and @jenniferisasi for fixing!
I've made some edits https://github.com/programminghistorian/ph-submissions/commit/e4b2b4eace8d2a947a4f7682f518afdc89e9bfba#diff-83b53202f002a488c2e8a75ab1de9e95e1871fc1a19b3c70287ef848974fbac7 on l1-l117 with comments below:
I'll pick up on the rest of the article later.
I note that you are writing in English as a second language, which will be taken into account during peer review. If the article passes through peer review, copyediting will focus on ensuring the article meets our Write for a Global Audience guidelines.
Finished now! Next set of edits/comments:
To summarise, there is a kernel of a good article here, but it needs to hold the hand of the reader a little more, especially as a) the tutorial is intended as introductory and b) the tutorial attempts to allow the reader to follow multiple pathways.
So, firstly, these pathways need to be made clearer.
And secondly - and perhaps more importantly - the article needs to assume less knowledge, either by pointing the reader to things that explain new terms/concepts, or being more explicit about what the reader should do: the latter is particularly acute in the 'Data and Prerequisites' section, at which point, as it stands, I can see a reader not knowing what they are being asked to do, suddenly confronted as they are with descriptions of R packages (I know there is a note in the aims section, but the reader needs more help here).
In addition to that, there are some inconsistencies in styling for in-paragraph mentions of code, variables, packages, and datasets that need attention.
@tiagosousagarcia: anything to add from your read-through?
@drjwbaker we have the images and they should be in the correct place in the repo, but they are not referenced in the .md -- I'll add them in the commit below where I think they should go (they still need captions though);
Otherwise, just a few extra notes:
p 18 - if the user is trying to get twitter data using the rtweet package, there should be a note warning that the progress bar refers to the total number of requested tweets, rather than the progress of the operation. That is, if the reader requests 18000 tweets, but only 2600 are available, the progress bar will be stuck at about 15%, which might confuse people (it certainly confused me).
p 28 - fig. 1 and the code that creates it should probably include a scale on the y axis
p 40 - the heading 'Interaction count dispersed on verified status' seems a bit confusing to me
p 61 - perhaps a few lines explaining why we are exporting to JSON specifically (as opposed to, say, csv)
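To make the p 18 note concrete, here is a minimal sketch of the arithmetic behind the stalled progress bar. It is written in Python rather than the lesson's R, it does not call rtweet, and the function name is hypothetical; it only assumes, as described above, that the bar is scaled to the number of tweets requested rather than the number available.

```python
def progress_bar_fraction(requested: int, available: int) -> float:
    """Fraction of the bar that fills when the bar is scaled to `requested`.

    Hypothetical helper for illustration only; not part of rtweet.
    """
    # The bar can never show more than what was actually retrieved.
    retrieved = min(requested, available)
    return retrieved / requested


# Figures taken from the note above: 18000 requested, 2600 available.
stuck_at = progress_bar_fraction(requested=18000, available=2600)
print(f"Bar stops at {stuck_at:.1%}")
```

With those figures the bar stops at roughly 14 to 15 percent even though the download has finished.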
@maxodsbjerg Just to add, I appreciate these are a lot of changes to get to. Please don't feel there is a hurry here, as I know many people are already starting their festive leave period. Let's check in again in the new year, and should you have any queries, please ask me and/or @tiagosousagarcia.
Thank you all for the edits/comments! I'll look into them in the new year.
@tiagosousagarcia I note these images still aren't rendering in the preview. To be honest I'm not sure how to fix as I've an issue with another article at the moment https://github.com/programminghistorian/ph-submissions/issues/436#issuecomment-1004843172
This one that @amsichani is working on - code here - works perfectly if that is any help!
@drjwbaker I've noticed it pre-Christmas, but was hoping it was a case of delayed updating. I'll try to find where the bug is, but might need some help from the @programminghistorian/technical-team on this one
A bit more info on the issue -- essentially, it seems that the image location is not being correctly replaced by the slug. The generated preview has https://programminghistorian.github.io/ph-submissions/images/LEAVE%20BLANK/scalable-reading-of-structured-data-1.png as the address for the first figure, for example, even though the slug is indicated correctly on the .md file. On commit 00df8f6 I've removed all 'LEAVE BLANK' fields to see if it nudges it into the right direction
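As a small illustration of where LEAVE%20BLANK in that address comes from: a space in a URL path segment is percent-encoded as %20. The sketch below uses Python's standard library; the `image_url` helper and the way it assembles the path are assumptions for illustration, not the site's actual template logic.

```python
from urllib.parse import quote

BASE = "https://programminghistorian.github.io/ph-submissions/images"


def image_url(slug: str, filename: str) -> str:
    """Hypothetical helper: build an image address from a slug and a file name."""
    # quote() percent-encodes characters that are not URL-safe, e.g. " " -> "%20".
    return f"{BASE}/{quote(slug)}/{filename}"


figure = "scalable-reading-of-structured-data-1.png"

# Placeholder text used in place of the slug reproduces the broken address.
broken = image_url("LEAVE BLANK", figure)
# The lesson slug gives the address the preview should be using.
expected = image_url("scalable-reading-of-structured-data", figure)

print(broken)
print(expected)
```

The first address matches the one seen in the generated preview, which suggests the placeholder text was being used where the slug should have been.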
solved with commit 5c06b46
@drjwbaker @tiagosousagarcia Thanks again for your comments! We had a meeting yesterday in our group and look forward to addressing the comments. We divided the comments amongst us and plan on working through them in the next couple of weeks.
How would you prefer that we work with the comments? Should we fork the .md file that you have been doing the light word editing on and ping you when we're done?
@maxodsbjerg Thanks for your note. I think a fork will work. So if you are happy with that approach, please proceed.
Hello all,
Please note that this lesson's .md file has been moved to a new location within our Submissions Repository. It is now found here: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/en/drafts/originals
A consequence is that this lesson's preview link has changed. It is now: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/scalable-reading-of-structured-data
Please let me know if you encounter any difficulties or have any questions.
Very best, Anisa
@maxodsbjerg Just checking in to see how you are getting along with the pre peer-review edits.
@drjwbaker It is all going very well. We have just a few edits left and I plan on finishing them this week.
Fab. Thanks for the update.
@drjwbaker @tiagosousagarcia We have finished the editing now. You'll find the updated markdown here: https://github.com/maxodsbjerg/ScalableReadingOfStructuredData/blob/main/20220117_PHedits_scalable-reading-of-structured-data.md
We've also collected your comments in a markdown-file and described what we did (the text in italic following your comment). You'll find it here: https://github.com/maxodsbjerg/ScalableReadingOfStructuredData/blob/main/20220210_PH-lesson_Scalable_Reading_edits.md
Thanks so much for this @maxodsbjerg. I'm going to replace our version of the article with this one. Then we'll send it out for peer review. Note that there may be a slight delay here as @tiagosousagarcia is on leave.
(and thanks so much for the commentary on our suggestions: the article is much tighter now. Great job!)
@inactinique and @martinmueller39 have kindly agreed to review this article. We can expect their reviews on the 1st and 15th of April respectively. If there are any questions, please feel free to post them on this ticket, or email me or @drjwbaker. Many thanks to our reviewers.
Dear authors,
Thank you for this very comprehensive tutorial.
The code works fine (I tested it in an RStudio notebook). The prerequisites (software, experience) are well explained, though I think that you did not specify that a Twitter account is necessary to get the data with rtweet. The learning objectives are clearly defined, as is the workflow you suggest following. The lesson is also well structured overall. Another strong point of your contribution is that you make explicit links with other lessons and, in your last paragraph, you highlight the differences with the Beginner’s Guide to Twitter Data and explain how to overcome those differences.
I would suggest stating more clearly which versions of R and of the R libraries you wrote the code with, as a change of version can slightly change the syntax of the code. That said, on my machine (macOS) it worked with the latest versions of R and the packages you are using.
The code is easy to reproduce, even for Python-oriented and R-reluctant researchers like me :-). A few words on why R and not Python would seem useful to me, but not mandatory. More interesting would be a few explanations of how much data you can handle with your R code and whether there are strategies to adopt in case the dataset is too big (which won't happen with the way you are collecting tweets, but can easily happen with other ways of collecting data).
I would also suggest a ‘further reading’ section at the end -- that would make your contribution a bit stronger and more interesting to researchers who are using your lesson as a starting point.
Some typos:
There might be others, may I advise some proof-reading?
In paragraphs 53, 59, 66 and 68 (I might forget one), I would remove “(Output removed because of privacy reasons)” from the code cell, because it’s not code. Of course, it should be still stated that you removed the output for (very obvious) privacy reasons, but it should be in a text cell, not in a code cell.
You use ggplot2 once, but ggplot elsewhere. I would choose one of the two (or explain why you use both).
I really enjoyed reading this lesson. Thank you again to the authors.
I know at least one of the authors. We are both members of the board of the Journal of Digital History.
@inactinique Thank you very much for your comments and sorry for the late reply.
I will get back to my colleagues and incorporate your very good comments and suggestions. Thanks again!
@maxodsbjerg Just to note that you don't need to respond until both reviews have come in and I've had a chance to summarise them. But thanks for taking a look nevertheless.
@drjwbaker Thanks for the clarification!
@martinmueller39 Do you need some extra time to complete this review?
Dear @maxodsbjerg and authors,
Thank you for your patience through the peer-review process. Unfortunately, our second reviewer had to drop out at the last moment. Instead of delaying the process further, exceptionally, we decided to continue the process with some further editorial support. What follows, then, is a mix between a peer- and an editorial review.
Thank you for writing a clear and well-defined tutorial that will, I believe, be of interest to many PH readers. A good manual on the kinds of work to be done with twitter data (and not just on how to do it) is valuable to many disciplines and researchers in the humanities, and I am sure it will be greatly appreciated.
The tutorial is well structured and is easy to follow along (both in terms of code and ease of reading). I've found a couple of typos and less clear points that I noted below in detail.
The clear definition of the workflow and the use of scalable reading more generally are a high point of the tutorial for me. There are myriad ways of technically doing scalable reading (the R method here being just one of them), but the why and wherefore of this method remain unchanged, and I think you did a stellar job putting that across.
There is still, I think, some space to improve the tutorial even further. I hope my high-level suggestions below will be of help in that regard.
To some extent, with collaborative papers, there is no way of escaping this, as different voices will express themselves differently. In the tutorial, however, I think there is a marked shift between sections that is, sometimes, a little distracting to the reader. This is sometimes shown in less visible ways to the final reader of the tutorial (for example, in the .md paragraphs are sometimes written in a single line, other times have line breaks) which is trivial, but other times there is a considerable shift in register and tone from section to section. Some consolidation work needs to be done here, I think.
This is more crucial for longer, or more complex pieces of code (I've noted them down below) -- I think it would be a benefit for the reader to see the code block before the explanation of its steps, so that there is an anchor to refer back to. Otherwise, the reader might be a little lost as to what exactly the explanation is referring to.
I get a sense the tutorial ends quite abruptly and openly. I would prefer to have a short, one-paragraph conclusion recapping the work that has been done and pointing the reader to the next steps in the scalable reading method. In other words, we have the distant reading aspect, but not the close reading one. I'm not suggesting, of course, that you need to include a close reading example, but as it stands, the reader is left with the impression that there are three completely independent distant reading approaches which bear no relationship to each other. You've done some of that work throughout the tutorial, but I think a final (very short) section that recaps those points of connection and points the reader to where next to take the research would be very positive.
(it's a long list, but most of these are quite small!)
l. 34 -- introduces the concept of scalable reading without explanation (only needs a small definition here, something like: ...scalable reading of structured data, a combination of close interpretation of individual texts and statistical analysis of the corpus)
l. 46 -- 'The reproducible way of selecting...' good that you're introducing examples of disciplines that could use the method, but maybe also add something about its extendibility to others in the humanities.
l. 52 -- 'This step suggests a chronological exploration of a dataset.' -- delete, it just repeats the header.
l. 52 -- 'Had we worked on data from the National Gallery' -- rephrase, it implies a contradiction with what you said above (that it forms part of the discussion). I.e, 'In the case of the National Gallery data...'
l. 54 -- 'Had we worked on data from the National Gallery' -- rephrase
l. 56 -- 'Had we worked on data from the National Gallery' -- rephrase
l. 66 -- add a small, inline explanation of what 'packages' are.
l. 88 -- 'The package in from the same group' -> 'The package [comes|is|was created by] from the same group...' [typo]
l. 120 -- add a comment to the code, explaining what the function parameters are
l. 124 -- 'according to different periods in art history to which are represented the most or the least' -- a little confusing, rephrase. Perhaps: '...according to different periods in art history, in order to establish which periods are more or less represented in the National Gallery dataset'
l. 129 (and in the section more generally, and elsewhere in the tutorial) -- the subject changes midway through the sentence: you/we
ll. 164-167 -- the code should appear before the explanation, so that readers know what the explanation refers to. Somewhere between ll.142-143.
l. 195 -- 'beware' -> 'be aware'
l. 195 -- 'where we collected' -> 'when we collected'
l. 205 -- 'thus creating two lines for;' -> 'which creates two lines in the visualisation, one for...'
l. 209 -- in-line explanation of what 'aesthetics' are in this context
l. 209 -- 'tells R, what the' -> 'tells R what the'
ll. 217-228 -- code should be before its explanation
ll. 245-150 -- explanation of the pipe operator should really come in the first coding section, particularly as you note that 'once you get a hold of this idea the remainder of the data processing will be easier to read and understand'
l. 255 -- 'verfied' -> 'verified'
ll. 367-276 -- more detailed explanation of plot construction would be good
l. 382 -- 'two different kinds of distant readings' -> 'two different kinds of distant reading'
l. 384 -- 'reading individual tweet' -> 'reading individual tweets'
l. 397 -- First mention of R Markdown -- needs an explanation and a reasoning
l. 414 -- 'are variables that changes' -> 'are variables that change'
ll. 454 ff. -- Why are we exporting the new dataset to a JSON file? That is, why are we exporting it in the first place, and why specifically to JSON rather than a tabular format (csv, for example)? I still don't see a clear reasoning for it, though I might have missed it of course.
l. 496 -- 'how many likes you top-20 lies above' -> I don't really understand this sentence, there's a typo or something missing somewhere.
l. 535 -- maybe add a note that fetching the twitter text for each url can also be automated using the API, though it is not covered in this tutorial
l. 540 -- 'the date of tweets is shown in a way, which is' -> 'the date of tweets is shown in a way which is'
Thanks to @tiagosousagarcia for pulling this together.
@maxodsbjerg: to confirm, this is the last set of edits we suggest as part of the submission and peer review process, after which we will recommend to the Managing Editor that the article be staged for publication.
On 'The multi-author conundrum', as we have not made any specific recommendations here, can I suggest that we approach this in two stages: first, the authors give the piece a good read and make changes you feel unify the voice of the article; second, when the article is passed to copy editing, this can be flagged in advance as something to which special attention is paid.
If you are happy with this, would it be possible to complete final edits by 19 May?
@tiagosousagarcia Thanks for your comments and suggestions. I look forward to working with them
@drjwbaker - I'm sorry, but the latest development in this thread slipped completely under my radar, so we won't be able to complete final edits by 19 May. I've made arrangements with the other authors and we will have the final edits completed by 25 May.
25 May is perfect. Thanks @maxodsbjerg.
@drjwbaker We addressed all the comments but one, and you can find the updated markdown here: https://github.com/maxodsbjerg/ScalableReadingOfStructuredData/blob/main/20220523_ScalableReadingOfStructuredData.md
The one we didn't address was this one: #Data and Prerequisites: could not a third option be to offer a base dataset that users can get from PH to get started without having to follow other lessons? We have difficulty seeing how this is possible, since Twitter's policy is not to share hydrated tweets, only Tweet IDs. We initially planned on something similar to your suggestion, but couldn't find any suitable open datasets that were compliant with Twitter's rules.
@maxodsbjerg and team: many thanks. On..
Data and Prerequisites: could not a third option be to offer a base dataset that users can get from PH to get started without having to follow other lessons?
..thank you for the explanation.
@tiagosousagarcia (when you have some time as I know you are busy!) unless you have any final comments, I'm going to suggest we move to the next stage of the editorial workflow https://programminghistorian.org/en/editor-guidelines#recommend-publication---editorial-checklist and then inform @svmelton that this is ready for copyediting.
No further comments from me -- thank you @maxodsbjerg and everyone!
@maxodsbjerg: we need bios for the team. I've started these below. Could you please check and edit if needed (plus add orcids if you have them - I could only find one for Helle)
@maxodsbjerg -- when you have the time, could you check and edit if needed the bios above?
@tiagosousagarcia - Once again I'm sorry for my tardiness. We have the following short bio for all of us:
Max Odsbjerg Pedersen is an Information Specialist at Aarhus University Library at the Royal Danish Library.
Josephine Møller Jensen is an MA student in the Department of History and Classical Studies, Aarhus University.
Victor Harbo Johnston is an MA student in the Department of History and Classical Studies, Aarhus University.
Alexander Ulrich Thygesen is a Ph.D. student in the Department of German and Roman Languages, Aarhus University.
Helle Strandgaard Jensen is Associate Professor of Contemporary Cultural History in the Department of History and Classical Studies, Aarhus University.
And I have the following ORCID: 0000-0001-9215-5605
Let me know if you need anything else.
@svmelton, we are ready to recommend this article for publication. The lesson files can be found at:
lesson - ph-submissions/en/drafts/originals/scalable-reading-of-structured-data.md
images - ph-submissions/images/scalable-reading-of-structured-data/
gallery icons - ph-submissions/gallery/scalable-reading-of-structured-data.png
and ph-submissions/gallery/originals/scalable-reading-of-structured-data-original.png
Author bios are as follows:
---
- name: Max Odsbjerg Pedersen
  team: false
  orcid: 0000-0001-9215-5605
  bio:
    en: |
      Max Odsbjerg Pedersen is an Information Specialist at Aarhus University Library at the Royal Danish Library
---
---
- name: Josephine Møller Jensen
  team: false
  orcid:
  bio:
    en: |
      Josephine Møller Jensen is an MA student in the Department of History and Classical Studies, Aarhus University
---
---
- name: Victor Harbo Johnston
  team: false
  orcid:
  bio:
    en: |
      Victor Harbo Johnston is an MA student in the Department of History and Classical Studies, Aarhus University
---
---
- name: Alexander Ulrich Thygesen
  team: false
  orcid:
  bio:
    en: |
      Alexander Ulrich Thygesen is a Ph.D. student in the Department of German and Roman Languages, Aarhus University
---
---
- name: Helle Strandgaard Jensen
  team: false
  orcid: 0000-0002-8623-9586
  bio:
    en: |
      Helle Strandgaard Jensen is Associate Professor of Contemporary Cultural History in the Department of History and Classical Studies, Aarhus University
---
Let me know if there's anything missing!
Hi, everyone!
Is it possible for the authors to save the sesamestreet_data object as a csv so we can archive it? I know it is not needed for following the lesson, but it is relevant for its sustainability:
Hi @rivaquiroga -- we did consider that during the review, but apparently twitter has some strict rules about privacy that prevent us from doing so, according to the authors (@maxodsbjerg et al)
We don’t need all the 90 columns. For replicating the plots we just need the dataframes as they were before being piped into the ggplot() function. For example, for the first plot the three variables needed are date, has_sesame_ht and n. There is no private data there.
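A minimal sketch of that idea, written in Python rather than the lesson's R: keep only the three aggregated columns behind the first plot and write them to CSV, dropping everything else. The rows below are invented placeholders (including the made-up screen_name column), not the authors' data, and the `to_csv` helper is hypothetical.

```python
import csv
import io

# The three variables named above for the first plot.
FIELDS = ["date", "has_sesame_ht", "n"]

# Invented placeholder rows; "screen_name" stands in for any private column.
rows = [
    {"date": "2021-12-01", "has_sesame_ht": True, "n": 3, "screen_name": "example_user"},
    {"date": "2021-12-01", "has_sesame_ht": False, "n": 10, "screen_name": "another_user"},
]


def to_csv(records, fields=FIELDS) -> str:
    """Serialise only the listed fields; any other keys are silently dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Because only the aggregated columns are listed in FIELDS, nothing identifying an account reaches the archived file.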
Ah, good point, of course -- @maxodsbjerg, could we get copies of the data used for each figure so that we can translate them?
another small request, @maxodsbjerg -- I've noticed during the translation of the lesson that we still don't have captions for the figures: would you mind adding them here? I can add them to the .md directly
Thanks, all! @tiagosousagarcia has this lesson been through copyediting already?
@svmelton - not yet! I was under the impression that happened after the recommendation? I may be wrong, I often am. In any case, I can get in touch with @anisa-hawes later today to arrange it
No worries @tiagosousagarcia! @anisa-hawes do you have capacity to copyedit this lesson?
We have project budget to pay for copyediting.
@tiagosousagarcia Just wanted to let you know that we will be meeting in our twitter project group on Monday, and then we will sort out the data-sharing issue and create some captions for the figures.
The Programming Historian has received the following tutorial on 'Scalable Reading of Structured Data' by @maxodsbjerg, Helle Strandgaard Jensen, Josephine Møller Jensen, Alexander Ulrich Thygesen. This lesson is now under review and can be read at:
http://programminghistorian.github.io/ph-submissions/en/drafts/originals/scalable-reading-of-structured-data
Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.
I will act as interim editor for the review process, until a permanent editor is assigned. The role of the editor is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum.
Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.
I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.
Our dedicated Ombudsperson is (Ian Milligan - http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.
Anti-Harassment Policy
This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.