Open hawc2 opened 2 months ago
I confirm @rogorido and @nabsiddiqui shared with me access to their repository containing all the required files, and that I handed them over to @anisa-hawes to allow the publishing team to generate the preview, thanks.
Hello Giulia @semanticnoodles, Igor @rogorido and Nabeel @nabsiddiqui,
Many thanks for sharing the lesson submission materials with me. I've now checked the Markdown file, and added some key elements of metadata. I've also checked the accompanying images and assets, ensuring each element meets our requirements.
You can find the key files here:
You can review a Preview of the lesson here:
--
A few initial notes:
- `## Header 2` is the largest.
- `alt_text` + captions for each of your images. We have committed to providing alt-text for all figure images, plots and graphs included in our lessons, so you'll need to add this as part of your revisions. These notes on Descriptive Alt text may be useful to you.
- a `.tsv` and a `.csv` version of the dataset, although only the `.csv` appears to be used in the lesson. Is the `.tsv` alternative required too?
Hello again Igor @rogorido and Nabeel @nabsiddiqui.
Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.
In this Phase, your editor Giulia @semanticnoodles will read your lesson, and provide some initial feedback. Giulia will post feedback and suggestions as a comment in this Issue, so that you can revise your draft in the following Phase 3: Revision 1.
%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
} } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Manager (@anisa-hawes)
All Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@semanticnoodles)
Expected completion date? : April 20
Section Phase 3 <br> Revision 1
Who's responsible? : Authors (@rogorido + @nabsiddiqui)
Expected timeframe? : ~30 days after feedback is received
Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.
@anisa-hawes Thanks for your comments. As for the tsv file: no, it is not required. It can be deleted.
I'll add the alternative captions. Thanks.
I added captions and alt texts (10a6a9e1b0c9fa794637837338bd7a61b7f6c5d7), but Nabeel should take a look whether it looks 'Englishly' enough...
Hello @rogorido and @nabsiddiqui,
here follows my preliminary feedback; I am aware it is quite extensive, but I believe these indications could help you strengthen your tutorial. If you need any clarification, please do not hesitate to ask!
In general, your tutorial provides valuable guidance on navigating and producing a wide range of visualisations, effectively walking through the various features of `ggplot2`. The piece meets the accessibility and inclusivity goals of the Programming Historian fairly well, and in most cases the language is easy to understand and straightforward. However, some elements need further work, mostly falling under two intertwined aspects discussed in the following paragraphs.
In my opinion, this is the most critical point to consider. The tutorial lacks a cohesive element to tie its components together and the organisation of the content could benefit from a more linear and less convoluted approach. The case study you propose (sister cities) seems to be just a tool to obtain a series of visualisations. This is fair enough, but it could benefit from further methodological contextualisation and unpacking: the people following your tutorial may not be historians, or may not have a clear understanding of the methods you are using -- although they may be familiar with R.
In terms of improving the overall content, I think there are two possible directions for you to consider: either revising the content to follow a visualisation task-based narrative or placing more emphasis on the structure of the case study. The first option would privilege the visualisation tasks (but still require some methodological support for the case study), while the second would require you to generate stronger and sharper research questions from the case study, to be answered (at least in part) by the visualisation tasks. I think @nabsiddiqui did a very good job of structuring the content in the lesson Data Wrangling and Management in R, so I would recommend keeping that in mind as a reference.
The title of the proposal could benefit from being more specific - or at least mentioning the context of application. The table of contents looks unbalanced: the headings and their actual wording could be better aligned with the content they cover, and the nesting could be more linear.
You give very clear information about the concept of the grammar of graphics - this is really the cornerstone of understanding how `ggplot2` is designed. I really appreciate you explaining this and including many useful resources, although I think they could be arranged more organically, instead of including relatively short hints throughout the tutorial, as they tend to overshadow the walkthrough steps on several occasions.
The dataset looks more than adequate for the visualisation tasks you have set as objectives, but the data narrative and its wording could benefit from further tuning. What you offer in this lesson is mostly visualisation of data distributions and there is little statistical testing involved. As your topic is sister cities, it makes perfect sense to talk about relationships, although what you observe are mostly trends or tendencies that you could try to explain through further research; sometimes you clearly point that out and sometimes it looks rather implicit. I think this is just a matter of fine-tuning the language, nothing more.
Para stands for paragraph number; please refer to the preview generated by @anisa-hawes
`readr`
`head(eudata)` could support your explanation about the observations occurring in the dataset – this is also considered good practice in data science.
`typecountry` column included in your dataset. I tested the walkthrough using the data contained in the `eu` column, just remember to send us the correct version of the dataset.
[ ] Para 31, penultimate line: comma missing space afterwards.
[ ] Para 33, please review this for clarity (here you should mention why you used log10 once for all or put it into another spot. Consider explaining why none of the methods is ideal)
> This leads to an uninformative histogram. We can take `log10(dist)` as our variable or filter to exclude values above 5000kms. None of these methods is ideal, but as far as we know, we are operating with manipulated data making it less problematic
[ ] Para 36, please review it for clarity (it reads implicitly why you employed ECDF).
[ ] Para 41, same issue: you refer to ANOVA without explaining why you foresee that as a viable statistic test, cutting the paragraph short.
[ ] Review the heading accordingly with the edits.
`ggplot2` does not use discrete colour scales at all.
Two quick comments on the form and style.
`ggplot2` that always comes lowercased, but you know it 😄)
`code format` or not, you choose. Consistency is the only requirement.
Thank you for the great work done so far!
@semanticnoodles thanks for your extensive comments. I will have a look at the enhancements you're proposing in the next few days.
Hello Igor @rogorido and Nabeel @nabsiddiqui. Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.
This Phase is an opportunity for you to revise your draft in response to @semanticnoodles's initial feedback. You can make direct commits to your file here: /en/drafts/originals/visualizing-data-with-r-and-ggplot2.md. @charlottejmc or I are here to help if you encounter any practical problems!
When both of you + Giulia are happy with the revised draft, we will move forward to Phase 4: Open Peer Review.
%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
} } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@semanticnoodles)
All Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Authors (@rogorido + @nabsiddiqui)
Expected completion date? : May 17
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC)
Expected timeframe? : ~60 days after request is accepted
Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.
Hello Igor @rogorido and Nabeel @nabsiddiqui, I hope you are doing well!
Just checking in with you about the draft revision (Phase 3 / Revision 1) as the deadline of the 17th of May has passed. If you need some extra time let me know approximately how much, so we can set up a new deadline -- and @anisa-hawes or @charlottejmc can update the Mermaid timeframe.
If you have doubts or need any clarification, please do not hesitate to get in touch.
Hello @semanticnoodles,
I have tried to rework a lot of the tutorial. I feel that changing some of the headings will make the flow more obvious. Let me know if it makes sense the way I have done it or if there should be additional changes. Here is some of what I reviewed based on your timeline. The rest I will leave to @rogorido unless he has an objection:
`readr`
`head(eudata)` could support your explanation about the observations occurring in the dataset – this is also considered good practice in data science.
`typecountry` column included in your dataset. I tested the walkthrough using the data contained in the `eu` column, just remember to send us the correct version of the dataset.
[X] Para 31, penultimate line: comma missing space afterwards.
[X] Para 33, please review this for clarity (here you should mention why you used log10 once for all or put it into another spot. Consider explaining why none of the methods is ideal)
> This leads to an uninformative histogram. We can take `log10(dist)` as our variable or filter to exclude values above 5000kms. None of these methods is ideal, but as far as we know, we are operating with manipulated data making it less problematic
[X] Para 36, please review it for clarity (it reads implicitly why you employed ECDF).
[X] Para 41, same issue: you refer to ANOVA without explaining why you foresee that as a viable statistic test, cutting the paragraph short.
[X] Review the heading accordingly with the edits.
`ggplot2` does not use discrete colour scales at all.
Two quick comments on the form and style.
`ggplot2` that always comes lowercased, but you know it 😄)
`code format` or not, you choose. Consistency is the only requirement.
Thank you, @nabsiddiqui!
@semanticnoodles will review these revisions and advise if we are ready to move onwards to the next Phase of the workflow (which will be Phase 4 Open Peer Review). Giulia is away this week, returning on June 3rd.
In the meantime, @charlottejmc and I can help with ensuring that functions and arguments are typographically consistent. These are aspects we always check as part of typesetting at Phase 6, but we'll do a quick scan now so that this isn't a distraction for Reviewers.
Hello @nabsiddiqui and @semanticnoodles,
I've made some adjustments to add backticks to functions, arguments and other parts of code, trying to stay consistent with our house style.
Programming Historian in English has received a proposal for a lesson, 'Visualizing data with R and ggplot2,' by @rogorido and @nabsiddiqui.
I have circulated this proposal for feedback within the English team. We have considered this proposal for:
We are pleased to have invited @rogorido and @nabsiddiqui to develop this Proposal into a Submission under the guidance of @semanticnoodles as editor.
The Submission package should include:
We ask @rogorido and @nabsiddiqui to share their Submission package with our Publishing team by email, copying in @semanticnoodles.
We've agreed a submission date of April. We ask @rogorido and @nabsiddiqui to contact us if they need to revise this deadline.
When the Submission package is received, our Publishing team will process the new lesson materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.
If we have not received the Submission package by April, @semanticnoodles will attempt to contact @rogorido and @nabsiddiqui. If we do not receive any update, this Issue will be closed.
Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.