programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Creating a Dashboard for Interactive Data Visualization with Dash in Python #609

Open hawc2 opened 8 months ago

hawc2 commented 8 months ago

Programming Historian in English has received a proposal for a lesson, 'Creating a Dashboard for Interactive Data Visualization with Dash in Python' by @hluling.

I have circulated this proposal for feedback within the English team. We have considered this proposal for:

We are pleased to have invited @hluling to develop this Proposal into a Submission, to be prepared under the guidance of @caiocmello as editor.

The Submission package should include:

We ask @hluling to share their Submission package with our Publishing team by email, copying in @caiocmello.

We've agreed a submission date of April. We ask @hluling to contact us if they need to revise this deadline.

When the Submission package is received, our Publishing team will process the new lesson materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.

If we have not received the Submission package by April, @caiocmello will attempt to contact @hluling. If we do not receive any update, this Issue will be closed.

Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.

charlottejmc commented 7 months ago

Hello @caiocmello and @hluling,

You can find the key files here:

You can review a preview of the lesson here:


I do have a question about two .py files in the assets. As far as I can understand,

  • app-rq2.py is a script showing how the dashboard was set up for Research Question 2 (RQ2)
  • rq2-download.py is a script to download the data for RQ2

How come these scripts are provided separately, rather than included as code blocks within the lesson? (I am slightly confused about how these scripts differ from the main code, which you've collated together under app.py.)

Thank you for clarifying!

anisa-hawes commented 7 months ago

Thank you for processing these files, @charlottejmc!


Hello Luling @hluling,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this Phase, your editor Caio @caiocmello will read your lesson, and provide some initial feedback. Caio will post feedback and suggestions as a comment in this Issue, so that you can revise your draft in the following Phase 3: Revision 1.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Assistant (@charlottejmc) 
All  Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@caiocmello)  
Expected completion date? : May 17
Section Phase 3 <br> Revision 1
Who's responsible? : Author (@hluling) 
Expected timeframe? : ~30 days after feedback is received

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

hluling commented 7 months ago

> Hello @caiocmello and @hluling,
>
> You can find the key files here:
>
> You can review a preview of the lesson here:
>
> I do have a question about two .py files in the assets. As far as I can understand,
>
> • app-rq2.py is a script showing how the dashboard was set up for Research Question 2 (RQ2)
> • rq2-download.py is a script to download the data for RQ2
>
> How come these scripts are provided separately, rather than included as code blocks within the lesson? (I am slightly confused about how these scripts differ from the main code, which you've collated together under app.py.)
>
> Thank you for clarifying!

Thank you @charlottejmc. To clarify:

The two RQs are based on two different data sources: app.py has the code for RQ1, while the two other .py files are for RQ2. The main procedure and logic are demonstrated with RQ1, so I thought it might be repetitive to explain it again with RQ2. But I'd be happy to incorporate the RQ2 code into the lesson's main text if that makes more sense.

The reason to separate app-rq2.py from rq2-download.py is that it takes some time to retrieve data using the Chronicling America API (RQ2), so it's not practical to incorporate the download procedure into the dashboard script.
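For illustration, the pattern is roughly this (a minimal sketch with a hypothetical function name and made-up data, not the lesson's actual code): the slow download runs once in its own script and caches a CSV, and the dashboard script simply reads that cached file.

# rq2-download sketch: run once to cache the slow API download locally
import pandas as pd

def fetch_rq2_records():
    # Placeholder for the (slow) Chronicling America API calls;
    # the real script pages through the API and collects the results.
    return pd.DataFrame({"language": ["German", "Spanish"],
                         "decade": [1890, 1890],
                         "count": [120, 45]})

if __name__ == "__main__":
    fetch_rq2_records().to_csv("rq2_data.csv", index=False)

# app-rq2 sketch: the dashboard script then reads the cached file instantly
# df = pd.read_csv("rq2_data.csv")

This way the dashboard starts immediately, and the download only needs to be repeated when the data should be refreshed.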

charlottejmc commented 7 months ago

Thank you @hluling, that makes good sense to me now. Anisa and I did find this slightly confusing upon initial processing of the lesson, so this might indicate it will be confusing to readers as well. One solution would be to keep the code in a separate asset folder, but give clearer instructions to readers explaining this choice.

I will let @caiocmello share his view on this too!

caiocmello commented 6 months ago

Dear @hluling,

It has been such a pleasure reading your lesson. I've learnt a lot from it and I'm sure it will be a great contribution to PH! So, thanks very much for this! I've noted some suggestions to share at this stage, before the lesson goes to external review. I hope they are useful in improving the accessibility and usability of this material. Comments below indicate the paragraph, as annotated in the preview version.

It was great to see that you included more than one research question in the lesson. Also, you provide a different set-up of the dashboard, showing how readers can customise it in different ways. This is excellent. I have, however, some suggestions regarding the way the RQs are structured in the text:

Final comment:

These are my initial suggestions and I look forward to hearing back from you. I hope this is useful and feel free to get in touch if you have any questions.

charlottejmc commented 6 months ago

Thank you very much @caiocmello – just a short note to let you and @hluling know that I've just taken care of switching the two asset links at paragraphs 39 and 41.

hluling commented 6 months ago

Thank you @caiocmello for the insightful feedback! I'm working on the edits. @charlottejmc: Thanks for changing the links! Do I just upload the updated materials to my original repo? I also want to insert figures, and I'm looking at the instructions described here. Am I supposed to add the .png files to my original repo? I'll refer to the figures in the main text.

anisa-hawes commented 6 months ago

Hello Luling @hluling,

If you'd like to slot in some figure images, please either upload them to your repository where we can download them, or email them to us as before. Charlotte and I will process these next week and put them in place for you!

Thank you, Anisa

anisa-hawes commented 6 months ago

What's happening now?

Hello Luling @hluling. Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This Phase is an opportunity for you to revise your draft in response to @caiocmello's initial feedback.

I've sent you an invitation to join us as an Outside Collaborator here on GitHub. This gives you the Write access you'll need to edit your lesson directly.

We ask authors to work on their own files with direct commits: we prefer you don't fork our repo, or use the Pull Request system to edit in ph-submissions. You can make direct commits to your file here: /en/drafts/originals/interactive-data-visualization-dashboard.md. Charlotte and I can help if you encounter any practical problems!

When you and Caio are both happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@caiocmello) 
All  Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Author (@hluling)  
Expected completion date? : June 12
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC) 
Expected timeframe? : ~60 days after request is accepted

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

hluling commented 6 months ago

Hi @caiocmello, thanks again for the thorough review! Please see the revised lesson here: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/interactive-data-visualization-dashboard

Feel free to let me know if I need to change anything else. Here is the list responding to each of your comments.

  • [x] Paragraph 5: Although it is okay and, actually, recommended to keep research questions simple for this tutorial, I would suggest a slight change in the text to make it more accurate. The fact that the U.S. television stations mention words such as Putin and Zelensky in the same frequency doesn’t mean, necessarily, 'balanced coverage of the event'. Therefore, I would suggest avoiding the word ‘balanced’ by simply adapting the research question to something like: ‘...concerns how the U.S. television stations have covered the war in Ukraine. One way to address…’ (or explaining what you mean by 'balanced').

Revised as suggested (now in Paragraph 9).

  • [x] Paragraph 21: It would be interesting to add a line here to state the parameters you chose. Eg.: ‘For the purpose of this lesson, keywords chosen are x,y, z. The geographic market is x…’.

Revised as suggested (now in Paragraph 30).

  • [x] I think the reader would benefit from seeing a spoiler of the final product at the beginning of the lesson. It could be a screenshot of the dashboard, or even the link to the live demo version you provided (https://ph-dash-demo.onrender.com/). It would have helped me to understand what my goal was if I had seen it before starting the lesson. (Let me know what you think about this).

This is a great idea. I added Figure 1 and Figure 2 showing screenshots of the two dashboards.

  • [x] It would have also been helpful to know more about (to have a general overview of) the dataset from the beginning. This could be a screenshot, a table or 'schema'. Something that explains what is in there and how it is structured.

I agree. I added Figure 3 and Figure 4 showing screenshots for the two datasets.

  • [x] You present the two RQs at the beginning of the lesson. But it makes me (as a reader) feel like you forgot the second RQ along the text. In paragraph 17, for example, you say: 'To address the research question...', and I was confused whether you were talking about RQ1 or RQ2. Therefore, I would suggest mentioning at the beginning of the lesson that you will provide an 'extra' RQ for those interested in learning by building and customising different dashboards. This way, RQ2 would be presented just at the end of the text and framed as 'extra' (non-essential) content of this lesson.

I've revised and adjusted the language about the role of the two RQs (Paragraphs 4 and 27).

  • [x] Considering this lesson is focused on data visualisation, would it be possible to provide the data for RQ2, instead of providing the code to download it? I appreciate the exercise of collecting the data, but in this case, I think it took me a very long time to download it, when I was primarily focused on visualising. Also, if kept in the lesson, I think the script for downloading data for RQ2 would have to be explained in detail in the lesson, as it is not simple.

I've added a link to download the dataset directly (Paragraph 48).

  • [x] It would also be great to see a spoiler of the dashboard for RQ2. This way the reader can choose whether it is worth it to look at the 'extra' content.

The added Figure 2 shows a screenshot of the RQ2 dashboard.

hluling commented 6 months ago

Hi Anisa @anisa-hawes (thanks for the reply!) and @charlottejmc,

I've placed the 4 figures here: https://github.com/hluling/ph-dash/tree/master/interactive-data-visualization-dashboard. You can find the figure placeholders in the revised lesson draft: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/interactive-data-visualization-dashboard

Also a quick note: I updated some files here: https://github.com/programminghistorian/ph-submissions/tree/gh-pages/assets/interactive-data-visualization-dashboard

charlottejmc commented 6 months ago

Thank you @hluling, I've uploaded your four images and updated the placeholder links in the markdown file.

caiocmello commented 6 months ago

Hi @hluling,

It looks great! Thanks for the rapid response and for your engagement in the process! @anisa-hawes and @charlottejmc will move the lesson on to the next stage, external peer review. I will write to you soon once reviewers are assigned.

Best wishes, Caio

anisa-hawes commented 6 months ago

Hello Luling @hluling,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 4: Open Peer Review. This Phase will be an opportunity for you to hear feedback from peers in the community.

Caio @caiocmello has invited two reviewers to read your lesson, test your code, and provide constructive feedback. In the spirit of openness, reviews will be posted as comments in this Issue (unless you specifically request a closed review).

After both reviews, Caio will summarise the suggestions to clarify your priorities in Phase 5: Revision 2.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 3 <br> Revision 1
Who worked on this? : Author (@hluling)
All  Phase 3 tasks completed? : Yes
Section Phase 4 <br> Open Peer Review
Who's working on this? : Diego Alves + Johannes Breuer
Expected completion date? : 22 July
Section Phase 5 <br> Revision 2
Who's responsible? : Author (@hluling)
Expected timeframe? : ~30 days after editor's summary

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

anisa-hawes commented 6 months ago

Hello Luling @hluling,

I noticed that you updated the Open in Colab button link (https://github.com/programminghistorian/ph-submissions/commit/e5c835080751bf4ed00606328bd507349159f112), but this was correct as we had it set up: https://colab.research.google.com/github/programminghistorian/ph-submissions/blob/gh-pages/assets/interactive-data-visualization-dashboard/interactive-data-visualization-dashboard.ipynb. We are hosting your Python notebook within our organisational Colab space and syncing that copy with the assets folder on our repo. If you want to make any edits or adjustments to the notebook, please coordinate with us: either we can make the changes on your behalf, or we can add you as a co-editor on our Master copy of your notebook. We are not making any direct edits to the notebook here in the repo; rather, we update the Master copy on Colab, then re-sync.

Thank you, Anisa

hluling commented 6 months ago

Thanks, @anisa-hawes. Sorry about that. I will let you know when there are changes.

charlottejmc commented 6 months ago

Hi @hluling,

I apologise for the confusion. When you updated the notebook in your Phase 3 Revision commit, we didn't realise that this had replaced the code behind the Open in Colab button. I do appreciate that you noticed and tried to rectify it later!

However, what we actually need is for the link to refer back to the notebook as hosted on our own GitHub repo. You can see that I've changed it back to link to https://colab.research.google.com/github/programminghistorian/ph-submissions/blob/gh-pages/assets/interactive-data-visualization-dashboard/interactive-data-visualization-dashboard.ipynb, rather than https://colab.research.google.com/github/hluling/ph-dash/blob/master/interactive-data-visualization-dashboard.ipynb. (While you'd renamed the .ipynb file correctly here, the button was still linking back to your own GitHub repo.)

caiocmello commented 6 months ago

Open Peer Review

During Phases 2 and 3, I provided initial feedback on this lesson, then worked with @hluling to complete a first round of revisions.

In Phase 4 Open Peer Review, we invite feedback from others in our community.

Welcome Diego Alves @dfvalio and Johannes Breuer @jobreu. By participating in this peer review process, you are contributing to the creation of a useful and sustainable technical resource for the whole community. Thank you.

Please read the lesson, test the code, and post your review as a comment in this issue by July 22.

Reviewer Guidelines:

A preview of the lesson:

-- Notes:

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

Programming Historian in English is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinise ideas, to ask questions, make suggestions, or request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudsperson Dr Ian Milligan. Thank you for helping us to create a safe space.

jobreu commented 4 months ago

Thanks a lot to @hluling for writing this tutorial and thanks to @caiocmello and @anisa-hawes for inviting and adding me, and giving me the opportunity to (p)review this tutorial!

First of all, I want to say that I enjoyed reading and testing this tutorial. It covers a very interesting and relevant topic and I find it very well-structured and helpful.

All of the suggestions that I have are rather minor and should be easy to address (I think/hope).

I will go through my comments in chronological order in the following:

As a general remark: the tutorial text contains a few typos that warrant another thorough proofreading. I guess this could be done using a tool like Grammarly, Writefull, LanguageTool, or DeepL Write (by copying and pasting the text parts into the respective web or desktop apps). For example, there seem to be missing articles in the bullet points of paragraph 7: “… an Application Programming Interface (API)” and “… the dashboard…”

Going beyond the tutorial text, I also tested the Colab version of the Jupyter Notebook for RQ1 and ran the code and app locally, and everything worked fine 😊 There is only one minor thing I noticed: when I copied the app.py file from the GitHub repo and ran the app locally, the data displayed in the app was from 2023 and not from today up to 120 days ago. I fixed this by replacing the code for the queries in the app.py file with that from the interactive-data-visualization-dashboard.ipynb file. I also tested the deployment via Render and it worked nicely.
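For instance, the query window could be made relative to the current date rather than hard-coded. A minimal sketch (the 120-day span, the variable names, and the YYYYMMDD format are just illustrative assumptions, not the lesson's actual query code):

from datetime import date, timedelta

end_date = date.today()
start_date = end_date - timedelta(days=120)

# Format the dates however the query string expects them, e.g. YYYYMMDD
start_str = start_date.strftime("%Y%m%d")
end_str = end_date.strftime("%Y%m%d")
print(start_str, end_str)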

caiocmello commented 4 months ago

Thanks very much @jobreu for your enlightening and detailed review. This is very helpful and we appreciate it! I would like to ask the author @hluling to please wait until we receive feedback from the second reviewer, @dfvalio, before starting to edit or make changes to the lesson.

dfvalio commented 4 months ago

Hi everyone, First of all, I would like to thank @hluling for preparing this tutorial. Overall, it is well-structured and easy to follow. However, as it involves many different platforms, it can be a bit complicated for beginners.

Please find my comments below:

It is good that you provided the Jupyter notebook; it will certainly be useful for many users. I hope my comments help to improve this lesson. Thanks again for inviting me as a reviewer.

I hope this helps in refining the tutorial further. Best regards,

Diego

caiocmello commented 4 months ago

Thanks very much @dfvalio for providing such thorough feedback on this piece!

Dear @hluling,

I've organised the reviewers' feedback in this comment, combining their points and adding some 'notes' that I hope will be helpful. (My notes are just suggestions, though; feel free to respond as you wish.) Please let me know if you need help with anything, and feel free to ask the reviewers questions if you think something needs further clarification.

Best wishes, Caio

anisa-hawes commented 4 months ago

Many thanks for contributing as reviewers, Diego @dfvalio and Johannes @jobreu! We sincerely appreciate your generous participation.

And thank you @caiocmello for your thorough summary comment. I agree that it makes good sense to remove the bibliography in this case, as all works referenced are cited within the endnotes.

--

Hello Luling @hluling,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 5: Revision 2.

This phase is an opportunity for you to revise your draft in response to the peer reviewers' feedback.

Caio @caiocmello has summarised their suggestions, but feel free to ask questions if you are unsure.

Please make revisions via direct commits to your file: /en/drafts/originals/interactive-data-visualization-dashboard.md. Charlotte and I are here to help if you encounter any difficulties.

When you and your editor are both happy with the revised draft, the Managing Editor @hawc2 will read it through before we move forward to Phase 6: Sustainability + Accessibility.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 4 <br> Open Peer Review
Who worked on this? : Reviewers (@dfvalio+ @jobreu)
All  Phase 4 tasks completed? : Yes
Section Phase 5 <br> Revision 2
Who's working on this? : Author (@hluling)
Expected completion date? : August 18
Section Phase 6 <br> Sustainability + Accessibility
Who's responsible? : Publishing Team
Expected timeframe? : 7~21 days

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

hluling commented 3 months ago

Hi Charlotte @charlottejmc, I need to replace Figure 2 in the lesson. Could you please help me with this? Here is the new version: interactive-data-visualization-dashboard2

Thank you @dfvalio and @jobreu for the detailed feedback! Thanks @caiocmello for combining the comments! I've revised the lesson and will address each of the comments below. Where a revision is described, the added or edited text is in italics.

  • [x] Paragraph 1: I think, I would remove the following sentence: “It would be beneficial for scholars to explore ways to better engage with a broader audience.”

I've removed the sentence.

  • [x] Paragraph 1: The advantages of providing a dashboard could be further developed. For example, the author could explain what sort of advanced analysis can be done with a dashboard compared to a simple non-interactive graph. Caio's note: After saying that it has become a popular method (line 4), you could perhaps mention that interactive dashboards allow users to explore, compare and interrogate data according to their particular interests.

Added the following: "Unlike static graphs, interactive dashboards allow readers to explore patterns in the data based on their specific interests by filtering, sorting, or changing the data view. Features like hover-over tooltips can also provide additional information without cluttering the main display."
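To illustrate the kind of built-in interactivity being described, here is a minimal, generic Dash sketch (not the lesson's code; the dataset and values are made up). Plotly figures rendered through dcc.Graph come with hover tooltips and zooming by default:

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

# Tiny illustrative dataset (assumed values, not the lesson's data)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "airtime_pct": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
})

fig = px.line(df, x="date", y="airtime_pct", title="Sample interactive line chart")

app = Dash(__name__)
app.layout = html.Div([dcc.Graph(figure=fig)])  # hover tooltips and zoom come for free

if __name__ == "__main__":
    app.run(debug=True)  # in older Dash versions, use app.run_server(debug=True)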

  • [x] Paragraphs 2 and 4: Maybe it would be helpful for readers to explicitly distinguish (in the explanatory/introductory text) between a contemporary/current research question/topic (RQ1) and a historical one (RQ2)?
  • [x] Paragraph 2: It would be better to clarify that the two research questions are independent.

Please see the following edits. Paragraph 2: "... this lesson is guided by two independent sample research questions in the field of media and communication studies, each representing a different temporal focus. The first research question (RQ1) is contemporary and asks:" Paragraph 4: "To broaden the application settings of the current lesson, I also provide a historical example with the second research question (RQ2):"

  • [x] Paragraph 5: I found the text and numbers in the “Screenshot of the RQ2 dashboard” a bit hard to read (even in the enlarged version). Could the font size or zoom factor maybe be increased here?
  • [x] Paragraph 5: The figure shows a comparison of two graphs from different decades, but there is no information regarding the analysis. What do the percentages mean? This information should be added somewhere in the text. Caio's note: You could extend the figures' legends to include that info. Eg. 'Screenshot of the RQ2 dashboard. The chart shows the percentage of ...'

I have increased the chart size and the font size, as shown in the new Figure 2 (please see above). In the caption, I have also added the following description “Each chart shows the top-10 non-English newspapers in a given decade. The percentage is the count of newspaper titles in a given non-English language divided by the sum of non-English newspaper titles.”

  • [x] Paragraph 6: In the last sentence, it may make sense to add that the created visualizations are interactive and that the deployment via a (free) web service makes them widely and easily accessible.

The following changes are made: “The approach taken by this lesson can be applied to a wide range of digital humanities projects where there is a need to retrieve data from a publicly available source, process and analyze the data, and visualize the research outputs in an interactive manner. In addition, this lesson also shows how to deploy the RQ1 dashboard via a freemium web service, which helps to make similar dashboards widely and easily accessible.”

  • [x] Paragraph 9: When mentioning content analysis and algorithmic text analysis for the first time, I would suggest adding links to some resources with (further) explanations. For content analysis, I guess, this could simply be the Wikipedia entry. To be honest, I have not heard the term “algorithmic text analysis” (ATA) before. A quick web search brought up this handbook entry. Terms that I (as a quantitative social scientist) am more familiar with, are text mining, text as data, and natural language processing (NLP). Maybe it could be worthwhile to briefly state how ATA relates to text mining and NLP? (Could be a short footnote, I think).
  • [x] Paragraph 9: After the sentence "The quantitative method of content analysis (CA) has long been a tradition in mass communication studies," it would be helpful to provide some examples (references) of studies using this method. Additionally, I agree with @jobreu that the term ATA might be unfamiliar. Consider using a more standard term or providing a better explanation. Caio's note: Regarding the 'content analysis', I agree with Johannes that adding a reference for the Wikipedia entry would be sufficient. For ATA, there is consensus in reviewers' feedback that it needs a brief contextualisation. I also wonder how it differs from text mining. It can be a footnote, though!

ATA was used because it emphasizes “algorithmic,” which I believe is a useful distinction from CA. I agree with including the more popular and general terms text mining and NLP. In the revised paragraph, I've briefly elaborated on the difference between ATA and CA, included more references to CA and ATA, and referred to text mining and NLP where ATA is discussed:

“_Both methods aim to infer meanings from text through classification or measurement. Whereas CA relies heavily on a carefully crafted codebook based on research questions and multiple human coders to ensure the reliability and validity of a systematic analysis,[2] [3] ATA relies on algorithms and models (a more general term for this method is text mining or natural language processing).[4]_" In endnotes, here are the added references: [2]: Matthew Lombard, Jennifer Snyder‐Duch, and Cheryl Campanella Bracken. "Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability," Human Communication Research 28, no. 4 (2002): 587-604. [3]: Kimberly A. Neuendorf, The Content Analysis Guidebook (Thousand Oaks: Sage, 2017). [4]: Gross, Justin, and Dana Nestor. "Algorithmic Text Analysis: Toward More Careful Consideration of Qualitative Distinctions," in Oxford Handbook of Engaged Methodological Pluralism in Political Science, eds. Janet M. Box-Steffensmeier, Dino P. Christenson, and Valeria Sinclair-Chapman (Oxford Academic, 2023).”

  • [x] Paragraphs 10 to 13: I recommend adding information about the specific data used for RQ1 in the dataset description. While this information appears later in the text, defining the keywords and timeframe in this part would be beneficial.

I added the following text: “Our goal is to retrieve the relevant data for RQ1 via the 2.0 TV API. Regarding keywords, some appropriate Ukraine-related terms include "Ukrainian" and "Zelenskyy," and the Russia-related terms include "Russian" and "Putin." With the 2.0 TV API, we also specify the TV geographic market to be "National;" the output mode is the normalized percentage of airtime (the y-axis of the line graph that we will create later); and the time range covers the last 365 days, including today. After data retrieval, we will prepare a dataset like this for visualization.”

  • [x] Paragraph 14: Clarify what type of change will be measured. Adding a sentence explaining the focus of RQ would enhance clarity. Caio's note: It seems like the dashboard for RQ2 just provides info on the ranking of languages with most newspapers published in the US. Therefore, you could slightly change RQ2 to something like: 'How has the ranking of top non-English languages of newspapers published in the United States changed dating back to the 1960s'. The ranking changed, not the languages, right?

I made the following change to clarify: “How has the ranking of top non-English languages of American newspapers changed from the 1690s to the present? Specifically, the dashboard will be designed to show the top ten languages for each decade dating back to the 1690s, highlighting any shifts in their rankings and the emergence or decline of different languages over time.” Upon further checking, I decided to use “American newspapers” in RQ2 to describe the pre-1776 period more precisely. The same change has been made in the Introduction. In the same paragraph, I've also included a brief description of non-English Native American newspapers and the relevant references.

  • [x] Paragraph 17: Include more information about the timeframe used in the tutorial. Additionally, clarify what "percentage of newspaper" refers to. Percentage regarding what exactly?

I’ve revised the paragraph to improve clarity: “In Figure 4, the rows represent languages, the columns represent decades (from the 1690s to the 2020s), and the cells represent counts of newspapers. We can use the cell values to calculate the percentage for a given newspaper language in a certain decade. The percentage is calculated by dividing the number of newspapers for a given language in a certain decade by the total number of non-English newspapers in that decade, and then multiplying by 100. This gives the proportion of newspapers for that language relative to all non-English newspapers in the same decade. Then, we can visualize the top ten non-English newspaper languages in a certain decade.”
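The calculation itself is simple to reproduce. Here is a minimal sketch with made-up counts and column names (not the lesson's data or code):

import pandas as pd

# Hypothetical counts: rows are languages, columns are decades
counts = pd.DataFrame(
    {"1890": [120, 45, 30], "1900": [100, 60, 25]},
    index=["German", "Spanish", "Italian"],
)

# Percentage of each language among all non-English titles in that decade
pct = counts.div(counts.sum(axis=0), axis=1) * 100

# Languages ranked for one decade (take the top ten in the real data)
top_1890 = pct["1890"].sort_values(ascending=False).head(10)
print(top_1890.round(1))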

  • [x] Paragraph 18: I think, the link to Flask should rather be this one.

I’ve changed the URL to https://flask.palletsprojects.com. Opening this link will redirect to the latest version.

  • [x] Paragraph 20: There also is a GitHub CLI tool that could be added as an option for working with/via the command line in the 6th bullet point (as an alternative to GitHub desktop).
  • [x] Paragraph 20: Maybe add a link to the Project Jupyter website when mentioning Jupyter Notebook for the 1st time in the “Optional” note at the end of this paragraph?

I’ve added this and Codespaces: “Have git ready to use in command line. _You could also use either of the following (not covered in this lesson):

  • [x] Paragraph 21: If this lesson is aimed at people with some programming experience, the text is fine. For a less experienced audience, consider explaining (in a footnote) what a virtual environment is and why it is a good practice.

I’ve added a footnote: “[11] A virtual environment in Python is a self-contained directory that contains a specific version of Python and a set of libraries. It allows you to manage dependencies for different projects separately, ensuring that changes in one project do not affect others. This is especially useful for maintaining consistent development environments and avoiding conflicts between package versions.”
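For readers who also want the commands, the standard steps look roughly like this (macOS/Linux shown; the library list is just an example of what a lesson like this might need):

# Create and activate a virtual environment, then install libraries into it
python -m venv .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install dash pandas requests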

  • [x] Paragraph 27: Since this part of the lesson focuses on RQ1, I would remove the sentence related to RQ2.

To improve clarity, I’ve removed the original section called “An Idea of a Simple Dashboard.” Its content related to RQ1 has been moved under “Coding the Dashboards -> RQ1,” and the content related to RQ2 has been moved under “Coding the Dashboards -> RQ2.”

I’ve updated the link.

  • [x] Paragraphs 29 and 48: I think it would be helpful to repeat the RQs in a shortened form in the (sub-)headings “RQ1” and “RQ2” at the respective beginnings of paragraphs 29 and 48. Suggestions: “RQ1 – TV airtime”, “RQ2 – Non-English newspapers” (or something similar). As a related sidenote: For the sake of consistency, I would suggest sticking with Arabic numerals and change the headings “Research Question I” (para. 9) and “Research Question II” (para. 14) to “Research Question 1” and “Research Question 2”.

The suggested changes have been made.

  • [x] After paragraph 30 and before the code: Add a sentence or title explaining what will be done next, similar to previous sections. This applies to the two other code lines after paragraph 31 as well.
  • [x] Paragraph 32: Code: Take a look at the retrieved dataframe df_ukr.head() When using a .py script, the line should be print(df_ukr.head()) for it to be displayed in the command prompt. Also, the comment (#) could be clarified: "Display the first 5 rows of the data collected regarding the Ukrainian keywords." Additionally, you can mention that people can use .shape to check the size of the dataset.

The paragraphs in question have been revised: “Next, once we have the data retrieved, we need to prepare it in a form that is ready for visualization. Our goal is to transform the data into the shape shown in Figure 3, above.

import requests
import pandas as pd
from io import StringIO

def to_df(queryurl):
    response = requests.get(queryurl)
    content_text = StringIO(response.content.decode('utf-8'))
    df = pd.read_csv(content_text)
    return df

Code explanation: The requests library is used to execute the queries and transform the results into a pandas dataframe. To do this, we create a function called to_df() to streamline the workflow. Once we have the function created, we can now put it to work:

df_ukr = to_df(query_url_ukr)
df_rus = to_df(query_url_rus)

Optional: You can use the .head() method to take a look at the first five rows of the resulting dataframe.

# If in Jupyter: Take a look at the first five rows of the retrieved dataframe for Ukraine 
df_ukr.head()
# If you execute a .py file, add the print() function to see the first five rows
print(df_ukr.head())
# You can also check the .shape attribute (e.g. df_ukr.shape) to see how many rows and columns the dataframe has. Give it a try!

  • [x] Paragraph 33: The links to the Bootstrap resources did not work for me (server not found error). Caio's note: It worked for me. That's weird

I’m able to open the links. Maybe their site was down when you tested it?

  • [x] Paragraph 39: Especially for the local use/testing, it would be helpful to add the code for stopping the app here as well, I would say.
  • [x] Paragraph 39: Remind less experienced users not to close the command prompt while using the dashboard on the web browser.

I’ve added the following: “Do not close the command line program while the server is running. When you are done, press Ctrl+C in the command line to stop the server.”

  • [x] Paragraph 46: “Add Environment Variable” was not in Advanced Settings but before it.

I’ve made the revision: “Second, scroll down and find the section called "Environment Variables."”

  • [x] Paragraphs 48 & 49: I fully understand that the author cannot provide “a detailed explanation considering the space limit” (para. 28). However, I think it would be helpful to at least briefly sketch how the steps that the tutorial goes through for RQ1 could be repeated for RQ2 in 2 or 3 sentences (e.g., create a new venv, run the .py script, create a GitHub repo, and deploy with Render or some other service, if readers want to test deployment for this as well).

I’ve added the following sentences for RQ2: “Regarding workflow, the following steps will be the same as described above: the same prerequisites will be needed; follow the same steps to create a new virtual environment; the same Python libraries will be needed; and you can follow the same steps to deploy the RQ2 dashboard on Render. The data downloading procedure and the specific code used for the RQ2 dashboard will be different from RQ1. However, the underlying logic is the same: We start with data retrieval, prepare the data for visualization, code the dashboard frontend, then code the dashboard backend. I will briefly describe these in the next two sections.”
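As a generic illustration of that frontend/backend split (the component ids, dataset, and values below are assumptions for the sketch, not the lesson's actual RQ2 code):

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Illustrative data: newspaper counts per language, per decade (made-up values)
df = pd.DataFrame({
    "decade": [1890, 1890, 1900, 1900],
    "language": ["German", "Spanish", "German", "Spanish"],
    "count": [120, 45, 100, 60],
})

app = Dash(__name__)

# Frontend: the layout declares the controls and an (initially empty) graph
app.layout = html.Div([
    dcc.Dropdown(options=sorted(df["decade"].unique().tolist()), value=1890, id="decade-picker"),
    dcc.Graph(id="bar-chart"),
])

# Backend: the callback recomputes the figure whenever the dropdown changes
@app.callback(Output("bar-chart", "figure"), Input("decade-picker", "value"))
def update_chart(decade):
    subset = df[df["decade"] == decade]
    return px.bar(subset, x="language", y="count",
                  title=f"Newspapers by language, {decade}s")

if __name__ == "__main__":
    app.run(debug=True)  # in older Dash versions, use app.run_server(debug=True)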

  • [x] Paragraphs 51 to 60 (Bibliography + Endnotes): I was a bit confused by the repetition of the references under the “Bibliography” heading and the endnotes. I am not sure whether this is standard practice at Programming Historian but would feel that removing the “Bibliography” section would be ok, given that the references are all provided in the endnotes. Caio's note: May I ask @anisa-hawes and @charlottejmc to clarify this last point on the bibliography?

Per Anisa’s reply, I deleted the bibliography.

I hope that I have addressed the comments successfully. Thank you all again!

charlottejmc commented 2 months ago

Hi @hluling,

Just to let you know I've replaced Figure 2 with the new version! Thanks so much for all your energy with this.

charlottejmc commented 2 months ago

Dear @caiocmello,

Once you've had a look through @hluling's Phase 5 Revisions and have confirmed you're happy with how the lesson now reads, please do let us know! We can then move to the next phase of the workflow, which is Phase 6: Sustainability & Accessibility. I'll begin with copyediting, before typesetting and the final checks.

Thank you! ✨

charlottejmc commented 2 months ago

Hello @hawc2, if @caiocmello is indeed happy with the lesson at this stage, then it is ready for your final read-through before we move it to Phase 6. Thank you!

caiocmello commented 2 months ago

Dear @charlottejmc, I have revised this draft and I am happy for it to move on to Phase 6. Thanks very much @hluling for your work on this piece and for your careful edits, following the reviewers' recommendations.

anisa-hawes commented 2 months ago

Thank you, @caiocmello.

Hello Luling @hluling,

The Managing Editor Alex @hawc2 will also read the lesson to confirm if it should be moved onwards to Phase 6, or if he'd like to suggest some further revisions.

Thank you, Anisa

hawc2 commented 2 months ago

Hi @hluling, my apologies for the delay in getting back to you with my review; the start of the semester swept things away.

This is looking really interesting, but some sections of the code seem to be described too briefly. There are a few changes I'd recommend making to the framing and presentation of key details. I've made a round of line-edits where you can see some of the simple changes I made to the table of contents, headings, and the wording of some sections.

Right now the intro section focuses on showing the final dashboard images the lesson will produce, but this information would be better saved for the end of the lesson. Instead, spend more time in the intro discussing the datasets and the general types of data visualizations you will produce. It's not clear in the intro what kind of graph will be viewable in the end, and it's not concrete which publicly available data sources you selected to answer your research questions. You end up going into detail about that below, but you might as well say concretely what they are the first time you mention them.

This is a difficult lesson that requires some base knowledge of Python and deploying code, and it may be helpful to add some more prerequisite lessons and helpful references/guidelines for some of those complicated steps (like running Python in the command line when you first mention it).

The sections that begin “Code explanation” shouldn't include that phrase; they should just describe the code. These sections could all afford to include more commentary about the code. Ideally, weave commentary before and after each code block, rather than citing code and then providing a single paragraph describing it. Ideally, code chunks are shorter, with more detail for each chunk. You do say that for Q2 you don't provide commentary because it's similar to Q1, but you should still provide some commentary to direct the reader.

Please take a look at most of our published lessons, and you will get a better idea of the custom for citing and discussing code.

Once you've made those revisions, I can take a final look over it for any other lingering questions relating to the overall structure and last section. Some of the formatting and related presentation of information will continue to get revised after we send this on for copyedits and for the publishing manager to review. Please let me know if you have questions!

hawc2 commented 2 months ago

Looking more at your lesson, @hluling, I'm not sure it's even worth keeping RQ2. You give so little time to walking through it that it feels like a distraction, and it might be easier to have just one research question that the whole lesson focuses on. The lesson is already quite difficult, and tracking two different research questions and datasets is a lot to follow.

hawc2 commented 2 months ago

@hluling as an alternate solution, I think what would be best is to keep the content but change the framing. I would move away from titling the sections 'Research Question 1' and 'Research Question 2', and instead give those sections descriptive titles that reflect what each section is about.

You could reorganize the lesson so it is less about two research questions and more about the second part serving as an extension of the first, showing extended features of the tool through a secondary research question. That research question and its data could be introduced later, after you've shown everything for what you've currently labeled RQ1. RQ1 makes up the majority of the lesson, so it doesn't make sense to divide it up as if it were part one of a two-part lesson. The RQ2 section could be converted into a discussion of extensions to Dash, using a secondary research question and dataset that you introduce when those features come up.

The conclusion could use more discussion, tying together all these loose threads. What are the key methodologies you are teaching for people to advance their research and its presentation? Can you resummarize and elaborate on those here? Feel free to take a few paragraphs to wrap up.

I made substantial line-edits in some areas to break up long paragraphs and show you how code should have comments woven in. In a few places I added comments in brackets where I'm asking you to elaborate more on the code. Generally, avoid inline comments in the code when the explanation could instead come before or after the code block.

Lots of our recent lessons will offer good examples of how to write more extended discussions of code. For example, when you cite a pandas dataframe for the first time, explain what it is, or provide a link to a secondary resource (Wikipedia is fine). That kind of detail will really help guide readers who aren't already familiar with this type of work. If you can explain in more detail what the code is doing, and what purpose each function serves at that step in the process, that would be helpful, even if it's just a brief sentence or two per line or chunk of code.

The later sections on deployment, especially, are going to be difficult for people. Running Python code is already not the most user-friendly task, so using GitHub and Render is going to need a lot of helping hands, I'd think. We will label this lesson the highest difficulty, but if you can find more ways to provide guidelines and tips, please do. In a similar way, you could make the Jupyter notebook available as a functional Colab notebook for ease of running. The more you can explain, even just in a few sentences, how to run Python code in the command line, the more likely it is that people will be able to use your lesson.

Finally, I'd recommend adding more images. As I said before, ideally you don't show the end result until the end of the lesson. I replicated the images from the front near where they should appear at the end, but they shouldn't appear twice. Could the first ones be replaced with just one general image pulled from Dash's website showcasing what the tool can do? Ideally you have more images throughout the lesson, not all frontloaded in the first half. So take a moment to consider what other images you can include earlier to give the reader a sense of the methodologies you will be teaching to advance their research.

anisa-hawes commented 2 months ago

Hello Luling @hluling,

What's happening now?

Your lesson remains in Phase 5: Revision 2. The Managing Editor Alex @hawc2 has made some additional suggestions following Caio's (@caiocmello) initial feedback. Feel free to ask Alex or Caio further questions if you are unsure.

You can make revisions via direct commits to your file: /en/drafts/originals/interactive-data-visualization-dashboard.md. And, as always, @charlottejmc and I are here to help if you encounter any difficulties.

I've plotted in the revised timeframes for this Phase:

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 4 <br> Open Peer Review
Who worked on this? : Reviewers (@dfvalio+ @jobreu)
All  Phase 4 tasks completed? : Yes
Section Phase 5 <br> Revision 2
Who's working on this? : Author (@hluling)
Expected completion date? : November 2
Section Phase 6 <br> Sustainability + Accessibility
Who's responsible? : Publishing Team
Expected timeframe? : 7~21 days

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

hluling commented 1 month ago

Thanks, Alex @hawc2 for the detailed feedback and edits! I'll work on the revision.

anisa-hawes commented 1 month ago

Thank you for your email, Luling @hluling. To confirm, we're looking forward to receiving your revisions in early November. I've adjusted the timeframes above. Please don't hesitate to write to us if you have questions in the meantime 🙂

hluling commented 3 weeks ago

Hi Alex @hawc2, thanks again for the comments and edits. I've revised the lesson based on your feedback. Here is a summary of the revisions:

  1. All language about "RQ1" and "RQ2" has been removed. The main section is now called a case study. The second dashboard is now called an extended case.
  2. In the introduction, I've added a screenshot showing the kind of dashboard that can be created using Dash (now Figure 1). For the TV airtime dashboard, I've added a screenshot of the date-range picker (now Figure 3).
  3. I've added the explanation about why some of the Python libraries are needed. This is now under the section called "Import Libraries."
  4. I've broken down those two long blocks of code and provided much more detailed explanation for each of the smaller pieces following the suggested format. See para. 57-68, and para. 72-80.
  5. Wikipedia links for pandas and requests have been added (after para. 29)
  6. There is a Colab button at the top of the provided Jupyter Notebook. Now this is explicitly mentioned in the lesson (para. 31).
  7. The conclusion has been expanded, including a summary of lesson goals and takeaways (para. 124-133).
  8. In the paragraph following the section called "Prepare for the lesson," I've included instructions on how to run a Python file in the command line (para. 21).

I hope that I have addressed the comments successfully. Please let me know if any further revision is needed.

anisa-hawes commented 3 weeks ago

Thank you, Luling @hluling. We appreciate your work on these revisions.

--

Hello Alex @hawc2,

Please let us know when you've had time to review these revisions. When you and @caiocmello both feel satisfied with the lesson, Charlotte & I will be pleased to support the next Phase 6.

hawc2 commented 3 weeks ago

@hluling thank you for these thorough revisions, the lesson is much improved. It's an excellent exploration of building dashboards. @caiocmello could you look this over one last time and share any thoughts for revision before we send this to copyedits?

My only immediate thought for revision is something we can mostly address in copyedits: some of the section headings aren't very descriptive of what the section covers, such as down in the Extended Use Case area where there's a heading that's just 'Dataset'.

I still plan to test the Colab notebook, make sure the code still works as expected, and come back separately if there are any changes I'd recommend for it. We can finalize the notebook after copyedits.

caiocmello commented 2 weeks ago

Thanks @hawc2 for your comments and @hluling for your excellent revisions on the document. I think the way you now present RQ2 looks much better and is easier to follow. I've read the lesson through again and I think it reads well! I don't have anything to add at this point.

anisa-hawes commented 1 week ago

Hello Luling @hluling,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 6: Sustainability + Accessibility.

In this phase, our publishing team will coordinate a series of tasks including: copyediting, typesetting, generating archival links, collating copyright agreements, and reviewing essential metadata.

Please note that you won't have direct access to make further edits to your files during Phase 6. You will have an opportunity to review and discuss suggested copyedits with your editor @caiocmello. Thank you for your understanding.

When our Sustainability + Accessibility actions are complete, the Managing Editor @hawc2 will read the lesson through one final time ahead of publication.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 5 <br> Revision 2
Who worked on this? : Author (@hluling)
All  Phase 5 tasks completed? : Yes
Section Phase 6 <br> Sustainability + Accessibility
Who's working on this? : Publishing Team
Expected completion date? : ~21 days
Section Phase 7 <br> Publication
Who's responsible? : Managing Editor @hawc2
Expected timeframe? : ~10 days

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

charlottejmc commented 1 week ago

Hello @hluling,

This lesson is now with me for copyediting. I aim to complete the work by ~21 November.

Please note that you won't have direct access to make further edits to your files during this phase.

Any further revisions can be discussed with your editor @caiocmello after copyedits are complete.

Thank you for your understanding.

charlottejmc commented 3 days ago

Hello @hluling and @caiocmello, I've prepared a PR with the copyedits for your review.

There, you'll be able to review the 'rich-diff' to see my edits in detail. You'll also find brief instructions for how to reply to any questions or comments which came up during the copyedit.

When you're both happy, we can merge in the PR.