programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Timeline summarization for large-scale past-web events with Python #618

Open hawc2 opened 4 months ago

hawc2 commented 4 months ago

Programming Historian in English has received a proposal for a translation from Portuguese, with the provisional title 'Timeline summarization for large-scale past-web events with Python: the case of Arquivo.pt' by @dcgomes and @rncampos.

I have circulated this proposal for feedback within the English team. We have considered this proposal for:

We are pleased that @dcgomes and @rncampos have developed this Proposal into a Submission to be developed under the guidance of @caiocmello as editor.

@dcgomes and @rncampos have shared their Submission package with our Publishing team by email. Our Publishing team will now process the new translation materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.

Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.

charlottejmc commented 4 months ago

Hello @caiocmello, @dcgomes and @rncampos

You can find the key files here:

You can review a preview of the lesson here:


There are a couple small things which I noticed when processing this submission, outlined below. @dcgomes and @rncampos, I would be grateful if you could address them at this stage of publication (Phase 1: Submission).

Thank you very much to all ✨

Charlotte

anisa-hawes commented 4 months ago

Thank you for processing the files and setting up the preview, @charlottejmc! ✨

--

Olá Daniel @dcgomes and Ricardo @rncampos,

Thank you for your work on this translation!

One further note is that we don't yet have English translations of the alt-text or captions for the figure images. The Portuguese original text is as follows:

  1. alt="Pesquisa por Jorge Sampaio através do componente narrativa do Arquivo.pt" caption="Figura 1: Pesquisa por 'Jorge Sampaio' através da componente narrativa do Arquivo.pt."
  2. alt="Resultados da pesquisa por Jorge Sampaio no Conta-me Histórias para o periodo compreendido entre 07/04/2016 e 17/11/2016" caption="Figura 2: Resultados da pesquisa por 'Jorge Sampaio' no *Conta-me Histórias* para o periodo compreendido entre 2016-04-07 e 2016-11-17."
  3. alt="Jorge Sampaio formaliza apoio a Sampaio da Nóvoa" caption="Figura 3: Jorge Sampaio formaliza apoio a Sampaio da Nóvoa."
  4. alt="Nuvem de palavras com os termos relacionados com a pesquisa Jorge Sampaio ao longo de 10 anos" caption="Figura 4: Nuvem de palavras com os termos relacionados com a pesquisa por 'Jorge Sampaio' ao longo de 10 anos."

Could you share these with Charlotte and me (either as a comment here in the Issue, or by email)?

Thank you, Anisa

dcgomes commented 3 months ago

Hi, here are my answers. Please let me know if there is anything else missing on our side. Thanks.

OK, I agree. Please set up the Colab notebook in whatever way you feel is most suitable for the target audience.

Yes.

OK, I agree.

Done. 3 new images sent to publishing.assistant[@]programminghistorian.org.

Fixed.

I am waiting for @rncampos's answer on this matter.

One further note is that we don't yet have English translations of the alt-text or captions for the figure images. The Portuguese original text is as follows:

alt="Search for 'Jorge Sampaio' using the Narrative component of Arquivo.pt" caption="Figure 1: Search for 'Jorge Sampaio' using the Narrative component of Arquivo.pt."

alt="Search results for 'Jorge Sampaio' on Conta-me Histórias (Tell me Stories)" caption="Figure 2: Search results for 'Jorge Sampaio' on Conta-me Histórias (Tell me Stories)."

alt="Web-archived news page linked from the Conta-me Histórias search results." caption="Figure 3: Web-archived news page linked from the Conta-me Histórias search results."

alt="Word cloud with terms related to Jorge Sampaio research over 10 years" caption="Figure 4: Word cloud with terms related to Jorge Sampaio research over 10 years."

charlottejmc commented 3 months ago

Hello @dcgomes,

Thank you for sending over the replacement images, and translating the alt text and captions. I've now made these updates for your lesson.

anisa-hawes commented 3 months ago

Hello Daniel @dcgomes and Ricardo @rncampos (So lovely to see you in Lisboa last week, @rncampos!),

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this phase, your editor Caio @caiocmello will read your lesson, and provide some initial feedback. Caio will post feedback and suggestions as a comment in this issue, so that you can revise your draft in the following phase (Phase 3: Revision 1).

```mermaid
%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Assistant (@charlottejmc)
All Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@caiocmello)
Expected completion date? : July 26
Section Phase 3 <br> Revision 1
Who's responsible? : Author (@author)
Expected timeframe? : ~30 days after feedback is received
```

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

caiocmello commented 3 months ago

Dear authors @rncampos and @dcgomes,

Thank you very much for your interest in publishing a translation of your lesson with the Programming Historian in English, and for your work on this piece. As a first step in the editorial process, I have read both the original lesson published in Portuguese and the English translation, with a view to suggesting improvements to readability and accessibility. While reading, I encountered some issues that make the lesson difficult to understand and follow. As we believe translations can also be an opportunity for refinement, I would like to share some of the questions that came up during this initial phase. I understand that addressing them represents substantial work. Is this something you feel you have the capacity to take on in the coming months? We can be flexible with our timeline for author revisions, which is usually ~1 month. If you do have the interest and capacity, I'd be happy to provide more focused line edits to support that work.

Suggestions:

  1. Title

I have the impression that the title could better match what the lesson does. The current title, ‘Timeline summarisation for large-scale past-web events with Python’, suggests that the lesson will focus on how to create a timeline for summarising events. However, the project ‘Tell me Stories’, which is the part about summarisation, is not addressed until Paragraph 42. Until then, the lesson focuses on the web archive's API and its use for data retrieval. My sense is that a title such as ‘An introduction to the Portuguese Web Archive’s API for data retrieval’ would be clearer to readers who are using this lesson.

  2. Definition of concepts

I’m not sure I have understood correctly what you mean by timeline summarisation (or sumarização de narrativas, in Portuguese). Please correct me if I am wrong, but it seems that by ‘summarisation’ you mean selecting the relevant news articles for a given topic. Is this correct? If so, I think it would be important to define some key concepts used in this lesson (especially considering an interdisciplinary audience). Besides 'summarisation', another such concept is ‘timeline’. Paragraph 3 (line 4) says: ‘In this context, timelines (automatic temporal summarisation systems)’. Is ‘automatic temporal summarisation systems’ the definition of timeline? It would be helpful if you could clarify the origin of this term (if it is used in some specific context) and what it means.

  3. Tell Me Stories

I feel the lesson could benefit from more detailed information about this tool. In Paragraph 52, you mention the use of a tool called YAKE! to determine the relevance of a news article to a given topic. What are the main mechanisms the tool uses to determine relevance? Just a very brief comment on that would be helpful. I'm also unclear about how ‘related terms’ are defined in this context. Are they the result of named entity recognition (NER), meaning entities mentioned in the articles considered relevant?

  4. Videos

I noticed that you've included a sequence of links to videos. Although these are interesting further resources, they are not essential for understanding the content of the lesson. I think they work well as extra content, and I would suggest placing them in a list of ‘references’ or 'further resources' at the end of the lesson.

I look forward to hearing your reflections on this feedback. Please have a think about whether you feel you have capacity to work through these adaptations for the English version of your lesson in the coming months. To reiterate, I'd be happy to share some more detailed feedback and line edits to support your work to make this translation accessible to a broader (multilingual and interdisciplinary) audience.

anisa-hawes commented 3 months ago

Hello Daniel @dcgomes and Ricardo @rncampos,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This phase is an opportunity for you to revise your draft in response to @caiocmello's initial feedback.

Ricardo @rncampos, I've sent you an invitation to join us as an Outside Collaborator here on GitHub. This will give you the 'write access' you'll need to edit your lesson directly. Daniel @dcgomes, I've checked to ensure that you have the 'write access' you'll need to edit the draft directly.

We ask authors to work on their own files with direct commits: we prefer you don't fork our repo, or use the Pull Request system to edit in ph-submissions. You can make direct commits to your file here: /en/drafts/translations/timeline-summarization-web-python.md. @charlottejmc and I can help if you encounter any practical problems!

When you and Caio are all happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

```mermaid
%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@caiocmello)
All Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Authors (@dcgomes + @rncampos)
Expected completion date? : Aug 3
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC)
Expected timeframe? : ~60 days after request is accepted
```

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

caiocmello commented 2 months ago

Hello Daniel @dcgomes and Ricardo @rncampos,

Have you had a chance to consider my initial feedback on your translation?

Please let us know your thoughts, and whether you feel you have capacity to work on these adjustments for the English version of your lesson in the coming months.

Thank you.

hawc2 commented 1 day ago

Hi @dcgomes and @rncampos, as Managing Editor, I'm stepping in at this point to help progress this ticket. As we haven't heard from you since June, I believe the best course of action would be to close this ticket. This lesson was accepted outside of our normal submission process because we hoped it would move forward quickly, but that doesn't seem to be the case.

There are some substantial revisions for you to consider. If you can address those changes, we'd encourage you to resubmit this lesson in response to our upcoming call, which will go out in the next few weeks with a deadline of early next year.

If the revisions we've requested are taken into consideration and fully implemented, we can aim to prioritize editing and hopefully publishing your lesson next year. Please let us know if you have any thoughts on next steps, and if we don't hear from you in the next couple weeks, we will be closing this ticket. If you have further questions or thoughts, please feel free to reach out to me at english@programminghistorian.org.