programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Understanding and Creating Word Embeddings #555

Closed yann-ryan closed 7 months ago

yann-ryan commented 1 year ago

The Programming Historian has received the following tutorial on 'Understanding and Creating Word Embeddings' by @blaak-18, @quinnanya, and @saraheconnell. This lesson is now under review and can be read at:

https://programminghistorian.github.io/ph-submissions/en/drafts/originals/understanding-creating-word-embeddings

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, which I include as a comment below this.

Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.

yann-ryan commented 1 year ago

Initial feedback:

Thank you for this contribution - I really enjoyed this lesson, which explains the basics of word embeddings very clearly. It is overall extremely well written and easy to follow, and I think it is going to make a fantastic contribution to the Programming Historian. The lesson is in very good shape and almost ready to proceed to peer review. I have a few small suggestions in advance of that. If you’re happy with them (and please feel free to query!), the next stage would be to make the changes (you can do so directly to the draft), and I will begin the search for peer reviewers.

Some general points:

To me a strength of this lesson is that the code feels of secondary importance - much of the value I took from it was the discussion about the theory behind word embeddings, how to interpret them, and how to prepare a corpus for them, etc. With this in mind, I think it could be signposted early in the lesson so that the reader knows what to expect - that the coding steps will be quite minimal in comparison to the discussion.

I thought that some information could have been presented in a different sequence, alongside the practical parts of the lesson where possible. There were some places where as a reader I was unsure why I was being provided with certain information. Paragraph 20 goes into some detail with specific advice on how to mitigate against getting slightly different results from the Word2Vec model; at this point in the lesson, I’m not sure if that will be useful. It may be better to move it to the appropriate section (once the reader has run the model), or to give it a separate box so as not to break the flow of the lesson.

Another general suggestion is that I would advise reducing the information you provide in comments in code blocks and incorporate as much of it as possible within the lesson main text.

Some specific points:

Small errors/typos:

Can I ask you to propose a timeframe for carrying out the initial edits? The submission file above can be edited directly. In the meantime, I will find two peer reviewers.

Thanks so much!

Yann.

yann-ryan commented 1 year ago

The first reviewer for this lesson will be @rubenros1795, who aims to complete his review by late August. Thanks Ruben, let me know if you have any questions!

yann-ryan commented 1 year ago

Small update: second reviewer is also confirmed, I'll tag their Github username here over the next week or so.

blaak-18 commented 1 year ago

Paragraph 5: It’s not totally clear what you mean by ‘traditional’ methods. Do you mean, for example, close reading, or more established digital humanities methods?

Added language in para. 5 to clarify

Paragraph 10/11: 'Visualization' doesn't feel like the correct word to me - the visual representation of the two-dimensional space has nothing to do with the math we can perform on it. Maybe ‘graph’ instead of ‘visualizations’?

switched to "graph" in para 10/11

Paragraph 20 - perhaps this could be moved further down and incorporated within the code of the lesson - I don’t think it’ll be helpful until the reader actually does this part of the process.

Moved further down as suggested

Paragraph 22: this explanation would really benefit from a simple diagram showing some vectors as lines and the corresponding triangle and how the cosine distance is calculated.

We decided against including a diagram just so that the readers don't get so hung up on the math part of the lesson, but can discuss more based on reviewer feedback!
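For readers of this thread, the calculation under discussion in paragraph 22 can be sketched in a few lines. This is a minimal illustration and not the lesson's code: the two-dimensional vectors and the function name `cosine_similarity` are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-D "word vectors", invented for illustration only.
milk = [3.0, 4.0]
cream = [6.0, 8.0]    # same direction as milk, twice as long
pepper = [-4.0, 3.0]  # at a right angle to milk

similarity_same = cosine_similarity(milk, cream)
similarity_orthogonal = cosine_similarity(milk, pepper)
```

Vectors pointing the same way have a similarity of 1 regardless of their length, and vectors at right angles have a similarity of 0. Cosine distance is simply 1 minus the similarity.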

Paragraph 36: If this feature is IDE-dependent, it may be worth removing the part about tab completion or specifying when this feature will be relevant. I would bear in mind throughout that readers may complete the lesson using an IDE other than Jupyter Notebooks.

Added language in para 36 and removed tab completion bits

Paragraph 41: I think for a lesson at this level, it would be useful to suggest how a reader might learn how to do this extra step of dealing with contractions.

Added language to para. 41 to explain the contractions further
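One possible approach to this step, sketched under assumptions: expand contractions before punctuation is stripped, so that "don't" does not end up as "don t". The mapping below is a tiny hypothetical sample, not a list taken from the lesson, and the function name `expand_contractions` is made up for the example.

```python
import re

# Hypothetical, deliberately small mapping; a real project would need a fuller list.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
}

# Match any listed contraction as a whole word, ignoring case.
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its (lowercase) expanded form."""
    return _pattern.sub(lambda match: CONTRACTIONS[match.group(0).lower()], text)


expanded = expand_contractions("Don't boil the milk; it's ready when warm.")
```

Note that this sketch only matches the straight apostrophe ('); texts that use typographic apostrophes (’) would need those normalised first.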

Paragraph 52: could you expand on what different data types are best suited to the CBOW or skipgram methods?

added language to para. 52 to address this feedback

In the code block after paragraph 65, perhaps replace ("../../WordVectors/python/“) with 'FILL IN YOUR FILE PATH HERE’ as in the earlier code block, or explain more clearly that this may need to be changed.

replaced as advised

I found the validation section (paragraph 62 - 64) a bit confusing. It wasn’t clear to me what exactly the evaluation was doing, and how I should interpret the outputs. I was also a bit confused as to why the evaluation output was being saved as a .csv - considering in the rest of the lesson code outputs are interpreted underneath their code cells. It seemed as if this file was being saved for some particular purpose, but I wasn’t sure what that was.

added more context to this section and also provided an option for displaying the results in-line if wanted

I also had a small bug in this code block: running the original code I got an error saying ‘key ‘cupcake’ not present’. When I removed the cupcake entry from the test_words list, it ran fine, so I’m guessing it’s because cupcake is not in the vocabulary?

fixed this bug
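For context, an error like this is the usual result of looking up a word the model never saw during training. A minimal sketch of a guard against it, using a plain dictionary as a stand-in for the trained model's vocabulary (the words and vectors are invented; with gensim 4.x the same membership test would typically be written against `model.wv`, which is an assumption about the lesson's setup rather than a description of the authors' fix):

```python
# Stand-in for a trained model's vocabulary: word -> vector (invented values).
vocabulary = {
    "milk": [0.2, 0.7],
    "butter": [0.3, 0.6],
    "flour": [0.9, 0.1],
}

test_words = ["milk", "cupcake", "flour"]

# Keep only the words the model actually knows, and collect the rest.
known_words = [word for word in test_words if word in vocabulary]
missing_words = [word for word in test_words if word not in vocabulary]

for word in missing_words:
    print(f"Skipping '{word}': not in the model vocabulary")
```

Filtering first means the evaluation can run over `known_words` without raising a key error, while still telling the reader which test words were dropped.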

It felt a little strange to end the lesson with a section headed ‘corpus preparation’. I suggest adding some final concluding remarks, summing up what the reader has learned, etc, before the further reading/next steps.

We moved the section on preparing your own corpus to the end to help ease the lesson to a close

All of the below typos should be fixed now!

Small errors/typos:

Paragraph 3: remove the or a in the first sentence.

Paragraph 7: unnecessary hyphen after the em dash.

Paragraph 9: not sure ‘data’ is the appropriate heading here, as the paragraph focuses on the algorithm and some information about the lesson. Perhaps just remove and keep as part of the introduction?

Paragraph 14: use either dense or condensed

Paragraph 19: be consistent with capitalisation of Word2Vec, sometimes the first word is capitalised mid-sentence, sometimes not.

Paragraph 29: standardise use of 19th century and nineteenth-century (used differently in the header and body)

Paragraph 32: repetition of the information provided in the previous paragraph (the source of the recipes used for the lesson)

Paragraph 40: the references for these two citations should be provided, for example at the end of the lesson.

Paragraph 42: the last clause of the first sentence is missing a conjunction word/phrase, such as ‘and’ or ‘and finally’.

yann-ryan commented 1 year ago

Just to confirm, the second reviewer will be @anneheyer.

rubenros1795 commented 1 year ago

Thank you for this lesson. It is a clearly written and intuitive introduction to word embeddings. I liked the setup, especially the explanation of vectors based on document-term matrices. The choice to first introduce word2vec, then run some code, and add considerations at the end is also a good one.

Here are my comments:

yann-ryan commented 1 year ago

Hi @rubenros1795, thank you so much for this! As per the guidelines, I'd ask the authors to hold off making any edits until the second review is posted. At that point, I'll summarise the reviews and provide a list of action points.

anneheyer commented 1 year ago

My apologies for this review coming in a bit late. First of all, let me emphasize how much I enjoyed this lesson. The lesson is very useful, has nice (and entertaining) examples, is clearly structured and generally very well written. In addition, I have also worked in the accompanying notebook that was equally really enjoyable, informative and inspiring for future work. I have a few general comments that will be followed by paragraph-specific suggestions for improvements and I also list a few typos that I found.

General comments

Perhaps it is a good idea to explain a little bit more the relationship between the website text and the notebook? If I see this correctly, the notebook follows a slightly different structure, the main difference being perhaps “The Code” sections. For the user, it might be helpful if you mention in the website text that the code is further explained in the notebook under the sections “The Code”. This might be an additional reason for the user to open the notebook and explore further, even if they are beginners and find programming a bit scary. BTW: the notebook is really well done!

Specific Comments

¶4 Perhaps name an example of an IDE to help the user to understand? I can imagine less-experienced users being a bit confused with this. Obviously, you don’t want to explain details, but perhaps a little hint, example or link to an explanation would help?

¶5 I noted that you use “humanistic” rather than “historical”. Perhaps it would be good to briefly reflect (behind the scenes, not in the text) on whether this lesson is targeting historians or humanists more generally. I am fine with the latter, but just wanted to make you aware of it.

¶13 Perhaps a graph of a table (and a vector) would help the user to understand the document-term matrix better here? I made a drawing to get my head around this paragraph when reading it on a screen.

¶13 and 14 One might get a bit confused here about the general structure of the argument: is a matrix (a) one way of representing a corpus and sparse vector representation (b) another? And does this mean that embedding models (c) are a sub-form of sparse vector representation? I understand what you are trying to do (and generally everything is very well explained and easy to follow), but being a bit more explicit about the relationship between (a) and (b), and between (b) and (c), would help a beginner.

¶31 Perhaps for the notebook, it would be useful to explain how to download the data from GitHub. When I return to coding after months of other work, I always need to search for the right place for downloads on GitHub.

Code between ¶32 and ¶33: I had to install gensim first before being able to import it. It might be useful to add this to your code to make it more accessible: !python -m pip install -U gensim

¶34 and ¶35 Excellent point. When I started coding, this was one of the things that took me a while to understand. This will be very useful for beginners and the format also allows more advanced users to skip this quickly. Well done!

Code after ¶37: Does the basic Python course include an explanation of loops? One of the things that I found confusing at the beginning was that “name” in this code could also be “x” or “variable” or something else. For learning to read the code, a “#note that…” comment might be helpful for the user.
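The point about loop variable names can be shown with a short sketch. The file names below are hypothetical and the loop bodies are not the lesson's code; the only thing being illustrated is that the name after `for` is an arbitrary label.

```python
filenames = ["recipes_1.txt", "recipes_2.txt"]  # hypothetical file names

# "name" is just a label chosen by the programmer...
lengths_a = []
for name in filenames:
    lengths_a.append(len(name))

# ...so any other valid variable name behaves identically.
lengths_b = []
for x in filenames:
    lengths_b.append(len(x))
```

Both loops produce the same list; a descriptive name such as `name` or `filename` is preferred only because it makes the code easier to read.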

¶40 You can probably safely assume that users know what tokenization is, but if you like, you could add a link here to where this is explained for computational linguistics (or the relevant programming historian course paragraph on the website) as a small service to the user.

Code after ¶42: I understand everything but the two lines after #remove punctuation, for which I had to google a bit and still feel a bit confused. Could you explain these (compile, escape, sub, '[%s]', %) a bit more in the notebook in the “code section”?
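A sketch of how these pieces usually fit together, reconstructed only from the names mentioned in this comment. The lesson's actual two lines may differ, and both the function name `remove_punctuation` and the choice to substitute a space (rather than an empty string) are assumptions.

```python
import re
import string

# string.punctuation holds every ASCII punctuation mark, e.g. !"#$%&'()*+,-./
# re.escape() backslash-escapes characters such as ] and \ so they are read literally.
# The % operator drops that escaped string into the %s slot of "[%s]",
# producing a character class that matches any single punctuation mark.
punctuation_pattern = re.compile("[%s]" % re.escape(string.punctuation))


def remove_punctuation(text):
    """Replace every punctuation character in text with a space."""
    return punctuation_pattern.sub(" ", text)


cleaned = remove_punctuation("Boil, then strain; serve hot!")
```

Compiling the pattern once and reusing it is a small efficiency gain when the same substitution is applied to many files. Splitting `cleaned` on whitespace then gives the individual words.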

¶40 Very helpful and written in such an accessible way!

¶51 Every parameter is clearly explained. Compliments also for indicating default settings and when parameters are optional. But the “Workers” section could be a bit more detailed, even though it seems less relevant.

¶51 In the notebook this section is called Analysis, which I think would also be helpful for the website text. Adding Analysis as a header would also help streamline working on the notebook while reading your explanations here.

¶60- 64 Would it be helpful to explain how to interpret the numbers in the output? I think you said something about interpreting the cosine earlier, but I am not sure how I would even call the ciphers in the output – similarity scores or are these likelihood? I guess you say this earlier, but here (or in the notebook) it would be helpful, too.

¶Validation Very well done. One thing to consider is mentioning that you have now provided code for one model and in order to make this work, one would have to come up with a few models. When going through this part of the lesson, I was wondering whether I had created multiple models, without really realizing it. But seeing that my csv file remained empty, I assume that I would do that by changing parameters? Perhaps you could give one example of an additional model that is also meaningful for your example to make this section even more concrete?

¶73 Again perhaps a superfluous comment: I was wondering whether you could add “Spanish” before gato? In some places like the US, Spanish is a quasi second language, but this might be different for the Asian context or even (Eastern) Europe where more people speak French. For “con” you could add in brackets (“with” in Spanish). This makes the text a little less abstract for some groups of users.

¶79 Love the suggestions for further practices. I would also welcome a list of references for the scholarly texts (or blogs) that you mention in the text or is this against the programming historian’s format?

Typos

¶3 “you can use the a Jupyter notebook”

¶5 “Questions such as these are the type of humanistic inquiries that can be prove to be challenging to answer through traditional methods such as close reading”

¶ 6 “Unlike topic models, which rely on word usage to better understand documents, word embeddings are more concerned with how words across a whole a corpus are used.”

¶9 “While word embeddings have been implemented in many different ways using varying algorithms, for this purposes of this lesson,”

¶13 “because most of the vectors for each word”

Again very well done and, as you hopefully notice, all my comments are rather minor. Thanks so much for creating this lesson. This is truly a great service to the community!

yann-ryan commented 1 year ago

Thanks so much for your review @anneheyer! Now that we have both, we can make a plan for revisions. I'll aim to read through both and comment here by the end of the week.

yann-ryan commented 1 year ago

Thanks again for these incredibly helpful and constructive reviews!

@blaak-18, @quinnanya, and @saraheconnell, I think both reviewers are in agreement that this lesson is in a very good state in terms of structure and the overall content. In both cases comments are minor and relate to specific tweaks or requests for clarification.

I suggest going through the specific reviewer comments and addressing them individually - but there's no need for any very large or organisational changes at this stage. I think in most cases the changes to be made are clear, but here are a few points:

At some point, the Jupyter notebook will need to be updated, but I suggest waiting until the text of the lesson is finalised to do that. There is also the possibility of hosting your notebook on the PH's Google colab - are you OK with this?

Generally, it's advised to have a timeframe of about a month for completing this round of edits. Does Friday October 7th work as a deadline for you?

saraheconnell commented 11 months ago

Yes, thank you, we really appreciate the time and care that went into these! We are working on the edits now and will put in every effort to have them ready by October 7. Avery and I are fine with hosting on Colab, as long as that’s also okay with @quinnanya.


yann-ryan commented 10 months ago

Hi @blaak-18, @saraheconnell, and @quinnanya: I've just uploaded your edited version of the markdown file directly to the repository. When you can, would you mind creating a comment here listing the changes you've made in response to the reviews? Thanks so much!

anisa-hawes commented 10 months ago

Hello @yann-ryan. Thank you for adding the authors' updates!

Thank you also for already mentioning that we can host notebooks associated with lessons within our organisational Colab space. When the notebook is finalised, we'd be very happy to receive it for processing + upload.

We're moving towards a new approach for integrating notebooks to support sustainability, future translatability and usability. Ideally, we want our readers to be able to make the choice to work in Google Colab, run the code locally, or opt to work in a different cloud-based development environment.

If authors provide codebooks to accompany their lesson, we ask that:

Please let Alex @hawc2 or me know if you have any questions. Thank you.

blaak-18 commented 10 months ago

@yann-ryan Below is the list of changes we have made according to reviewer feedback:

Please let me know if you need anything else in terms of notes!

yann-ryan commented 10 months ago

@blaak-18 Thank you so much for this, this is plenty of detail. I'll now do one final review, and then this is ready to move to the next stage!

yann-ryan commented 10 months ago

Hi @blaak-18, @saraheconnell, and @quinnanya,

Thanks again for this updated version. I've made a few very small edits but think we can begin to progress to the next step and recommend that this be copy-edited before publication.

@rubenros1795 and @anneheyer, would you mind having a look over everything and letting us know here if you have any further comments or suggestions based on this new revised version?

I've sent an email with some further small tasks so we can complete the lesson metadata. Also, can I ask you to create a new version of the notebook according to the guidelines posted by @anisa-hawes above? You can post a link to it here, or email it to me directly if you prefer.

The guidelines are:

Thanks!

Yann.

rubenros1795 commented 10 months ago

Dear all,

Thank you for these revisions. I have little to add, because all my main points are addressed. I think speaking of "closeness" is a perfect way to avoid theoretical complexities, but also an intuitive concept for people to understand what word2vec measures.

Ruben

anneheyer commented 10 months ago

Dear all, thanks for this very thorough revision process - really impressive work. For me everything looks great. Just one question: when will this be posted for the general public? I might direct some students to this in the upcoming block.
Anne

yann-ryan commented 10 months ago

Thanks @anneheyer and @rubenros1795!

@anisa-hawes and @hawc2, this lesson is more or less ready to be passed to you, I think, besides a few last pieces of metadata.

hawc2 commented 10 months ago

Congrats everyone on getting this lesson revised! Thanks to our reviewers! I'm looking forward to reviewing the lesson as well, and I'm very excited for Programming Historian to publish an introductory lesson on this important subject.

Would it be possible to make a more direct link between this lesson and the Clustering and Visualizing Documents using Word Embeddings lesson that we recently published? I see it cited in the Next Steps section, but it might be nice for the link between the two lessons to stand out more clearly, and maybe say a little more about how they are connected?

I don't see References yet, were you planning to include a Reference list?

One other minor thought - especially since this is an introductory lesson, could we include more links for key terms and references to sources like Wikipedia? For example, when you reference IDEs or Gensim... The more links to relevant info, the better in my opinion.

saraheconnell commented 10 months ago

Thanks! We can work on those edits. Are there particular connections with the other lesson that we should be trying to draw out, apart from the fact that it's something that people might be interested in after the intro? (And, would corresponding edits need to be made to that lesson to point people toward the more introductory one?) Also, unless I've missed something, I don't think I have write access. Is there a preferred way that we should be making revisions, now that others are more actively working on the draft?

anisa-hawes commented 10 months ago

Hello @saraheconnell,

I've sent you an invitation to join us as an Outside Collaborator. This means that you can make direct edits to your Markdown file /en/drafts/originals/understanding-creating-word-embeddings.md.

Let me know if you need any advice, or if you'd prefer to email Yann or me your collective edits. Thank you, Anisa

saraheconnell commented 10 months ago

Hi @anisa-hawes, many thanks! I'll get started on those links and such as soon as I can and will let you know if any questions arise. Cheers!

anisa-hawes commented 10 months ago

Hello @yann-ryan,

Thank you for sharing the revised .ipynb with me.

--

While doing this I noticed:

Thank you. Anisa

hawc2 commented 10 months ago

@saraheconnell I think it could just be a sentence or two, maybe near the end of the lesson, saying something like: "Now that you've learned how to build and analyze word embeddings, you can see the Clustering and Visualizing Documents with Word Embeddings lesson to learn more about what advanced methods of analysis are possible." I'd mostly just like to make sure that the connection between these two lessons, which are on the same topic and both published by ProgHist, is more clearly delineated and stands apart from the other next steps and related resources.

After this lesson is published, we can reach out to the author of the Clustering lesson and ask them to add something to their lesson which links it more directly to your lesson for prerequisite learning.

@anisa-hawes let's plan to do this over the coming months - I also wonder if there's any more formal way ProgHist can have a little featured text at the bottom of the lesson that makes it clearer what other lessons are directly connected. In this case, I don't think we can call these two lessons a 'series,' but they are very relevant to each other.

saraheconnell commented 10 months ago

Thanks, all! I just committed a few changes to add more informational links, a short references section, and a more direct reference to the other tutorial (making that more visibly the first "next step"). I also deleted that outdated reference to what can be found in the notebook. Let me know if I need to adjust any of this!

anisa-hawes commented 10 months ago

Thank you for your further edits @saraheconnell.

Would you like to re-read this, @hawc2? (You can review Sarah's adjustments in rich-diff here https://github.com/programminghistorian/ph-submissions/commit/1b03a4471b2e5f3752f7b99c3d164f121ddc4f5d) If you're happy that this lesson is ready to move onwards into Phase 6, let me know – we can plan time to start copyediting next week.

Anisa

hawc2 commented 10 months ago

@saraheconnell thanks for all your edits thus far. I've done a complete read through and made a series of line edits to clarify certain sections and standardize some styling. That includes a slight reorg of some of the Introduction section. Before we send this on to copy-edits, I have a set of final, minor revisions I was hoping you could make to the lesson.

That’s it! I hope these are all quick, minor edits for you. I found this lesson incredibly informative, it’s such a great description of word embeddings and how to use them. I’m excited for it to be published, and to use it for research and teaching!

Once you make these last edits, the lesson can move on to copyedits. We'll aim to publish before end of the year.

hawc2 commented 10 months ago

One other thing to note, early on you talk a bit about how the reader can use the Jupyter Notebook. It is worth adding that ProgHist is making it available as a Colab notebook. I know that is a recent addition we made, but we should make sure it works correctly, and that its availability is mentioned in the lesson when you bring up how the reader can access the code.

quinnanya commented 10 months ago

I think the list of comments in the post above were edits I was supposed to take care of at an earlier point but got lost in the inbox. Let me see what I can do to edit it next week?

quinnanya commented 9 months ago

Thanks for these notes, @hawc2! I think I've addressed all of them, clarifying the text in several places, streamlining the code, and consolidating a few things.

The architecture of Word2Vec is a 2-layer neural net (see here for an explanation), but I don't know if even saying "neural net" is sort of opening a can of worms we then have to deal with, so I deleted it everywhere instead. Happy to take the other tack if you'd prefer, though!

hawc2 commented 9 months ago

Thanks @quinnanya for addressing my comments and making these revisions.

As for the '2-layer neural net' explainer, it's up to you if you want to explain it or not. I agree it's not really necessary to understanding how to interpret word embeddings themselves, and it does open a can of worms. It's also fine to briefly state the fact that it's a two-layer network and link to the explainer you provided, although it's possible that blog you linked won't always be available. There will still be some opportunities for minor tweaks during copy editing as well, but if you plan to add this link let us know now.

@anisa-hawes this lesson is ready for Phase 6!

quinnanya commented 9 months ago

Let's just leave the neural net piece out of it, it feels like it's likely to cause more problems than the value it adds

anisa-hawes commented 9 months ago

Super! Thanks to all.

--

Hello @blaak-18, @quinnanya, and @saraheconnell,

Your lesson will now be copyedited by our Publishing Assistant, Charlotte (@charlottejmc). We aim to complete the work by ~next Friday 8th December.

Please note that you won't have direct access to make further edits to your files during this Phase.

Any further revisions can be discussed with your editor @yann-ryan after copyedits are complete. Thank you for your understanding.

Anisa.

charlottejmc commented 9 months ago

Hello @blaak-18, @quinnanya, @saraheconnell and @yann-ryan, I've prepared a PR with the copyedits for your review.

There, you'll be able to review the 'rich-diff' to see my edits in detail. You'll also find brief instructions for how to reply to any questions or comments which came up during the copyedit.

When you're both happy, we can merge in the PR.

blaak-18 commented 9 months ago

@quinnanya @yann-ryan @charlottejmc @saraheconnell

Thank you so much for these edits! Sarah and I have reviewed the copyedits and everything looks great to us. We're happy for this to go ahead and be merged!

charlottejmc commented 9 months ago

@blaak-18, thank you for the confirmation!

I have now merged the copyedit branch and will move on to the next phase, which is the Typesetting. ✨

charlottejmc commented 8 months ago

Hello @blaak-18, @quinnanya, and @saraheconnell,

Thank you again for reviewing my copyedits, which you saw in the pull request I made previously (now closed). There are, however, a few outstanding points from the comments I made inline, which I thought might be easier to reiterate here:

I apologise if these questions were not immediately visible to you from my comment above.

Thank you for your help! Charlotte

charlottejmc commented 8 months ago

Hello @hawc2,

This lesson's sustainability + accessibility checks are in progress.

Publisher's sustainability + accessibility actions:

Authorial / editorial input to YAML:

The image must be:

- name: Forename Surname
  orcid: 0000-0000-0000-0000
  team: false
  bio:
    en: |
      Forename Surname is an Assistant Professor in the Department of Subject at the University of City.
- name: Forename Surname
  orcid: 0000-0000-0000-0000
  team: false
  bio:
    en: |
      Forename Surname is an Assistant Professor in the Department of Subject at the University of City.
- name: Quinn Dombrowski
  team: false
  orcid: 0000-0001-5802-6623
  bio:
    en: |
      Quinn Dombrowski is the Academic Technology Specialist for the Division of Literatures, Cultures, and Languages at Stanford University, and works on non-English digital humanities.
    fr: |
      Quinn Dombrowski est spécialiste des technologies appliquées à la recherche au sein de la Faculté de Littératures, cultures et langues à l'Université Stanford avec un intérêt particulier pour les humanités numériques non-anglophones.
    pt: |
      Quinn Dombrowski é técnica especialista na Divisão de Literaturas, Culturas e Línguas da Stanford University e trabalha em Humanidades Digitais não anglófonas.

Files to prepare for transfer to Jekyll:

Promotion:

saraheconnell commented 8 months ago

Hi Charlotte, apologies for missing these! Avery and I should be able to tackle them next week—hopefully by end of day Wednesday.

Cheers,

Sarah

quinnanya commented 8 months ago

Hi Sarah,

Do you want me to try to do a first pass on this stuff tonight? I’ve got a bit of time.

~Quinn

charlottejmc commented 8 months ago

Hello @blaak-18, @quinnanya, and @saraheconnell,

I have had a first look around for a lesson avatar, and would like to suggest this drawing from the British Library.

[Image: 11129133475_5a895af4c0_q]

What do you think? I like the feeling of 'embeddedness' it creates, but we can also look for images that suggest networks, vectors, milk, cookbooks, etc.!

saraheconnell commented 8 months ago

Hi @charlottejmc,

Thanks for checking in! We are very charmed by that image—another possibility is this one (https://www.flickr.com/photos/britishlibrary/11139139683/in/album-72157638850077096/), which I’d sent to Yann earlier. We are happy with whichever you think is the best fit; the star is perhaps more evocative of vectors, but the goblin is more memorable.

We also just got off a call together and believe we have resolved the remaining comments—please let us know if we missed anything! (The commit says ‘edits in progress’ but we didn’t have any further changes to make after we checked that in.)

One thing we wanted to flag is that the markdown in the info box at line 167 might need to be reviewed by the typesetter.

Here are the other remaining details: Difficulty: Intermediate

Research activity: Analyzing

Topics: python, distant-reading, machine-learning

Abstract: Word embeddings are a text analysis method that allows you to analyze the usage of different terms in a corpus, by capturing information about their contextual usage. This lesson covers how to create word embeddings and use them to answer humanities research questions.

The topics and abstract are new, but we’d sent the research activity and difficulty to Yann last month—just wanted to flag that in case sending it again would cause any confusion! For the bios, I sent mine in to Yann earlier, and Quinn already has one on file. Avery will get hers in soon.

Thanks so much,

Sarah

blaak-18 commented 8 months ago

And here is my bio!

Avery Blankenship is a PhD candidate in the department of English at Northeastern University. She is a member of the Viral Texts Project and her work is published in The Mark Twain Annual, The Nathaniel Hawthorne Review, and in the Viral Texts book project Going the Rounds: Virality in Nineteenth-Century American Newspapers. Her dissertation, Marginal Spaces, explores the uptake, use, and transformation of nineteenth-century American cookbooks and recipes along the lines of race, gender, and class. Her research focuses on nineteenth-century domesticity, cookbook and recipe circulation, and power dynamics in the nineteenth-century home.

charlottejmc commented 8 months ago

Hello all,

Thank you very much for your response! I've now added all the information into the markdown file, and only made a few minor touch-ups to your additions, which were great.

I decided to go with the 'goblin' avatar in the end, as I agree with you that it's more memorable!

Thank you @blaak-18 for your bio – we use a specific format which looks like this:

- name: Forename Surname
  orcid: 0000-0000-0000-0000
  team: false
  bio:
    en: |
      Forename Surname is an Assistant Professor in the Department of Subject at the University of City.

So, may I suggest:

- name: Avery Blankenship
  orcid: 0000-0000-0000-0000
  team: false
  bio:
    en: |
      Avery Blankenship is a PhD candidate in the department of English at Northeastern University.

Unfortunately this means we can only use the first sentence of the bio you wrote. Apologies! I didn't find an ORCID number for you, but do let me know if I'm wrong.

The lesson is getting really close to publication now. Exciting! We are grateful for your patience and your work so far.

anisa-hawes commented 8 months ago

Thanks @saraheconnell, @blaak-18, and @quinnanya for the energy you have given to these revisions, clarifications and adjustments.

We noticed that the corpus central to this lesson is hosted by the Viral Texts Project on your GitHub repository Nineteenth-Century American Recipes.

We'd like to host a copy of this dataset ourselves so that we can ensure the lesson's sustainability.

blaak-18 commented 8 months ago

@anisa-hawes I was the one who collected the data for that set, so I'm totally fine with the set being hosted through PH to make things easier.

anisa-hawes commented 8 months ago

Thank you, @blaak-18. I noticed that you don't explicitly acknowledge this in the lesson, so let's also add an endnote with a clear citation for the dataset:

Blankenship, Avery. “A Dataset of Nineteenth-Century American Recipes,” Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines. 2021. https://github.com/ViralTexts/nineteenth-century-recipes/.

blaak-18 commented 8 months ago

@anisa-hawes That works for me! I appreciate your work on this!

charlottejmc commented 8 months ago

Hello @hawc2,

This lesson's sustainability + accessibility checks are now complete.

Avery Blankenship:

- name: Avery Blankenship
  team: false
  bio:
    en: |
      Avery Blankenship is a PhD candidate in the department of English at Northeastern University.

Quinn Dombrowski:

[Already in ph_authors.yml]

Sarah Connell:

[Sent to @yann-ryan]

Promotion: