programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Review Ticket: Beginner's Guide to Twitter Data #248

Closed acrymble closed 4 years ago

acrymble commented 5 years ago

The Programming Historian has received the following tutorial on 'Beginner's Guide to Twitter Data' by @BCWrit. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/lessons/beginners-guide-to-twitter-data

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

@spapastamkou will act as editor. Her role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum.

Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @amandavisconti if you feel there's a need for an ombudsperson to step in.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, to make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks on community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above-described activity, please contact our Ombudsperson (@amandavisconti). Thank you for helping us to create a safe space.

acrymble commented 5 years ago

@BCWrit a quick update that @spapastamkou will be the editor for this submission. She will be reading the lesson and providing some feedback for you to respond to, and then she'll solicit formal reviews. I'll leave it to @spapastamkou to take over from here.

drjwbaker commented 5 years ago

@BCWrit And so you know, I'll be casting an eye over the process and commenting occasionally. But @spapastamkou is very much the lead editor.

spapastamkou commented 5 years ago

@BCWrit Thank you for the lesson, I started reading it and will send feedback in just a couple of days. We'll work closely with @drjwbaker as this is the first lesson I edit!

spapastamkou commented 5 years ago

This lesson is intended for beginners. Its main proposed learning outcome is to familiarize them with acquiring, hydrating, and cleaning Twitter data. To do so, it proposes using the TweetSets application, which has been developed for academic purposes. The datasets can be used for spatial and social network analyses, which are, however, beyond the scope of the lesson.

Here are some comments before launching the review process. They mainly aim to make the text friendlier for beginners, since some notions may not be clear at that level.

In a general way, it could be useful to clarify terms such as: hydrate (tweets), dehydrated (tweets), Twitter API key and why one needs to have one, nodes and edges (perhaps add a resource such as this: https://en.wikipedia.org/wiki/Glossary_of_graph_theory_terms). Whether you prefer to add links or just a short phrase, I think it would be useful.

Here follow some comments on specific points:

l. 27: "the TweetSets website states they are also open to queries regarding the construction of new datasets": add a link to this part of the website?

l. 39 "doi": inserting a link to the Wikipedia entry for DOI could be useful

l. 81 "DocNow": insert a link or add a short phrase to briefly state what it is? (I mean the platform, so readers know where the Hydrator comes from)

l. 97 ".csv" file extension: inserting a link to the Wikipedia entry for CSV could be useful

l. 114 "All of these processes will probably include some light data work to format this dataset into one legible for your platform of choice (R, Python, Arc/QGIS, Gephi/Cytoscape/Palladio are some popular ones)" => First, I suggest that you place more emphasis on the method rather than on the platforms. Second, when enumerating the tools (programming languages and software), specify the type of analysis each allows (SNA, mapping). And third, distinguish between programming languages (Python and R), which allow all the types of analysis mentioned, and software packages that specialize in one of them (as you have already grouped them).

l. 118 Tableau: adding a link could be useful

l. 130: Is it possible to mention an open-source spreadsheet alternative next to Excel etc. when speaking of VLOOKUP?

Finally, I believe it would be useful to have a brief conclusion at the end to sum up the main learning outcomes and the main points a beginner should retain from this lesson.

And a question: in line 30, does the "1" stand for something, or is it a leftover?

Thank you for proposing the lesson!

@drjwbaker, you are welcome to complete my remarks either now or at a later stage.

drjwbaker commented 5 years ago

I only have two comments:

acrymble commented 5 years ago

Just a note to say that the author emailed me and was unaware of this ticket. I have sent him the link.

BCWrit commented 5 years ago

Yes, sorry: my notifications were being sent to an email I don't use much anymore. I will get to work and get back to you this week!

spapastamkou commented 5 years ago

Thank you, we will then be able to launch the review process afterwards.

BCWrit commented 5 years ago

I'm just finishing up the conclusion suggested by Sofia, but wanted to mention a couple of things I wasn't able to implement and see if there are suggestions before I resubmit.

Sofia, l. 27: the message seems to be gone currently. It was on the main page, and has been replaced by a message saying they have specific new datasets that they will add if people request them. My take is that a link wouldn't really help, since it's at the top of the main page, but the idea that they are open to adding new sets, and having this dictated by users, is still true.

l. 118: I believe there was a link to the Tableau site already, so I'm not sure if that isn't coming through, or if you wanted it replaced with an informational site rather than the company one?

l.30. I wasn't able to locate the "1" you are referencing. Is this in the author lines? If so, I believe it's meant to denote the author order, though this may just be an artifact of the template.

James: I added a bit of this language to the lesson, but was not clear about the last point: "A TweetSet is less prone to filter bubble problems than the Twitter search feature." Could you explain so I can add this insight into the document?

spapastamkou commented 5 years ago

Thank you, @BCWrit, this is fine with me for the first two points; for the third, the "1" I refer to is in line 30 of the edited text in this link: https://github.com/programminghistorian/ph-submissions/edit/gh-pages/lessons/beginners-guide-to-twitter-data.md But this remains a detail of formatting that I can certainly treat when finalising the file, I just wanted to be sure what it was about.

BCWrit commented 5 years ago

Yes, that seems to be some kind of formatting artifact. I don't see anything that would produce that in my local .md file, so maybe we'll see when I reupload the edited draft.

BCWrit commented 5 years ago

@spapastamkou @drjwbaker Okay, I've made all the changes and am reuploading the .md file. Did not hear back from James yet and thus did not integrate his second point, but am happy to once I get some clarification. Just wanted to make sure I got this off before ACH; also happy to discuss there if it's easier!

drjwbaker commented 5 years ago

@BCWrit Sorry. Been on leave. What I meant is that, as a researcher, if you download a selection of tweets you can use your local search tools to analyse them. But if you just search Twitter using the online interface, you have no control over the search, and Twitter will present you with tweets based on what it thinks you will want (based on who you follow, what you've previously tweeted, et cetera). Clifford Lynch discusses archiving in the context of a live web that personalises our experience in 'Stewardship in the "Age of Algorithms"', First Monday 22, no. 12 (2 December 2017). http://firstmonday.org/ojs/index.php/fm/article/view/8097.

BCWrit commented 5 years ago

@drjwbaker no worries, and thanks for the clarification. I've added language to reflect this and reuploaded the document!

spapastamkou commented 5 years ago

Thank you @BCWrit. I will be back soon with news for the review process.

spapastamkou commented 5 years ago

As editor of this lesson, I solicited reviews from @edsu @telmomenezes and @inactinique. @drjwbaker and I are co-editors and we will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email us or @acrymble. You can always turn to @amandavisconti if you feel there's a need for an ombudsperson to step in. The ideal date to have the reviews in would be the 1st of September. Thank you.

inactinique commented 5 years ago

Dear all,

Please, find my review below.

It is a good, well-structured and interesting lesson. One of the final sentences states: "While this tutorial focuses on simplified methods, it should serve as a kind of “on-ramp” to show beginners that digital research–collecting thousands of data points, preparing those data points, and doing some sort of analysis/visualization of them–is within reach." The lesson indeed reaches its goal of being an on-ramp that shows beginners that learning digital methods is within their reach.

I have nevertheless a few remarks. @drjwbaker and @spapastamkou will see if they are all useful and in line with the PH's editorial policy.


General but not that important remarks

  1. please date all your illustrations, as snapshots of websites and/or software may not correspond to the actual interface students will see.

  2. I got a 406 error trying to download a dataset. Could this be a problem?

  3. Some illustrations are appearing as links and not images: http://programminghistorian.github.io/ph-submissions/images/beginners-guide-to-twitter-data/tableau-map.png ; http://programminghistorian.github.io/ph-submissions/images/beginners-guide-to-twitter-data/decrease-decimal.png ; http://programminghistorian.github.io/ph-submissions/images/beginners-guide-to-twitter-data/vlookup-search.png ; http://programminghistorian.github.io/ph-submissions/images/beginners-guide-to-twitter-data/last-values.png ; http://programminghistorian.github.io/ph-submissions/images/beginners-guide-to-twitter-data/first-return-value.png ; http://programminghistorian.github.io/ph-submissions/lessons/beginners-guide-to-twitter-data


Paragraph-by-paragraph (and more important) remarks

(sorry for my very direct and possibly not very polite French-style review).

¶1 - Would be great to develop a bit why a historian would like to collect tweets, for instance by giving an example, with a link to an actual published article that is using tweets and is written by a historian (as single author or not). I was thinking about some articles evoking the Bataclan attacks, as it became quite paradigmatic of what you can do with a corpus of tweets but DocNow gives other references that are good as well and there are, as you know, many other examples. (Digital) Memory studies could be mentioned explicitly too (disclaimer: it's my current field :)).

¶3 - Would it be possible to mention alternative tools and, more importantly, other ways (than downloading sets of dehydrated tweets) to get Twitter data? For instance, DMI-TCAT (which, once installed on a server, is quite easy to use, in keeping with the beginner's-guide spirit). I say this because the topics you can study with TweetSets's corpora are quite narrow. Even if you go to Stanford's SNAP, it's a bit limited. I fear this limitation may also limit the audience of the lesson.

¶3 - "yoru" > "your".

¶4 - If it's for beginners, it would be great to explain in a short sentence what an API is, and to link to a Wikipedia article or to another PH lesson (https://programminghistorian.org/en/lessons/introduction-to-populating-a-website-with-api-data#what-is-application-programming-interface-api).

¶8 - Mentioning that clicking on the DOI link will also enable the student to understand how the dataset was collected would be useful (Was it hashtag-based harvesting? In that case, which hashtags were collected? These are quite important questions, for instance; see https://firstmonday.org/ojs/index.php/fm/article/view/6353).

¶14 - Please mention here, or at ¶42, that getting only geotagged tweets introduces biases (see: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10662) and that the definition of geotagged data in Twitter metadata seems to have changed in the last couple of years (https://www.martasevero.com/geotagged-tweets-yesterday-today-and-tomorrow/).

¶46 - This paragraph should state more clearly, more explicitly, and more boldly that using Excel (or Calc, or...) is really useful for small-scale datasets only, and that, even for those relatively small datasets, it can be a problem if the computer does not have enough RAM. Mentioning OpenRefine could be a good idea for bigger (but not that big) datasets (with a link to https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine).


Hope this review was useful. Best, Frédéric

BCWrit commented 5 years ago

@inactinique, Thanks so much for your thoughtful and helpful feedback. PS: I did not find anything too direct or impolite!

@drjwbaker @spapastamkou Should I go about making these changes now, or is there some sort of editorial process that takes place first?

spapastamkou commented 5 years ago

Hi all, sorry for the late response, I just came back from vacation. Thank you for your review, @inactinique. @BCWrit, once all reviews are in, @drjwbaker and I will make a summary of the principal points to address so that you can make all the changes you think necessary at once. Thank you!

edsu commented 5 years ago

I think this is an easy to understand guide for working with Twitter data. I think it will be especially useful for people that want to do Twitter data analysis in GUI applications. I especially like how the guide draws on the author's experience teaching an undergraduate course, and how it links out to other relevant Programming Historian pieces.

I think that it is worthwhile adding a few sentences near the beginning about why researchers are distributing tweet identifier datasets rather than the raw data as either JSON or CSV. I also think it would be useful to talk about what data does not get hydrated, namely any tweets that were deleted or protected. What is the significance of this for repeatability in research?

I did wonder why the edges.csv file was used when it would have been possible to generate the edges directly from the hydrated data. Perhaps I missed it because of the missing screenshots, but I didn't see when the hydrated tweets CSV was opened with the spreadsheet application.
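To illustrate the point about deriving edges directly, here is a minimal, hypothetical sketch: it builds a mention edge list (author → mentioned user) from hydrated tweet objects, the kind of table `edges.csv` supplies ready-made. The `user.screen_name` and `entities.user_mentions` field names follow the Twitter v1.1 JSON layout; the two sample tweets are invented for demonstration.

```python
# Invented tweet objects in the Twitter v1.1 shape a hydrator returns.
tweets = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"},
                                    {"screen_name": "carol"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "alice"}]}},
]

# One directed edge per mention: (author, mentioned user).
edges = [
    (t["user"]["screen_name"], m["screen_name"])
    for t in tweets
    for m in t["entities"]["user_mentions"]
]

print(edges)
```

Each pair is a directed edge that could be written out as a two-column CSV for Gephi or a similar tool.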

Here are some notes about particular lines:

¶3 "yoru" instead of "your"

¶4 You might want to link to the Documenting the Now Catalog that lists a variety of publicly available tweet ID datasets: https://www.docnow.io/catalog/ Maybe this would be a good place to briefly discuss why tweet identifiers datasets are being distributed rather than the underlying data (Twitter's Developer Terms of Service, and ethics around sharing social media data).

¶33 Hydration creates a JSON file. It is only after the JSON has been retrieved that you can generate a CSV from it. The instructions are missing this step and seem to imply that you can hydrate directly as CSV without the step of creating the JSON file.
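The conversion step can be sketched with the standard library alone. This is a hypothetical illustration, not the Hydrator's own code: the records are invented, and the `id_str`, `full_text`, and nested `user.screen_name` field names follow the Twitter v1.1 JSON layout.

```python
import csv
import io
import json

# Two invented records in the JSON-lines layout hydration produces:
# one tweet object per line.
hydrated_jsonl = "\n".join([
    json.dumps({"id_str": "1", "full_text": "first tweet",
                "user": {"screen_name": "alice"}}),
    json.dumps({"id_str": "2", "full_text": "second tweet",
                "user": {"screen_name": "bob"}}),
])

# Flatten the JSON lines into a CSV with one row per tweet.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "screen_name", "text"])
for line in hydrated_jsonl.splitlines():
    tweet = json.loads(line)
    writer.writerow([tweet["id_str"],
                     tweet["user"]["screen_name"],
                     tweet["full_text"]])

print(out.getvalue())
```

The same loop works on a file on disk by replacing the `StringIO` buffer with `open(...)` handles.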

¶39 I think it could be useful to mention that hydration can be done from the command line using the twarc tool from the Documenting the Now project. This can be useful for hydrating very large tweet id datasets, since Twitter rate-limits hydration. Linking to a relevant Programming Historian article about the command line might be useful as well.

¶40 Is there really sentiment analysis information available in the downloaded data?

¶46 SNA should be spelled out the first time: Social Network Analysis (SNA)

¶50, ¶56, ¶64, ¶66, ¶70 It looks like images aren't rendering properly?

spapastamkou commented 5 years ago

Thank you for the review, @edsu. I suggest we wait a bit more until the review of @telmomenezes is in and then go for the final revisions with @BCWrit before publishing the lesson.

telmomenezes commented 5 years ago

Dear all,

My sincere apologies for the delay, this summer ended up being busier / more complicated than I expected...

Overall I think this is a very nice guide. My main criticisms have already been covered by others here, so I will seek to emphasize what I think is most important, at the risk of being a bit redundant. My main overall complaint is that the guide lacks a bit of "depth" when it comes to explaining not only how something is done but also why it is done. For example, the article mentions "hydrating" data right at the beginning. This is a nice metaphor, but it is also jargon from a very specific community. It would be better to explain the general idea of enriching data first, and then mention "hydrating" when diving into the details.

I have to be honest, I don't like the term "metadata". It is a very commonly used term so I will not insist, but I feel that the distinction between "data" and "metadata" is not rigorous enough to be useful in an academic environment. Metadata is data about data. A classical example is the set of books in a library (data) and the card catalog of those books (metadata). On the modern Internet, the distinction is never this clear. I would say that metadata classically refers to things that could be derived from the data. Any index is metadata, and an index can be rebuilt using only the original corpus of data. This is the point where it becomes confusing here, because a naive user might have the impression that "hydrating" tweets is a similar concept, but in fact it always depends on Twitter providing us with the data. They could very well decide not to, and then there is no way to independently generate, e.g., tweet text from tweet ids.

I think it would be good to give the beginner a good understanding of the situation: what data lives in Twitter servers, the difference between retrieving new data (hydrating) and performing any sort of analysis on one's own computer, what is an API (no big details needed), etc.

Regarding this line in the overview: "More simply, we will be acquiring unique IDs from Twitter, linking those IDs to large amounts of metadata, processing that data to make it usable, and then using it to generate insights and analyses. " I don't feel that the text delivers on the last promise. It doesn't really describe how to generate insights and analyses, only how to export data into tools that could fulfill these promises. I think the scope of the guide is great, but some readers might be frustrated if expecting something that ends up not being covered. Perhaps better to rephrase the above sentence.

¶42 link to Tableau is incorrect

spapastamkou commented 5 years ago

Thank you very much for the review, @telmomenezes !

@BCWrit, please give me one day or two max and I will come back with a summary of the three reviews and the main points to address together, if you agree.

BCWrit commented 5 years ago

@spapastamkou Sounds great; looking forward to it!

spapastamkou commented 5 years ago

Dear @BCWrit, you will find below a summary of the main points to address according to the three reviews we received. Sorry for the delay, and thank you for your patience.

Principal suggestions for modifications

Typos

¶3 - "yoru" > "your"

Additions and or modifications

Point 2: (Your actual 3rd phrase) "This process might be attractive to historians, political scientists, public policy experts (and even literary scholars like myself) who want to examine the Twitter discourse surrounding historical events, and gain insight into the geographic, chronological, and social elements of twenty-first-century-politics."

Point 3: What you propose in the current 2nd paragraph of Overview.

Other common places to acquire dehydrated datasets include Stanford’s SNAP collections, DocNow project and data repositories, or going through the Twitter API, or Application Programming Interface (if you wonder what this is, please check this lesson), directly.

Clicking the name of each set will give you more information on it, including its DOI, or Digital Object Identifier, which allows you to reliably locate a digital object (and learn more about how it was created).

To discuss

@BCWrit and @drjwbaker There have been suggestions to evoke scientific publications with use of Twitter data; and other tools that gather these data. Below some ideas to consider

Further info to provide

@BCWrit We need bios of authors (position and work institution): could you please provide this info?

@BCWrit, please use the existing file in the repo (https://github.com/programminghistorian/ph-submissions/blob/gh-pages/lessons/beginners-guide-to-twitter-data.md) to make modifications, rather than uploading a new one, as this could cause conflicts with the existing version.

Thank you!

spapastamkou commented 5 years ago

@drjwbaker If you have further suggestions, please edit the comment above and add them. Thank you!

spapastamkou commented 5 years ago

@BCWrit We also ask authors to consent to the following declaration by posting it in a separate comment. Could you please do this in this ticket? Thank you!

I the author|translator hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.

BCWrit commented 5 years ago

I the author|translator hereby grant a non-exclusive license to ProgHist Ltd to allow The Programming Historian English|en français|en español to publish the tutorial in this ticket (including abstract, tables, figures, data, and supplemental material) under a CC-BY license.

BCWrit commented 5 years ago

About halfway through the edits, but I need to stop, and it will probably be a day or two before I get back to it, so I wanted to document the adaptations I made to the suggestions.

  1. I changed the image date statement to indicate that images were captured before the upload date, as I had been collecting them prior to the actual submission.
  2. I removed all references to "hydrating" in the intro section and instead used general language. I think I explain the idea pretty well in the 2nd paragraph of the "TweetSets" section, so I figured it's probably easiest to just avoid specialist terms until that point.

Everything else I think I followed pretty closely so far, aside from maybe a stylistic tweak or two.

BCWrit commented 5 years ago

¶33: I did not have this issue: every time I hydrate, it is in csv. Especially if you follow the step that says to append .csv to the output file, it should be in .csv format. There can be some format issues if this step is not followed, but in my experience it reliably produces a .csv output. I have not made changes here, but am open to them if the issue is reproduced.

BCWrit commented 5 years ago

@drjwbaker I am happy to include the two lists you suggest (further reading and related packages). I'm happy with the latter list you provide, and would use the list you referenced for the former as I don't necessarily have any in mind.

@drjwbaker @spapastamkou I believe I have completed the suggested edits, with the caveats provided above, and also have not added the aforementioned lists as I'm not sure exactly how you'd want them formatted/organized. Happy to do it, or it might be easier if you dropped them in how you wanted them.

BCWrit commented 5 years ago

Finally, here are author bios:

Ximin Mi is a Data Visualization Specialist / Librarian at Georgia Tech. She holds an M.S. from the iSchool at the University of Illinois at Urbana-Champaign, and an M.S. in Education from Arizona State University. Ximin currently manages the Data Visualization Lab and services at Georgia Tech Library. Her job responsibilities include providing research, teaching, and learning support on data visualization projects across campus, collaborating with faculty members on data visualization instruction, and managing the periodic technological update of the lab. Ximin is also part of the Library Next Portfolio management process. She has been leading projects on the design and execution of data visualization, Virtual Reality/Augmented Reality, and Media Scholarship Commons services.

Courtney Allen is a second-year MS-HCI student at Georgia Tech, in the School of Interactive Computing (College of Computing). She is a User Experience Researcher, supporting design with data-driven choices and elements, and is a Graduate Teaching Assistant for HCI Research Methods.

edsu commented 5 years ago

¶33: I did not have this issue: every time I hydrate, it is in csv. Especially if you follow the step that says to append .csv to the output file, it should be in .csv format. There can be some format issues if this step is not followed, but in my experiences it reliably produces a .csv output. I have not made changes here, but am open to them if the issue is reproduced.

If you are using the Hydrator to hydrate the tweet ids it first hydrates them as JSON. After that is finished it allows you to convert it to CSV if you want. I should know, I created the tool ;-)

BCWrit commented 5 years ago

Okay, I get how it's working now. Added the suggested changes, and also some small updates to surrounding text and captions to reflect these changes.

spapastamkou commented 5 years ago

Thank you very much, @BCWrit. I will do final checks in the coming days and hope we can have the lesson published next week.

For the bios, we need yours as well :-)

For the image links, I checked and did not find any problems in the file. If the problem persists after I move the files and make the pull request for publication, then I'll turn for help to the PH's technical team.

BCWrit commented 5 years ago

@spapastamkou Ah yes, my bio:

Brad Rittenhouse is the Lab Coordinator of the Digital Integrative Liberal Arts Center at the Georgia Institute of Technology. In that role, he helps facilitate the integration of digital tools into research in a wide variety of humanities disciplines. His research is on the information logics of literary devices; he most recently published "TMI: The Information Management Aesthetics of Herman Melville and Walt Whitman" in ESQ.

spapastamkou commented 5 years ago

@BCWrit thank you!

spapastamkou commented 4 years ago

@BCWrit Sorry for this big delay. I made some modifications since last week and just finished the final review of the text. To sum up:

Please take a look especially for the phrases I added to see if you are happy with the result.

And there are also two phrases where I think words are missing (I put in bold where I see the problem):

Once you let me know, I can ask for the pull request to be raised.

Thank you, and again my apologies to all three authors for my delay.

drjwbaker commented 4 years ago

Just wanted to add that I've been hanging out in the background on this one as planned. Looks like we have a great lesson here, so thanks to @BCWrit for working with the reviewers and @spapastamkou for carefully guiding the lesson to completion.

@spapastamkou Now we are moving towards the fiddly bit of publication, if you have any specific questions for me about the process, please ask.

spapastamkou commented 4 years ago

Thanks @drjwbaker. I hesitated a bit when adding the topics in the lesson yaml. I opted for APIs and Data manipulation, what do you think? And for info, there is already a branch create-beginners-guide-to-twitter-data where I added the author bios, lesson images and lesson avatar. I'll add the lesson when ready.

drjwbaker commented 4 years ago

Is it also web scraping or - @BCWrit - would that be mischaracterising the lesson approach?

spapastamkou commented 4 years ago

@BCWrit @drjwbaker: Initially I had APIs and Web scraping but replaced the latter with Data manipulation. We can only have two topics, so I'll trust you on that.

edsu commented 4 years ago

For me web scraping implies extracting structured information from HTML, sometimes in combination with web crawling (doing multiple HTTP requests based on discovered links).

drjwbaker commented 4 years ago

Okay. I'm happy to go with that definition and therefore not tag this as web scraping.

BCWrit commented 4 years ago

@spapastamkou @drjwbaker Sorry for the delay: we were on break and I was making a point to unplug! Anyway, I've fixed the two typos and think the selected tags sound good. Let me know if there's anything else that needs my attention. And thank you: it's been a pleasure!

spapastamkou commented 4 years ago

Big thanks to you, @BCWrit, and to all contributors here. It was a pleasure as well:-)

spapastamkou commented 4 years ago

The lesson is now ready to be published. @alsalin and @svmelton, please find below a list of the relevant files in this repository (ph-submissions) as per the editorial checklist:

ph-submissions/lessons/beginners-guide-to-twitter-data.md
ph-submissions/images/beginners-guide-to-twitter-data
ph-submissions/gallery/originals/beginners-guide-to-twitter-data-original.png
ph-submissions/gallery/beginners-guide-to-twitter-data.png

However, as I let you know by e-mail, I already created a branch, as I (mistakenly) thought I would make the pull request myself. You will find this branch in the jekyll repo under the title create-beginners-guide-to-twitter-data. I have already moved the lesson images and the gallery icons there, and updated ph-authors.yml with the three authors' bios (the very last lines of the file). I did not move the lesson .md file, though. Otherwise, I have just updated the branch to be in line with the latest commits in jekyll.

As for the lesson file: some image links did not display the images. I did not find any problems in the syntax, and if the problem persists during the build in Travis, I'll ask @mdlincoln or someone else from the @programminghistorian/technical-team to check it. Please keep this in mind and ping me in the PR if necessary. Thank you both, and sorry for the blurred steps between the Editor/Managing Editor checklists.

PS: The messages for the twitter bot are ready, please let me know when the lesson is published to add them.

walshbr commented 4 years ago

@spapastamkou if you want to ping me on the pull request when it gets opened I can take a look. I'll wait until the ME does so to keep from messing with the editorial workflow. Will get you sorted!

walshbr commented 4 years ago

And I made a new tab on the Twitter bot spreadsheet for the French twitter messages. You can go ahead and add them to the spreadsheet. The bot only looks at the first two tabs right now, so you should be fine for adding the messages to the French tab - they won't go out until we hit the switch later on. Let me know if you have questions about how to do so. @spapastamkou

spapastamkou commented 4 years ago

Thank you @walshbr. This lesson will go to the EN feed, so I'll wait for the EN ME to confirm publication before adding it. But I'm very glad the FR tab is created; it shouldn't take long to fill it!