gwu-libraries / sfm-ui

Social Feed Manager user interface application.
http://gwu-libraries.github.io/sfm-ui
MIT License

Fix links to Twitter docs #897

Closed justinlittman closed 6 years ago

justinlittman commented 6 years ago

With the reorganization of Twitter's developer documentation, some of the links within the app and in our own docs are broken.

justinlittman commented 6 years ago
docs/collections.rst:<https://dev.twitter.com/rest/reference/get/statuses/user_timeline>`_.
docs/collections.rst:the `Twitter Search API <https://dev.twitter.com/rest/public/search>`_.
docs/collections.rst:<https://dev.twitter.com/rest/public/search>`_, or you can construct a query
docs/collections.rst:<https://dev.twitter.com/streaming/reference/get/statuses/sample>`_, useful for
docs/collections.rst:<https://dev.twitter.com/streaming/reference/post/statuses/filter>`_. Because
docs/collections.rst:<https://dev.twitter.com/streaming/overview/request-parameters#track>`_ for more
docs/collections.rst:<https://dev.twitter.com/streaming/overview/request-parameters#follow>`_ for
docs/collections.rst:<https://dev.twitter.com/streaming/overview/request-parameters#location>`_ for
docs/data_dictionary.rst:<https://dev.twitter.com/docs>`_, including `Tweets
docs/data_dictionary.rst:<https://dev.twitter.com/docs/platform-objects/tweets>`_ and `Entities
docs/data_dictionary.rst:<https://dev.twitter.com/docs/platform-objects/entities>`_.
docs/data_dictionary.rst:<https://web.archive.org/web/*/https://dev.twitter.com/docs>`_, `Tweets
docs/data_dictionary.rst:<https://web.archive.org/web/*/https://dev.twitter.com/overview/api/tweets>`_,
docs/data_dictionary.rst:<https://web.archive.org/web/*/https://dev.twitter.com/overview/api/tweets>`_.
docs/messaging_spec.rst:  terminated. A harvest of a `Twitter public stream <https://dev.twitter.com/streaming/public>`_
docs/userguide.rst:(`Twitter <https://dev.twitter.com/overview/terms/policy>`_,
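Listings like the one above can be reproduced with a recursive grep over the repository. A minimal sketch, wrapped in a function so the search root is explicit (the function name is illustrative, not part of SFM):

```shell
#!/bin/sh
# find_stale_links DIR — print file:line:match for every link that
# still points at the old dev.twitter.com documentation under DIR.
find_stale_links() {
    grep -rn 'dev\.twitter\.com' "$1"
}
```

Running it against `docs/` and `sfm/` yields the lists shown in these comments.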
justinlittman commented 6 years ago
sfm/ui/forms.py:                            help_text='See <a href="https://dev.twitter.com/rest/public/search" target="_blank">' \
sfm/ui/forms.py:                            target="_blank" href="https://dev.twitter.com/streaming/overview/request-parameters#track">
sfm/ui/forms.py:                             href="https://dev.twitter.com/streaming/overview/request-parameters#follow"> follow</a> 
sfm/ui/forms.py:                                href="https://dev.twitter.com/streaming/overview/request-parameters#locations">
justinlittman commented 6 years ago
sfm/ui/templates/ui/terms_snippet.html:    application (<a href="https://dev.twitter.com/overview/terms/policy" target="_blank">Twitter</a>, <a href="https://www.flickr.com/services/developer" target="_blank">Flickr</a>, <a href="http://open.weibo.com/wiki/%E9%A6%96%E9%A1%B5" target="_blank">Sina Weibo</a>, <a href="https://www.tumblr.com/docs/en/api_agreement" target="_blank">Tumblr</a>). </p>
justinlittman commented 6 years ago
./_posts/2015-10-28-social-media-harvesting-techniques.md:Many social media platforms provide APIs to allow retrieval of social media records. Examples of such APIs include the [Twitter REST API](https://dev.twitter.com/rest/public), the [Flickr API](https://www.flickr.com/services/api/), and the [Tumblr API](https://www.tumblr.com/docs/en/api/v2). These APIs use HTTP as the communications protocol and provide the records in machine-readable formats such as JSON. Compared to harvesting HTML from the social media platform’s website, harvesting social media from APIs offers some advantages:
./_posts/2015-10-28-social-media-harvesting-techniques.md:(The one exception worth noting is [Twitter’s Streaming APIs](https://dev.twitter.com/streaming/overview). While these APIs do use HTTP, the HTTP connection is kept open while additional data is added to the HTTP response over a long period of time. Thus, this API is unique in that the HTTP response may last for minutes, hours, or days rather than the normal milliseconds or seconds, and the HTTP response may be significantly larger in size than the typical HTTP response from a social media API. This requires special handling and is outside the scope of this discussion, though it ultimately requires consideration.)
./_posts/2015-12-15-harvesting-twitter-streams.md:The Twitter Streaming API is very powerful, allowing harvesting tweets not readily available from the other APIs. However, recall from our previous post that the Twitter Streaming API does not behave like REST APIs that are typical of social media platforms -- see Twitter’s [description of the differences](https://dev.twitter.com/streaming/overview). A single HTTP response is potentially huge and may be collected over the course of hours, days, or weeks. This is a poor fit for both the normal web harvesting model in which a single HTTP response is recorded as a single WARC response record in a single WARC file, and for most web archiving tools, which store HTTP responses in-memory and don’t write them to the WARC file until the response is completed.
./_posts/2016-05-04-harvesting-twitter-streams2.md:In ["Harvesting the Twitter Streaming API to WARC files"](http://gwu-libraries.github.io/sfm-ui/posts/2015-12-15-harvesting-twitter-streams), I described an approach for recording the [Twitter Streaming API](https://dev.twitter.com/streaming/overview) in WARC files using [record segmentation](http://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#record-segmentation).  The motivation for using record segmentation was that it allowed splitting up a single call to the API — a call that might have a very long duration — into multiple WARC records spread across multiple WARC files.
./_posts/2016-05-04-harvesting-twitter-streams2.md:We just abandoned that approach.  Here’s why:
./_posts/2016-05-04-harvesting-twitter-streams2.md:* Exports seemed potentially problematic as well.  Exporting required reconstructing and reading through monster HTTP responses.  This was particu
./_posts/2016-05-04-harvesting-twitter-streams2.md:* It has become increasingly clear that data collected from the Twitter Streaming API MUST be considered a sample.  Some of the existing reasons for this are [rate limits in the Twitter Streaming API](https://dev.twitter.com/streaming/reference/post/statuses/filter), inevitable network hiccups or similar operational ailments that will interrupt the stream, and the simple fact that the Twitter Streaming API is a “black box” whose exact operation is unknown (well, to us anyway).  If the data collected must be considered a sample, then small interruptions in the harvest should be acceptable as long as they don’t introduce any sort of a sampling bias.  Researchers requiring a complete dataset will probably want to purchase it from a data reseller like [Gnip](https://gnip.com/source
./_posts/2016-05-04-harvesting-twitter-streams2.md:Given this, we’re trying a new approach:  Harvest from the Twitter Streaming API for 30 minutes at a time.  At the end of the 30 minutes, close the stream and start a new one.  Each 30 minute segme
./_posts/2016-05-04-harvesting-twitter-streams2.md:Twitter [warns against connection churn](https://dev.twitter.com/streaming/overview/connecting):  "Clients which break a connection and then reconnect frequently (to change query parameters, for ex
./_posts/2016-05-04-harvesting-twitter-streams2.md:The upside of this new approach is that each WARC response record is a more manageable size that should play well with existing web archiving tools and be more export friendly.  Oh yeah – and I get to throw away a ton of code.
./_posts/2016-09-07-collection-not-an-archive.md:Twitter's [Developer Policy](https://dev.twitter.com/overview/terms/policy)
./_posts/2016-09-07-collection-not-an-archive.md:Twitter")](https://dev.twitter.com/overview/terms/policy#6.Update_Be_a_Good_Partner_to_Twitter),
./_posts/2016-09-07-collection-not-an-archive.md:Products")](https://dev.twitter.com/overview/terms/policy#2.Update_Maintain_the_Integrity_of_Twitter%E2%80%99s),
./_posts/2016-09-07-collection-not-an-archive.md:Privacy")](https://dev.twitter.com/overview/terms/policy#3.Update_Respect_Users_Control_and_Privacy)
./_posts/2016-10-12-harvester-anatomy.md:* Better handling of harvest errors.  In particular, we observed cases where Twarc would get stuck in a retry loop on odd network problems (DNS, certificate) that don’t resolve.  ([Twarc](https://github.com/edsu/twarc) is the social media client that SFM uses to access the [Twitter APIs](https://dev.twitter.com/overview/documentation).)
./_posts/2016-10-12-harvester-anatomy.md:Let’s start with how harvesters used to work.  (For the purposes of simplifying the description, I’m going to omit the handling of [Twitter stream harvests](https://dev.twitter.com/streaming/overview).)
./_posts/2016-11-10-twitter-interaction.md:We will not discuss affordances of the Twitter API that are perspectival, that is, depend on the Twitter account that is used to access the API. So, for example, we will not consider [GET statuses/retweets_of_me](https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me).
./_posts/2016-11-10-twitter-interaction.md:Tweets retrieved from the Twitter API are in [JSON](http://json.org/), a simple structured text format. Below I will provide the entire tweet; in the rest of this notebook I will only provide a subset of the tweet containing the relevant fields. Twitter provides [documentation on the complete set of fields in a tweet](https://dev.twitter.com/overview/api/tweets).
./_posts/2016-11-10-twitter-interaction.md:[GET statuses/show/:id](https://dev.twitter.com/rest/reference/get/statuses/show/id) is used to retrieve a single tweet by tweet id. [GET statuses/lookup](https://dev.twitter.com/rest/reference/get/statuses/lookup) is used to retrieve multiple tweets by tweet ids.
./_posts/2016-11-10-twitter-interaction.md:[GET statuses/user_timeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) retrieves a user timeline given a screen name or user id. This is one of the primary methods for collecting social media data.
./_posts/2016-11-10-twitter-interaction.md:[GET statuses/retweets/:id](https://dev.twitter.com/rest/reference/get/statuses/retweets/id) returns the most recent retweets for a tweet. Only the most recent 100 retweets are available.
./_posts/2016-11-10-twitter-interaction.md:[GET statuses/retweeters/ids](https://dev.twitter.com/rest/reference/get/statuses/retweeters/ids) retrieves the user ids that retweeted a tweet.
./_posts/2016-11-10-twitter-interaction.md:[GET search/tweets](https://dev.twitter.com/rest/reference/get/search/tweets) (also known as the [Twitter Search API](https://dev.twitter.com/rest/public/search)) allows searching "against a sampling of recent Tweets published in the past 7 days."
./_posts/2016-11-10-twitter-interaction.md:[POST statuses/filter](https://dev.twitter.com/streaming/reference/post/statuses/filter) allows filtering of the stream of tweets on the Twitter platform by keywords ([track](https://dev.twitter.com/streaming/overview/request-parameters#track)), users ([follow](https://dev.twitter.com/streaming/overview/request-parameters#follow)), and geolocation ([location](https://dev.twitter.com/streaming/overview/request-parameters#location)).
./_posts/2016-12-21-releasing-1-4.md:* Harvests take an unexpectedly long time to complete because they have to pause due to [Twitter’s rate limit](https://dev.twitter.com/rest/public/rate-limiting).  This can be because a single harvest has a large number of seeds or because there are multiple harvests being performed concurrently that are using the same API keys.
./_posts/2017-03-15-releasing-datasets.md:<sup>1</sup> As background:  Twitter’s [Developer Policy](https://dev.twitter.com/overview/terms/agreement-and-policy#id8) only permits public datasets to include tweet ids. Publishing the full JSON for tweets is prohibited. As has been [pointed out](http://dl.acm.org/citation.cfm?doid=2908131.2908172), this is problematic for quality, reproducible research.
./_posts/2017-03-31-extended-tweets.md:Twitter followed this up a few weeks ago with some [additional details](https://dev.twitter.com/overview/api/upcoming-changes-to-tweets) on changes this would entail in the Twitter API. Yesterday (March 30), the [first of these changes went live](https://blog.twitter.com/2017/now-on-twitter-140-characters-for-your-replies) on the Twitter website.  These changes impact applications like Social Feed Manager that collect social media data.  The goal of this blog post is to explore the salient changes.
./_posts/2017-03-31-extended-tweets.md:The new features will impact the [REST API](https://dev.twitter.com/rest/public) and the [Streaming API](https://dev.twitter.com/streaming/overview) differently.
./_posts/2017-04-12-geographic-collecting.md:Twitter [Search](https://dev.twitter.com/rest/public/search
./_posts/2017-04-12-geographic-collecting.md:"Twitter Search docs") and [Filter](https://dev.twitter.com/streaming/reference/post/statuses/filter
./_posts/2017-04-12-geographic-collecting.md:see Twitter documentation about [Place](https://dev.twitter.com/overview/api/places
./_posts/2017-05-18-twitter-policy-change.md:Because of the prominent role that Twitter plays in the social, political, and cultural discourse of contemporary society, activity on Twitter has increasingly become the subject of research across a wide array of disciplines and the focus of collecting by archival organizations concerned with preserving the historical record. On May 17 Twitter announced a change in their [Developer Policy](https://dev.twitter.com/overview/terms/agreement-and-policy) to go into effect on June 18 that significantly impacts both of those activities. The goal of this blog post is to describe the change and its implications.
./_posts/2017-05-18-twitter-policy-change.md:For both research and archival purposes, the primary mechanism for collecting Twitter data is [Twitter’s APIs](https://dev.twitter.com/overview/api). Twitter’s APIs support the efficient collection of large numbers of tweets in a format that is amenable to computational analysis and archiving. Collecting Twitter data is governed by the technical affordances of the API and the Developer Policy.
./_posts/2017-05-18-twitter-policy-change.md:Thus, researchers and archivists were required to share datasets of tweet ids, rather than the tweets themselves. Twitter’s API allows retrieving a tweet from the tweet id (known as "hydrating" a tweet). While [rate limits](https://dev.twitter.com/rest/public/rate-limiting) on Twitter’s API make this a slow process, there are tools to make it easy (e.g., [Hydrator](https://github.com/DocNow/hydrator)). One aspect of this approach is that if a tweet has been deleted or protected, it cannot be retrieved by tweet id. While making for imperfect sharing of datasets, this gave authors some measure of control over their tweets.
./_posts/2017-09-14-twitter-data.md:While you can write your own software for accessing the [Twitter API](https://dev.twitter.com/rest/public), a number of tools already exist. They are quite varied in their capabilities and require different levels of technical skills and infrastructure. These include:
./_posts/2017-09-14-twitter-data.md:Twitter’s [Developer Policy](https://dev.twitter.com/overview/terms/agreement-and-policy) (which you agree to when you get keys for the Twitter API) places limits on the sharing of datasets. If you are sharing datasets of tweets, you can only publicly share the ids of the tweets, not the tweets themselves. Another party that wants to use the dataset has to retrieve the complete tweet from the Twitter API based on the tweet id (“hydrating”). Any tweets which have been deleted or become protected will not be available.
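The 30-minute window rotation described in the 2016-05-04 post excerpted above can be sketched in Python. This is only an illustration of the idea, not SFM's actual harvester code: `connect` stands in for whatever opens the streaming connection, and the names and window length are invented for the example.

```python
import time


def read_window(lines, window_seconds):
    """Consume lines from an open stream iterator until the time window
    elapses, then stop so the caller can close and reconnect.
    Returns the lines collected during this window."""
    deadline = time.monotonic() + window_seconds
    collected = []
    for line in lines:
        collected.append(line)
        if time.monotonic() >= deadline:
            break
    return collected


def harvest(connect, window_seconds=30 * 60, windows=None):
    """Repeatedly open the stream via connect(), read one window, and
    close the stream, yielding each window's lines.  `windows` bounds
    the number of iterations (None means run until interrupted)."""
    n = 0
    while windows is None or n < windows:
        stream = connect()
        try:
            yield read_window(stream, window_seconds)
        finally:
            close = getattr(stream, "close", None)
            if close:
                close()
        n += 1
```

Each yielded window would then be written as one normally-sized WARC response record, which is the "more manageable size" benefit the post describes.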
justinlittman commented 6 years ago

Sigh.

justinlittman commented 6 years ago

The Twitter documentation is currently too broken to do this successfully. Will try again in a week.
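One way to tell when the documentation has settled is to extract the URLs from the affected files and check each one's HTTP status before editing. A rough sketch using only the Python standard library; the function names are invented for illustration, and the regex is a deliberately loose approximation of URL syntax:

```python
import re
import urllib.request

# Stop a URL at whitespace or at characters that commonly close a
# reST/Markdown/HTML link: < > " ' ) ]
URL_RE = re.compile(r'https?://[^\s<>"\')\]]+')


def extract_urls(text):
    """Pull plain URLs out of reST/Markdown/HTML source text."""
    return URL_RE.findall(text)


def check_url(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except Exception:
        return None
```

Looping `check_url` over `extract_urls(open(path).read())` for each file in the lists above would flag which links still need attention. (Note that many broken links redirect rather than 404, so a 200 status alone does not guarantee the link reaches the intended page.)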

justinlittman commented 6 years ago

See also https://github.com/gwu-libraries/sfm-ui/commit/60e432e4b48c1148123f35d4b4eb9fa565c89ca3 and https://github.com/gwu-libraries/sfm-ui/commit/e44cbd31847d566d974811bfa92e849e68392fa7.

Will update blog posts at a later date.
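The old docs were consolidated under developer.twitter.com, but the path structure changed rather than mapping one-to-one, so a mechanical rewrite of the host is only a starting point for edits like the commits above. A hedged sketch (the function name is illustrative; every rewritten link still needs to be verified by hand):

```shell
#!/bin/sh
# rewrite_host FILE — swap the old documentation host for the new one
# in place.  This changes only the host; the path portion of each URL
# must still be checked against the reorganized docs.
rewrite_host() {
    sed -i.bak 's|dev\.twitter\.com|developer.twitter.com|g' "$1" && rm -f "$1.bak"
}
```

The `-i.bak` form (suffix attached to the flag) works with both GNU and BSD sed, which matters if contributors edit the docs on macOS.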