hirmeos / altmetrics

Implementation of HIRMEOS WP6
MIT License

Twitter client does not pick up URLs that are tweeted #46

Open rowan08 opened 5 years ago

rowan08 commented 5 years ago

Received a message querying a book tweet that was not saved by the Altmetrics service: "We found a tweet about one of our books but the API doesn’t show it: https://twitter.com/OpenEditionNews/status/1140910422431797250?s=20"

Investigation showed that this tweet was not picked up because it references the book in question by its URL (books.openedition.org/oep/8999), whereas we only search for books by DOI. After adding the URL to the Twitter search, the client was still unable to find the tweet. The following combinations were tested:

DOI and all URLs: keywords = ['"https://books.openedition.org/oep/8999"', '"https://books.openedition.org/oep/pdf/8999"', '"https://books.openedition.org/oep/epub/8999"', '"10.4000/books.oep.8999"']

Only the relevant URL: keywords = ['"https://books.openedition.org/oep/8999"']

Only the relevant URL without 'https://': keywords = ['"books.openedition.org/oep/8999"']

None of these searches with the Twitter client returned anything. Ideally, we should be able to search for tweets about a book that is mentioned by its URL, not just its DOI.

rowan08 commented 5 years ago

Update: the problem seems to be that Twitter converts all tweeted URLs to a shortened URL (https://help.twitter.com/en/using-twitter/how-to-tweet-a-link), so the original URL does not form part of the tweet text and will not be matched when searching tweets using the URL as a keyword.

Moreover, the shortened URLs are neither predictable nor unique (i.e. there can be more than one tiny URL for a given expanded URL).
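Since the tweet text only carries the shortened link, matching has to happen against the expanded URL instead. Below is a minimal sketch of that idea, assuming the tweet is available as a dict in the Twitter API v1.1 shape, where each item in `entities.urls` has an `expanded_url` field. The helper names (`normalise`, `mentions_book`) and the sample tweet, including its short link, are made up for illustration and are not part of the Altmetrics code.

```python
from urllib.parse import urlparse


def normalise(url):
    """Reduce a URL to host + path so scheme and trailing slash do not matter."""
    # urlparse only recognises the host if a scheme or leading '//' is present.
    parsed = urlparse(url if "://" in url else "//" + url)
    return (parsed.netloc.lower() + parsed.path).rstrip("/")


def mentions_book(tweet, book_urls):
    """Return True if any expanded URL in the tweet matches one of book_urls."""
    wanted = {normalise(u) for u in book_urls}
    url_entities = tweet.get("entities", {}).get("urls", [])
    expanded = (entity.get("expanded_url", "") for entity in url_entities)
    return any(normalise(u) in wanted for u in expanded if u)


# Hypothetical tweet payload: the text holds only the short link,
# while the original URL survives in entities.urls[].expanded_url.
sample_tweet = {
    "text": "New open access book https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://books.openedition.org/oep/8999",
            }
        ]
    },
}

found = mentions_book(sample_tweet, ["books.openedition.org/oep/8999"])
```

This only helps once a tweet has already been retrieved; it does not by itself make the search return the tweet.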

rowan08 commented 5 years ago

It looks like Twitter does allow you to search for expanded URLs by prefixing a keyword with 'url:'. I have tested this with limited success. It does, however, work with at least some URLs, so we should probably include it.

Based on: https://stackoverflow.com/questions/3584482/how-to-find-tweets-that-contain-a-url

Just a note: searching with the https:// prefix seems to work fine, but combining more than one URL in a search tends to return nothing, i.e. the following will work (at time of writing):

Keywords = ['url:books.openedition.org/oep/9068']  # or 'url:https://books.openedition.org/oep/9068'

but the following will not work:

Keywords = [
    'url:http://books.openedition.org/oep/9068',
    'url:https://books.openedition.org/oep/9068',
]

rowan08 commented 5 years ago

Worth noting: after some testing, it looks like the keywords above act as a DB filter (each additional keyword appears to narrow the results rather than widen them), i.e. we will need a separate search for DOIs and URLs.

This may require a change in search strategy, considering the Twitter rate limit.
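One way to picture the separate searches is to generate one single-keyword query per identifier. The sketch below is only an illustration under that assumption: `strip_scheme` and `build_queries` are hypothetical helpers, not functions from the Altmetrics code base, and the DOI and URLs are the ones from the first comment. http/https variants are collapsed into one `url:` query so that they do not cost an extra request each.

```python
def strip_scheme(url):
    """Drop a leading http:// or https:// if present."""
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def build_queries(doi, urls):
    """Return one search query per identifier: the quoted DOI, then one
    'url:' query per distinct URL (scheme and trailing slash ignored)."""
    queries = ['"{}"'.format(doi)]
    seen = set()
    for url in urls:
        bare = strip_scheme(url).rstrip("/")
        if bare not in seen:
            seen.add(bare)
            queries.append("url:" + bare)
    return queries


queries = build_queries(
    "10.4000/books.oep.8999",
    [
        "http://books.openedition.org/oep/8999",
        "https://books.openedition.org/oep/8999",
        "https://books.openedition.org/oep/pdf/8999",
        "https://books.openedition.org/oep/epub/8999",
    ],
)
# Each entry would be sent as its own search, e.g. keywords = [query].
```

With this approach the number of requests per book grows with the number of distinct URLs, which is where the rate limit becomes a concern.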

yoannspace commented 4 years ago

@rowan08 Is it fair to say this was solved? The Twitter example is available on the API now.