gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

T136 update schema Fixes #136 #151

Closed dolsysmith closed 2 years ago

dolsysmith commented 3 years ago

Updates the schema to handle the user.entities.url.urls element, when present (user timeline tweets). This element is output in the CSV extract as user_urls.
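For illustration, here is a minimal sketch of how the element could be read from a tweet's JSON and flattened into a single `user_urls` value. It is not the code in this PR: the function names, the preference for `expanded_url` over `url`, and the space separator are all assumptions made for the example.

```python
def extract_user_urls(tweet):
    """Return URLs found under user.entities.url.urls, or [] if absent.

    Hypothetical helper for illustration; not part of TweetSets or Twarc.
    """
    node = tweet
    # Walk user -> entities -> url, tolerating missing or null levels.
    for key in ("user", "entities", "url"):
        node = node.get(key) if isinstance(node, dict) else None
    entries = node.get("urls") if isinstance(node, dict) else None

    urls = []
    for entry in entries or []:
        # Assumption: prefer the expanded form, else the shortened URL.
        value = entry.get("expanded_url") or entry.get("url")
        if value:
            urls.append(value)
    return urls


def user_urls_column(tweet, sep=" "):
    """Join the URLs into one CSV cell (the separator is an assumption)."""
    return sep.join(extract_user_urls(tweet))
```

A tweet whose user object has no `entities.url` element yields an empty list, so the `user_urls` column would be blank for that row.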

Test with user timeline collections and other kinds of collections, if possible.

lwrubel commented 2 years ago

Tested with user timeline, search, and filter collections. Getting user URLs in all but filter tweets.

Noting here that in recent filter stream tweets, the URLs appear in the user.url field rather than in user.entities.url.urls, where Twarc looks for them (for json2csv). Agreed that we will move forward with this PR as is.
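Should the filter-stream case need handling later, one possible approach is to fall back to `user.url` when the entities element is missing or empty. The sketch below is only an illustration and is outside the scope of this PR; the function name and the fallback order are assumptions.

```python
def user_urls_with_fallback(tweet):
    """Read user.entities.url.urls, falling back to user.url.

    Hypothetical helper for illustration; not part of TweetSets or Twarc.
    """
    user = tweet.get("user") or {}
    entities_url = (user.get("entities") or {}).get("url") or {}

    # Collect URLs from the entities element when it is present.
    urls = []
    for entry in entities_url.get("urls") or []:
        value = entry.get("expanded_url") or entry.get("url")
        if value:
            urls.append(value)

    # Assumption: use the plain user.url field only when nothing was found.
    if not urls and user.get("url"):
        urls = [user["url"]]
    return urls
```

With this fallback, a filter-stream tweet that carries only `user.url` would still produce one URL, while user timeline tweets would be read from the entities element.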