gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

Update Spark schema for other kinds of tweets #136

Closed: dolsysmith closed this issue 2 years ago

dolsysmith commented 3 years ago

The current schema is derived from 40 GB of tweets from the Brexit collection in SFM (Social Feed Manager). However, not every field is present in every kind of tweet. For instance, the user elements are richer in user timeline tweets than in search/filter tweets.

Using additional samples from SFM, create a union schema that covers the different kinds of collections we ingest.
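A union schema can be built by inferring a schema for each sample tweet and merging the results field by field, so that a field seen in any sample appears in the output. The following is a minimal, pure-Python sketch of that idea, not TweetSets code. The nested-dict schema representation and the function names `infer`, `merge` and `union_schema` are hypothetical. The sketch widens `long` plus `double` to `double` and, as a simplification, widens any other type conflict to `string`.

```python
import json
from typing import Any, Iterable

# Hypothetical schema representation (not Spark's StructType):
#   {"struct": {field: schema}}  for JSON objects
#   {"array": schema}            for JSON arrays
#   "string" / "long" / "double" / "boolean" for scalars
#   None                         for JSON null (type not yet known)
Schema = Any


def infer(value: Any) -> Schema:
    """Infer a schema for a single parsed JSON value."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = None
        for item in value:
            elem = merge(elem, infer(item))
        return {"array": elem}
    if isinstance(value, dict):
        return {"struct": {k: infer(v) for k, v in value.items()}}
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge(a: Schema, b: Schema) -> Schema:
    """Return a schema covering both a and b (the union of their fields)."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        if "struct" in a and "struct" in b:
            fields = dict(a["struct"])
            for name, sub in b["struct"].items():
                # Fields missing on one side are kept from the other side.
                fields[name] = merge(fields.get(name), sub)
            return {"struct": fields}
        if "array" in a and "array" in b:
            return {"array": merge(a["array"], b["array"])}
    if a == b:
        return a
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"long", "double"}:
        return "double"
    # Simplification: every other conflict is widened to string.
    return "string"


def union_schema(json_lines: Iterable[str]) -> Schema:
    """Merge the inferred schemas of many line-oriented JSON tweets."""
    schema = None
    for line in json_lines:
        schema = merge(schema, infer(json.loads(line)))
    return schema
```

For example, merging a filter-style tweet with a timeline-style tweet whose `user` object carries extra fields (both samples made up for illustration) yields a `user` struct containing every field from both. In Spark itself, passing several sample files to `spark.read.json` infers a single merged schema across all of them, which may be the simpler route in practice.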