gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

Add language parameter to subsetting options #39

Closed by kerchner 3 years ago

kerchner commented 4 years ago

This applies to the "Select tweets in dataset" page at /dataset.

kerchner commented 3 years ago

'lang' is neither indexed in https://github.com/gwu-libraries/TweetSets/blob/db2f5982a2d541b501dd342dd7805dd1af3dde3f/models.py#L76-L120 nor referenced in https://github.com/gwu-libraries/TweetSets/blob/db2f5982a2d541b501dd342dd7805dd1af3dde3f/models.py#L45-L73, so it is not indexed when a new dataset is loaded. 'user_language' (['user']['lang']) is being indexed, but I think we want 'lang'.

We would need to reload and reindex ALL of the datasets in order to enable a "language" parameter on the dataset page that filters on 'lang'. So, for now, we are doing the following:

  1. Add 'lang' in models.py so that newly loaded/indexed sets do contain this field.
  2. Create a new issue (#114) to implement the new UI and back-end functionality to filter results on 'lang'. This would be dependent on datasets in the system being indexed with 'lang'.
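
To illustrate the difference between the two fields discussed above, here is a minimal sketch. It assumes tweets are Twitter API v1.1 JSON objects parsed into Python dicts, where the top-level 'lang' is the language detected for the tweet text and ['user']['lang'] is the account's language setting. The helper names `extract_language_fields` and `filter_by_lang` are hypothetical and are not taken from models.py.

```python
from typing import Any, Dict, Iterable, List, Optional


def extract_language_fields(tweet: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull both language values out of a tweet dict (hypothetical helper).

    'lang' is the tweet-level field this issue proposes to index;
    'user_language' mirrors the ['user']['lang'] value already indexed.
    Missing fields come back as None rather than raising.
    """
    user = tweet.get("user") or {}
    return {
        "lang": tweet.get("lang"),
        "user_language": user.get("lang"),
    }


def filter_by_lang(tweets: Iterable[Dict[str, Any]], lang: str) -> List[Dict[str, Any]]:
    """Keep only tweets whose tweet-level 'lang' equals the given code."""
    return [t for t in tweets if t.get("lang") == lang]


# Made-up sample data: a French tweet from an account set to English.
sample_tweets = [
    {"id_str": "1", "text": "Bonjour le monde", "lang": "fr",
     "user": {"screen_name": "example_a", "lang": "en"}},
    {"id_str": "2", "text": "Hello world", "lang": "en",
     "user": {"screen_name": "example_b", "lang": "en"}},
]

fields = extract_language_fields(sample_tweets[0])
french_ids = [t["id_str"] for t in filter_by_lang(sample_tweets, "fr")]
```

In the first sample tweet the two values disagree, which is why filtering on 'user_language' would not be a substitute for indexing 'lang'.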