tweetset_loader counts the lines in every file in the folder and prints a message to the console such as:
INFO:__main__:Counting tweets in 34 files.
INFO:__main__:191,631 total tweets
Following our documentation for loading to TweetSets creates other files in the folder that should not be counted, such as files containing the concatenated contents of all the tweet ID files. As a result, tweetset_loader counts lines in more files than it should, which produces a wildly inaccurate tweet count.
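To illustrate the overcount, here is a minimal sketch. It does not reproduce the loader's actual code: `count_lines_in_dir`, `demo`, and the file names are hypothetical stand-ins, and it assumes one tweet ID per line. Adding a single concatenated file doubles the reported total.

```python
import os
import tempfile


def count_lines_in_dir(path):
    """Count lines in every regular file in path.

    Hypothetical stand-in for the loader's behavior: no filtering by file name.
    """
    file_count = 0
    line_count = 0
    for name in sorted(os.listdir(path)):
        filepath = os.path.join(path, name)
        if not os.path.isfile(filepath):
            continue
        file_count += 1
        with open(filepath, encoding="utf-8") as f:
            line_count += sum(1 for _ in f)
    return file_count, line_count


def demo():
    """Return (files, lines) before and after a concatenated file is added."""
    with tempfile.TemporaryDirectory() as folder:
        # Two hypothetical tweet ID files holding 3 and 2 IDs.
        contents = {"ids_001.txt": "1\n2\n3\n", "ids_002.txt": "4\n5\n"}
        for name, text in contents.items():
            with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
                f.write(text)
        before = count_lines_in_dir(folder)

        # A concatenated copy of all IDs, like the extra files described above.
        with open(os.path.join(folder, "all_ids.txt"), "w", encoding="utf-8") as f:
            f.write("".join(contents.values()))
        after = count_lines_in_dir(folder)
    return before, after
```

In this sketch the real tweet count stays at 5, but the reported line count goes from 5 to 10 once the concatenated file is present.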
Since this is a back-end function, I would suggest simply making the message less specific rather than spending effort to make it more precise. That would at least avoid giving the person invoking the load the impression that something went wrong.
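One possible shape for a less specific message, as a rough sketch only. The function name `format_count_message` and the wording are my own assumptions, not anything currently in tweetset_loader:

```python
import logging

log = logging.getLogger(__name__)


def format_count_message(file_count, line_count):
    """Build a message that reports lines scanned without claiming an exact tweet count.

    Hypothetical helper; the wording is only a suggestion.
    """
    return (
        "Scanned {} files containing {:,} lines "
        "(approximate; not an exact tweet count).".format(file_count, line_count)
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log.info(format_count_message(34, 191631))
```

The numbers are the ones from the console output above; the only change is that the message no longer presents the line total as the number of tweets.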
Relevant code is here: https://github.com/gwu-libraries/TweetSets/blob/master/tweetset_loader.py#L319-L322