Limit not working + bracket organization missing

DeFiDude commented 4 years ago

I've set a limit to download a maximum of 10,000 tweets per user, and the limit shows in the command line while the tweets download (x/10000). However, after letting it run, it's become apparent that the limit isn't working: I have over 90,000 tweets when I should only have ~40,000-50,000.

I also have it set to include replies, if that matters.

Also, I noticed that in the CSV the tweets are all separated by a blank line, for example:

Tweet1

Tweet2

Tweet3

However, in the old version the tweets were separated with brackets (I'm assuming so that gpt-2-simple can read them properly), like so:

['Tweet1'] ['Tweet2'] ['Tweet3']

sdelgadoc commented 4 years ago

I tested the code and was not able to reproduce the issue. How are you measuring the number of tweets? If you are counting lines in the file, that number won't match the number of tweets, because some tweets span multiple lines.

To measure the number of tweets, read the file using Python's CSV reader (https://docs.python.org/3/library/csv.html). This is the library this code uses to store the tweets to file, and the one gpt-2-simple uses to read input files.
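A minimal sketch of the difference between the two measurements, assuming one tweet per CSV row in the first column. The helpers `count_lines` and `count_tweets` and the demo file are hypothetical, not part of the project:

```python
import csv
import os
import tempfile


def count_lines(path):
    """Count raw text lines; multi-line tweets and blank lines inflate this."""
    with open(path, encoding="utf-8", newline="") as f:
        return sum(1 for _ in f)


def count_tweets(path):
    """Count CSV records; a quoted tweet that spans several lines is one record."""
    with open(path, encoding="utf-8", newline="") as f:
        # csv.reader yields an empty list for a blank line, so skip those.
        return sum(1 for row in csv.reader(f) if row)


if __name__ == "__main__":
    # Hypothetical demo file: three tweets, the second containing a line break.
    path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Tweet1"])
        writer.writerow(["Tweet2, first line\nsecond line"])
        writer.writerow(["Tweet3"])
    print(count_lines(path))   # 4
    print(count_tweets(path))  # 3
```

The line count overstates the tweet count as soon as any tweet contains a line break, which is why the CSV reader is the reliable measure.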

The difference in how tweets are separated was due to a bug in previous versions of the code. Previously, the code generated tweet strings surrounded by brackets, so gpt-2 would generate tweets surrounded by brackets, and the user would then need to remove them.

In this version, the code doesn't surround the tweet strings with brackets, so they no longer need to be removed after gpt-2 generation.
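The exact code that produced the brackets isn't shown in this thread, but one plausible way it can happen (an assumption, not the project's actual code) is converting a one-element list to a string before writing it: Python's list `repr` adds the brackets and quotes. The hypothetical `write_rows` helper below contrasts that with writing the plain tweet string as the row's only field:

```python
import csv
import io


def write_rows(tweets, bracketed):
    """Write one tweet per CSV row and return the resulting text.

    bracketed=True mimics the assumed old behavior: the cell holds str([tweet]),
    i.e. the list's repr, so brackets and quotes end up in the training text.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for tweet in tweets:
        cell = str([tweet]) if bracketed else tweet
        writer.writerow([cell])
    return buf.getvalue()


tweets = ["Tweet1", "Tweet2", "Tweet3"]
print(write_rows(tweets, bracketed=True))   # ['Tweet1'], ['Tweet2'], ... one per line
print(write_rows(tweets, bracketed=False))  # Tweet1, Tweet2, ... one per line
```

With `bracketed=True`, every tweet carries the same bracket-and-quote wrapping as the old format, and a model trained on that text learns to reproduce it.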

DeFiDude commented 4 years ago

You’re right, duh! I was counting the blank lines as well haha.

Also, good to know the brackets aren't needed, so it should be working as intended (though it seems to be taking much longer than previous versions).

Thanks once again for the help; I'll close this issue when I get back to my computer.

(Replying by email to Santiago Delgado's comment of Sun, Jun 14, 2020, 3:49 PM.)

sdelgadoc commented 4 years ago

For future reference on performance: I am getting 8.7 iterations (tweets) per second when I download all the tweets from my Twitter account.

DeFiDude commented 4 years ago

Hm, that’s close to what I used to get (above 5/s), but now I’m getting ~1.5/s. For what it’s worth, 4 of the 7 accounts I’ve exported ran at exactly 1.54 s/it (another at 1.59).

I wonder what’s limiting me, especially at such a seemingly constant rate.

(Replying by email to Santiago Delgado's comment of Sun, Jun 14, 2020, 4:24 PM.)
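The rates in this exchange are quoted in two units: s/it (seconds per iteration) is the inverse of it/s, so the 1.54 s/it figure corresponds to roughly 0.65 tweets per second. A quick back-of-the-envelope sketch (the `seconds_for` helper is hypothetical), assuming a full 10,000-tweet download per account, shows how large the gap is:

```python
def seconds_for(n_tweets, rate, unit="it/s"):
    """Estimated download time in seconds for n_tweets at a given progress rate.

    unit="it/s" means iterations (tweets) per second; unit="s/it" means seconds
    per iteration, the inverse.
    """
    if unit == "it/s":
        return n_tweets / rate
    if unit == "s/it":
        return n_tweets * rate
    raise ValueError(f"unknown unit: {unit!r}")


N = 10_000  # the per-user limit from the original report
for rate, unit in [(8.7, "it/s"), (5.0, "it/s"), (1.54, "s/it")]:
    secs = seconds_for(N, rate, unit)
    print(f"{rate} {unit}: {secs / 60:.0f} min ({secs / 3600:.1f} h)")

# 8.7 it/s: 19 min (0.3 h)
# 5.0 it/s: 33 min (0.6 h)
# 1.54 s/it: 257 min (4.3 h)
```

At 1.54 s/it, one 10,000-tweet account takes over four hours, versus about 20 minutes at 8.7 it/s.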

sdelgadoc / download-tweets-ai-text-gen-plus

Limit not working + bracket organization missing #2