humandecoded / twayback

Automate downloading archived deleted Tweets.
Apache License 2.0
178 stars 23 forks

Twayback Partial Re-Write #10

Closed AccentuSoft closed 2 years ago

AccentuSoft commented 2 years ago

Hello,

We recently came across your project, and thought we could contribute by re-writing some parts of the code. An effort was made to keep the logic and structure of the code the same.

By our metrics, we have achieved a speed-up of around 20-30% for accounts with over 1000 tweets. However, we don't have any actual tests written for the script, so we have attached it here for review.

A substantial change was moving from Selenium to Playwright. Users need to run 'playwright install' to install the Playwright browsers before running the script.

Another smaller but notable change comes from the nature of the as_completed function. The script no longer shows any indicator of progress while gathering statuses, and the tweets are returned in a scrambled order, because the script processes responses in the order in which the web requests finish.
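For readers unfamiliar with as_completed, here is a minimal sketch of the behaviour described above, assuming the function in question is `concurrent.futures.as_completed` (the attached script may use a different wrapper). `fetch_status`, `gather_in_order`, and the example URLs are hypothetical stand-ins, not code from the re-write. It shows one possible way to count completions and restore the input order by remembering each future's position:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_status(url):
    """Hypothetical stand-in for a web request; sleeps a random short time."""
    time.sleep(random.uniform(0.001, 0.01))
    return f"status for {url}"


def gather_in_order(urls, max_workers=8):
    """Run fetch_status concurrently, then return results in input order."""
    results = [None] * len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which input position each future belongs to.
        future_to_index = {
            pool.submit(fetch_status, url): i for i, url in enumerate(urls)
        }
        # as_completed yields futures in the order they finish, not the
        # order they were submitted, so results arrive scrambled.
        for done, future in enumerate(as_completed(future_to_index), start=1):
            results[future_to_index[future]] = future.result()
            print(f"\r{done}/{len(urls)} completed", end="", flush=True)
    print()
    return results


# Placeholder URLs; nothing is actually requested over the network.
urls = [f"https://example.invalid/status/{n}" for n in range(20)]
statuses = gather_in_order(urls)
```

Storing results by index costs one dictionary lookup per future and leaves the concurrency untouched; whether the extra bookkeeping is worth it depends on whether the output order matters to the user.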

If the style & format of the re-write are acceptable, and the changes proposed are not considered to be critical, we will proceed with creating a pull request.

twayback2.txt

Mennaruuk commented 2 years ago

Hey @AccentuSoft ! Thank you ever so much for this. I truly appreciate it. You guys are really awesome.

In fact, if you are interested in taking over this project, or know somebody who wants to, I'm more than happy to transfer it over. I believe there is so much potential to be tapped by somebody who knows a lot more than someone like me, who is struggling with the basics. I had no idea about things like async, Puppeteer, or Playwright, and I don't think I'll get much further with them. I'm totally okay with whatever's on your mind. I'll try my best to continue developing with the knowledge that I have, but I'm an extreme novice at Python (this was my first project), and more knowledgeable leadership can take this project to new heights. I want to see it help as much as possible.

Please make a PR, that way it'll be in your name!!

AccentuSoft commented 2 years ago

Thank you for the kind words!

We appreciate the offer; however, we feel that the project would be best managed by you. After all, this whole operation was your idea :)

Don't be discouraged by the fact that you're not familiar with these technologies. Knowledge comes with experience, and experience comes with practice. As you build your GitHub portfolio, I'm sure you'll become acquainted with a wide variety of tools. Everyone starts from somewhere, and having all your projects under your profile is a great way to show people your accomplishments. Your first project was even noticed by humandecoded, who is a veteran of the OSINT industry. I'm sure you don't want us to take this away from you!

Don't stress about what you feel like you should or should not do; the tool is already great! Anything you add to the tool from here on out is a bonus. As long as you feel like you are learning, you really can't go wrong.

We made a PR with the suggested changes (#11). Please review our French, as Google Translate sometimes misses the nuances of language.

Let us know if you come across any issues!

Mennaruuk commented 2 years ago

Thank you for your response. I appreciate it, and I understand your opinion. Your compliment means a lot!

I merged the PR. I did face a ClosedPoolError and a ConnectionError. (Console log.) I'm not sure why. I'll look around for clues on how to fix this; once it's fixed, I'll publish a new release. Thanks for your help again, it means so much 💛

AccentuSoft commented 2 years ago

Those issues have to do with the rate at which requests are being sent; we think that Twitter is rejecting or rate-limiting the traffic.

We found that reducing the session threads to about 10 avoids the issues for the most part, though it means that the script runs more slowly.

We opted to cancel and re-run the script when we came across those errors, which seemed to work.

Mennaruuk commented 2 years ago

Sorry for the late response. My only gripe is that the error keeps happening; it's pretty frustrating! I don't know why, and I'm scratching my head. It seems to work for a small number of Tweets, but not for large numbers. Let me ask a couple of people around and see if I can get a fix in. I appreciate your hard work, and I don't want it to go to waste. 💛