twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

[ISSUE] Multiple follows_list fetches in one program return the same list every time despite calling clean_follow_list() each time #653

Closed. rjsu26 closed this issue 4 years ago.

rjsu26 commented 4 years ago


### Command Ran



```python
import twint
import json
import time

i = 0
dic = {}
c = twint.Config()
while i < 2:
    i += 1
    c.Username = "rajsahuofficial"
    c.Hide_output = True
    c.Store_object = True

    while True:
        try:
            twint.run.Lookup(c)
            break
        except Exception as e:
            print(e)
            print("Retrying in 1 sec")
            time.sleep(1)
    fol = twint.output.users_list[:]
    dic["followers"] = fol[0].followers
    dic["following"] = fol[0].following

    while True:
        try:
            twint.run.Followers(c)
            break
        except Exception as e:
            print(e)
            print("Retrying in 1 sec")
            time.sleep(1)

    lst = twint.output.follows_list[:]
    dic["followers_list"] = lst
    twint.output.clean_follow_list()

    while True:
        try:
            twint.run.Following(c)
            break
        except Exception as e:
            print(e)
            print("Retrying in 1 sec")
            time.sleep(1)

    lst1 = twint.output.follows_list[:]
    dic["following_list"] = lst1
    twint.output.clean_follow_list()
    print(dic)
```
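The three `while True` retry loops above are identical and never give up. A minimal sketch of factoring them into one bounded helper; `run_with_retry`, `attempts` and `delay` are hypothetical names (not part of Twint), and the broad `except Exception` simply mirrors the original code.

```python
import time


def run_with_retry(func, *args, attempts=5, delay=1.0):
    """Call func(*args) and return its result, retrying on any exception.

    Hypothetical helper, not part of Twint: it gives up after `attempts`
    tries and re-raises the last error instead of looping forever.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception as e:  # as broad as the loops in the issue
            if attempt == attempts:
                raise  # out of attempts: surface the real error
            print(f"{e}; retrying in {delay} s (attempt {attempt}/{attempts})")
            time.sleep(delay)
```

Each loop would then become a single call such as `run_with_retry(twint.run.Followers, c)`, and a persistent failure surfaces as an exception rather than an endless retry.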


### Output of the above code
I ran the fetch for the same user twice to show the error. In the debugger, I watched `dic` (the profile dictionary) for the actual follower and following counts. Then I took the lengths of `dic["followers_list"]` and `dic["following_list"]` in both iterations.

For i=0
![image](https://user-images.githubusercontent.com/32229344/73717614-56534c00-4740-11ea-83e7-362848bcddde.png)
![image](https://user-images.githubusercontent.com/32229344/73717621-5d7a5a00-4740-11ea-8eeb-e430b38cf54d.png)
Here, number of followers: 53; number following: 97.
So `len(dic["followers_list"])` = 53,
but `len(dic["following_list"])` = 150, probably because the new output is appended to the previous list (53 + 97 = 150).

For i=1
![image](https://user-images.githubusercontent.com/32229344/73717829-de395600-4740-11ea-84ab-d978d548f8ce.png)
![image](https://user-images.githubusercontent.com/32229344/73717838-e7c2be00-4740-11ea-8a1a-3fd95a1e1073.png)
Since the final list size in the last iteration was 150 and the number of followers is 53, the two are added this time, giving a new size of 203 (150 + 53).
Likewise for `following_list`: the latest size was 203 and the number following is 97, so they add up to a new list of 300 (203 + 97).
Hence, at i=1, the following list has size 300.
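The sizes above (53, 150, 203, 300) are exactly what a single shared, append-only list produces. A toy model that reproduces them without Twint, assuming every scrape appends to one module-level list and the clean-up call leaves that list untouched; `follows_list` here is a plain local list, and `fake_fetch` and `noop_clean` are hypothetical stand-ins, not Twint code.

```python
# Toy model of the reported behaviour; NOT Twint's implementation.
# Assumption: every scrape appends to one shared list that is never emptied.
follows_list = []


def fake_fetch(count):
    """Hypothetical stand-in for a scrape: append `count` placeholder usernames."""
    follows_list.extend(f"user{i}" for i in range(count))


def noop_clean():
    """Hypothetical stand-in for a clean-up call that does not touch follows_list."""
    pass


FOLLOWERS, FOLLOWING = 53, 97  # counts reported in the issue
sizes = []
for _ in range(2):
    fake_fetch(FOLLOWERS)
    followers_list = follows_list[:]  # copy, as in the issue's code
    noop_clean()
    fake_fetch(FOLLOWING)
    following_list = follows_list[:]
    noop_clean()
    sizes.extend([len(followers_list), len(following_list)])

print(sizes)  # [53, 150, 203, 300]
```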

### Description of Issue
I am trying to build a complete profile for a list of users. As a start, I picked a user at random and tried to get their `users_list` and both `follows_list`s (one for followers, the other for following) in the same iteration.
When I call `twint.run.Followers(c)` and save the result to my dictionary, then call `twint.run.Following(c)` and do the same, the list stored in my dictionary holds the running total of all followers and followings fetched so far.

### Environment Details
- OS: Ubuntu 18.04.3 LTS
- Running in: VS Code (Bash terminal)

### Help needed

How can I generate the correct lists within a single iteration of the same program?
rjsu26 commented 4 years ago

@pielco11 can you please help?

pielco11 commented 4 years ago

@rjsu26 quite busy these days

New code is being pushed. `clean_follow_list` was meant to clean the objects used by pandas DataFrames.

I renamed that function to a "private" one, `_clean_follow_list`. Unless you are doing advanced work with Twint's core functions, you should not use it. To clear a list you can do `twint.output.users_list = []`; I also added a function, `twint.output.clean_lists`, that resets `users_list`, `tweets_list` and `follows_list`.
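A toy sketch of the fix described here: reset the result lists after taking each snapshot, so every snapshot holds only the latest fetch. `output` is a hypothetical `SimpleNamespace` standing in for `twint.output` (only the three attribute names come from the thread), and `fake_fetch` and `reset_lists` are hypothetical helpers. The sketch assumes the fetcher looks the list up on `output` at call time, which is not verified against Twint's source; in real code, `twint.output.clean_lists()` is the route recommended in this thread.

```python
from types import SimpleNamespace

# Hypothetical stand-in for twint.output; NOT Twint's implementation.
output = SimpleNamespace(users_list=[], tweets_list=[], follows_list=[])


def fake_fetch(count):
    """Append `count` placeholder usernames, looking the list up at call time."""
    output.follows_list.extend(f"user{i}" for i in range(count))


def reset_lists():
    """Rebind every result list to a fresh empty list (like `output.users_list = []`)."""
    for name in ("users_list", "tweets_list", "follows_list"):
        setattr(output, name, [])


sizes = []
for _ in range(2):
    fake_fetch(53)                        # followers
    followers_list = output.follows_list  # plain reference, no copy
    reset_lists()
    fake_fetch(97)                        # following
    following_list = output.follows_list
    reset_lists()
    sizes.extend([len(followers_list), len(following_list)])

print(sizes)  # [53, 97, 53, 97]
```

Because `reset_lists` rebinds each attribute to a new list instead of calling `.clear()`, the references kept in `followers_list` and `following_list` survive the reset. An in-place `.clear()` would empty them as well, which is why copying with `[:]`, as the original code does, is the safer habit.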

rjsu26 commented 4 years ago

So, do we now need to use both of the new features, or just one of them?

pielco11 commented 4 years ago

No. If you want to clear the lists, all of them, just call `twint.output.clean_lists()`.

rjsu26 commented 4 years ago

That worked like a charm. Thanks for the immediate correction, @pielco11.

Also, can you explain why, even though the lists should be fetched in batches of 20, my resume files contain many lines even for a small number of followers/followings?

Example: for a following count of 20, the generated `resume.raw` file has about 25 lines, while for 700 to 800 followings it takes 30 lines at most.

pielco11 commented 4 years ago

Probably Twint retries the requests

rjsu26 commented 4 years ago

Yes, it was roughly what you pointed out. Thanks for your help, @pielco11.