pbabvey opened this issue 5 years ago
replies.Resume needs to be a file name, like replies.Resume = 'resume_file.txt'
Thank you. I tried it, and it works now, but it resumes collecting tweets from the 4296th tweet of the file, while my JSON file contains almost 71000 tweets. Is there any limit on the number of items in a file?
When I set g.Resume = 'filename.csv' and ran twint.run.Search(g), why did I get UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7965: character maps to <undefined>?
@pbabvey every request has its own resume ID. For every request, the resume ID is first written to the resume file and then the request is handled, so if something breaks you still have the ID you need to resume.
At every new request, the old ID is deleted and replaced with the new one.
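A rough sketch of that write-then-request pattern (an illustration of the idea described above, not twint's actual code; fetch_page is a made-up stand-in for the internal request call):

# Illustration only: the cursor is written to the resume file *before* each
# request, so a crash mid-request still leaves a usable resume point on disk.
def scrape(fetch_page, resume_file, cursor):
    while cursor:
        with open(resume_file, "w") as f:   # old id is replaced, not appended
            f.write(cursor)
        tweets, cursor = fetch_page(cursor)  # hypothetical request helper
        yield from tweets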
@epanmareza it seems a character could not be decoded; this might be an issue on your end
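The 'charmap' codec in that traceback usually means the file is being read with Windows' default code page rather than UTF-8. A quick sanity check you can run yourself (separate from twint, using the same file name you passed to Resume) is to try decoding the file explicitly as UTF-8:

# Quick check, not part of twint: try reading the same file as UTF-8.
# If this read also fails, the file contains bytes that are not valid text
# and is probably not a plain resume/CSV file.
with open("filename.csv", encoding="utf-8") as f:
    print(f.read()[:200])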
Apologies if this is a dumb question, but is the "resume" file the twint-last-request.log file saved by the debugger? And if not, how do I find or create a resume file?
Edit: Ah, I think I figured it out - you specify the filename in c.Resume =
@jomorrcode
import twint
c = twint.Config()
c.Username = "target"
c.Limit = 20
c.Resume = "target_resume.raw"
twint.run.Search(c)
Now if you run this script twice, you'll resume from where Twint stopped (assuming that there are more tweets to scrape)
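If you're curious what gets stored, the resume file from the snippet above is plain text; going by the explanation earlier in this thread, it should hold the most recent resume ID, so you can just print it:

# Inspect the resume file written by the example above
with open("target_resume.raw") as f:
    print(f.read().strip())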
I need to resume from the last downloaded data, but I couldn't do that.
I read that the last scroll_id needs to be added to config.Resume.
I found this in the URL log file: scroll%3AthGAVUV0VFVBaEgLeJ1_GV7yQWgsC79bHLzpglEjUAFQAlABEV3Lp5FYCJehgHREVGQVVMVBUAFQAVARUIFQAA and from it I used this part: thGAVUV0VFVBaEgLeJ1_GV7yQWgsC79bHLzpglEjUAFQAlABEV3Lp5FYCJehgHREVGQVVMVBUAFQAVARUIFQAA
Here is the full code:
import twint
c = twint.Config()
c.Search = "gold"
c.Store_csv = "True"
c.Output = "none.csv"
c.Lang = "en"
c.Debug = "True"
twint.run.Search(c)
After the error occurred, I added this line: c.Resume = "thGAVUV0VFVBaEgLeJ1_GV7yQWgsC79bHLzpglEjUAFQAlABEV3Lp5FYCJehgHREVGQVVMVBUAFQAVARUIFQAA", but the program started downloading all the data again from the current time instead of resuming.
The question is: is the scroll_id I used correct, or does it need to be used in another format?
Thanks in advance.
mnwato
I found that you need to set a file as the input for Resume, and twint will update it with the last id every time.
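For example, applying that to the "gold" search from the earlier snippet, the config would point Resume at a file instead of the pasted scroll id (resume_gold.txt is just an illustrative file name):

import twint

c = twint.Config()
c.Search = "gold"
c.Store_csv = True
c.Output = "none.csv"
c.Lang = "en"
c.Resume = "resume_gold.txt"  # twint keeps the latest scroll id in this file
twint.run.Search(c)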
But I have a suggestion:
Because there are lots of tweets sent every second, it would be great if "Since" and "Until" accepted a datetime, not just a date. If this is already available, please tell me.
Thanks to all of you for this great project.
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Description of Issue
Here is my code:
It's impossible to work around this by playing with dates alone, unless we have the option to give a more specific time to the Since field so we can start from where we left off.
Environment Details