twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.75k stars 2.72k forks

[REQUEST] New lines in tweet text stripped #534

Closed edsu closed 4 years ago

edsu commented 5 years ago

Issue Template

Initial Check

Command Ran


import twint

config = twint.Config()
config.Search = '"First language" AND "Most used" AND "Most loved"'
config.Store_json = True
config.Output = "data/languages.json"
config.Hide_output = True

twint.run.Search(config)

Description of Issue

It looks like twint strips newlines from tweet text. JSON and CSV are both capable of containing newlines. Newlines can sometimes be significant when you are analyzing tweets, for example when parsing these tweets.

I was curious why they are being stripped out.
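To illustrate the point that both formats can carry newlines, here is a minimal sketch using only Python's standard library. The sample text and the helper names `json_round_trip` and `csv_round_trip` are made up for illustration and are not part of twint.

```python
import csv
import io
import json

# Made-up multi-line text standing in for a tweet body.
text = "First language: BASIC\nMost used: Python\nMost loved: Python"


def json_round_trip(s):
    """Serialize to JSON and back; the newline is escaped as \\n in the JSON string."""
    return json.loads(json.dumps({"tweet": s}))["tweet"]


def csv_round_trip(s):
    """Write one CSV row and read it back; the csv module quotes fields that contain newlines."""
    buf = io.StringIO(newline="")  # newline="" disables newline translation in the buffer
    csv.writer(buf).writerow(["1", s])
    buf.seek(0)
    return next(csv.reader(buf))[1]


print(json_round_trip(text) == text)
print(csv_round_trip(text) == text)
```

In both cases the text should come back unchanged, so stripping newlines is not required for either output format.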

Environment Details

OS X (Mojave 10.14.6)

Qriist commented 4 years ago

I just confirmed that I'm having the same issue. Latest Python/Twint under Windows Server 2012.

pielco11 commented 4 years ago

I don't know why \n characters are stripped out; I did not cover that part. Anyway, I think it's better not to strip them out. The output might not be clean, though, and 'raw' saving (not to CSV or JSON) might not be very handy.

So \n characters are now stripped out only in the cases where neither c.Store_csv nor c.Store_json is specified.
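The rule described above could be sketched roughly as follows. This is not twint's actual code: `clean_text` and its parameters are hypothetical names, and replacing each newline with a space is an assumption about how the stripping is done.

```python
def clean_text(text, store_csv=False, store_json=False):
    """Hypothetical helper: keep newlines for CSV/JSON output, flatten them otherwise."""
    if store_csv or store_json:
        # Structured formats can represent newlines, so leave the text untouched.
        return text
    # Plain output: replace each newline with a space (an assumption) so a tweet stays on one line.
    return text.replace("\n", " ")


sample = "line one\nline two"
print(repr(clean_text(sample, store_json=True)))
print(repr(clean_text(sample)))
```

With `store_json=True` or `store_csv=True` the text passes through unchanged; with neither flag set, the two lines are joined into one.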

Pushing updates right now

edsu commented 4 years ago

Thanks so much @pielco11!