sferik / t-ruby

A command-line power tool for Twitter.
http://sferik.github.com/t
MIT License

tweets with line breaks or new lines #228

Open jbabbin opened 10 years ago

jbabbin commented 10 years ago

While parsing my streaming timeline I've noticed that some tweets have new lines (\n) or carriage returns inside the tweet field. Can a flag or option be added to 't' to remove any new lines, carriage returns or line breaks from the tweet message field itself?

Here's an example (executing as 't stream timeline -c |' )

ORIGINAL TWEET: https://twitter.com/qualys/status/517738519104454656

I'm splitting on the commas for fields, and when I get a field count of 1 I know that the previous line is likely to have a count of 4 and was split on a line break or new line from the streaming data.

ERRORLINE DEBUG Size [5] DATA [517738519104454656,2014-10-02 18:10:56 +0000,qualys,"SANS @ RISK week 39, 2014: Consensus Security Vuln Alert w/ analysis of latest vulns & remediation advice https://t.co/tT3n25DIEb"] RAWDATA [517738519104454656,2014-10-02 18:10:56 +0000,qualys,"SANS @ RISK week 39, 2014: Consensus Security Vuln Alert w/ analysis of latest vulns & remediation advice https://t.co/tT3n25DIEb"]

ERRORLINE DEBUG Size [1] DATA [http://t.co/LYNdHgjCEm http://t.co/GSI63qI7tc"] RAWDATA [http://t.co/LYNdHgjCEm http://t.co/GSI63qI7tc"]
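
The two debug lines above can be reproduced outside of t. The following is a minimal Python sketch, not part of t itself; the sample record is shortened from the tweet above, and the exact position of the line break inside the quoted text is an assumption. It contrasts a per-line comma split with a CSV parser that honours quoted fields.

```python
import csv
import io

# Sample record modelled on the tweet above. Where the line break falls
# inside the quoted text is an assumption made for illustration.
raw = (
    '517738519104454656,2014-10-02 18:10:56 +0000,qualys,'
    '"SANS @ RISK week 39, 2014: Consensus Security Vuln Alert\n'
    'http://t.co/LYNdHgjCEm"\n'
)

# Naive approach: treat every physical line as a record and split on commas.
naive = [line.split(",") for line in raw.splitlines()]

# CSV-aware approach: a quoted field may contain commas and line breaks.
parsed = list(csv.reader(io.StringIO(raw, newline="")))

print("naive field counts:", [len(fields) for fields in naive])
print("csv field counts:  ", [len(fields) for fields in parsed])
```

The naive split reports two fragments, while the CSV reader returns a single four-field record whose last field still contains the line break.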

jbabbin commented 10 years ago

Looking at stream.rb line 172, maybe manipulating 'tweet.text' with the 'delete' function to remove new lines might work.

Change

    print_message(tweet.user.screen_name, tweet.text)

to

    print_message(tweet.user.screen_name, tweet.text.delete("\n"))

or

    tweet.text.delete!("\n")
    print_message(tweet.user.screen_name, tweet.text)

jbabbin commented 9 years ago

I think I found a solution using Python as a wrapper.

Execute: python this_file.py

------ BEGIN -------
import subprocess

with open('pytest.log', 'w') as f:
    process = subprocess.Popen(["t", "stream", "timeline", "-c"], stdout=subprocess.PIPE)
    for line in iter(process.stdout.readline, ''):
        # RAW LINE
        #sys.stdout.write(line)
        # parse line
        whole_split_line = line.split(",")
        # get splits
        line_count = len(whole_split_line)
        #print "Split Count ", line_count, " for RAW" , f.write(line)
        # Parsing of array
        Twitter_ID = whole_split_line.pop(0)
        Posted_At = whole_split_line.pop(0)
        Screen_Name = whole_split_line.pop(0)
        Tweet_Text = whole_split_line
        print "START \n PARSED - ID ", Twitter_ID, " Posted Time ", Posted_At, "Twitter Name", Screen_Name, " Tweet ", Tweet_Text, "\n RAW", line, "\n END"
        # WRITE TO LOG FILE
        f.write(line)

# Done
------- END ---------

dannguyen commented 9 years ago

You can get by with using a command line CSV parser like csvkit

e.g.

# get the first four columns of the followers listing as CSV
t followers ev --csv | csvcut -c 1,2,3,4

csvcut and its other utilities handle the line-breaks and such (by using Python for the parsing).

Maybe it'd be a useful feature to have a flag in which the t library itself strips out trailing spaces and newlines?
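
As a rough idea of what such stripping could look like outside of t, here is a small Python sketch. The function name `flatten_csv` and the two-column sample are hypothetical and only for illustration; the sketch assumes the input is well-formed CSV, such as the output of `--csv`.

```python
import csv
import io


def flatten_csv(source, sink):
    """Copy CSV rows from source to sink, collapsing every run of
    whitespace (including line breaks) inside a field to one space
    and dropping leading and trailing whitespace."""
    writer = csv.writer(sink, lineterminator="\n")
    for row in csv.reader(source):
        writer.writerow([" ".join(field.split()) for field in row])


# Hypothetical two-column input with a line break inside a quoted field.
sample = io.StringIO(
    'screen_name,bio\n'
    'someuser,"First line of the bio. \n Second line of the bio. "\n',
    newline="",
)
out = io.StringIO()
flatten_csv(sample, out)
print(out.getvalue(), end="")
```

To use it as a filter in a pipeline, the same function could be called with `sys.stdin` and `sys.stdout` instead of the in-memory buffers.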

giacecco commented 8 years ago

The same problem affects the followers command when the user profile's text has one or more \n's. E.g. at the time of writing, Twitter user "AlgebraWinter" is listed on two lines of output instead of one, as:

$ t followers dicoim -l --profile ~/.config/t/personal
(...)
         613514515  Jun 20  2012  Aug 17  2015     8737    587    17     742      314  @AlgebraWinter    Matt Hart             No   No   Software developer at 1E, Ealing. http://t.co/1zkihd68Ve Author of 'Algebra Winter'. 
 Robot avatar © Julien Tromeur. 
(...)

This also makes the output not machine readable, which defeats the purpose of the -l option in the first place!

There's no obvious solution, as -l is intended to output fixed-width fields and changing its behaviour may break other people's scripts relying on that. My suggestion is that a -c option is added, outputting valid CSV, which I believe supports \n's.
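
The belief that CSV can carry \n's can be checked with a quick round trip. This is a minimal sketch using Python's standard csv module; the record is a hypothetical, shortened version of the profile above and says nothing about how t formats its own output.

```python
import csv
import io

# Hypothetical follower record; the bio is a shortened version of the
# two-line profile shown above.
row = [
    "613514515",
    "AlgebraWinter",
    "Matt Hart",
    "Software developer at 1E, Ealing.\nRobot avatar.",
]

# Write the record as CSV. The writer quotes the field that contains
# a comma and a line break.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# Read it back: the text spans two physical lines but is one record.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

print(encoded)
print("records read back:", len(decoded))
```

The line break survives inside the quoted field, so a CSV-aware consumer gets the bio back unchanged, while a consumer that reads line by line would still see two lines.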

Animenosekai commented 4 years ago

@jbabbin Btw you can use ``` to write code into your GitHub reply. Example:

print('Hello World')

githubbbie commented 3 years ago

agree this needs to be fixed