jbabbin opened 10 years ago
Looking at `stream.rb`, line 172, maybe stripping newlines from `tweet.text` with Ruby's `delete` method would work. Change

```ruby
print_message(tweet.user.screen_name, tweet.text)
```

to

```ruby
# delete returns a copy of the text with every "\n" removed
print_message(tweet.user.screen_name, tweet.text.delete("\n"))
```
I think I found a solution using Python as a wrapper. Run it with `python this_file.py`:

```python
import subprocess

# Stream the timeline as CSV, print a parsed view of each line and log the raw line.
with open('pytest.log', 'w') as f:
    process = subprocess.Popen(
        ["t", "stream", "timeline", "-c"],
        stdout=subprocess.PIPE,
        text=True,  # decode output to str so the '' EOF sentinel below matches
    )
    for line in iter(process.stdout.readline, ''):
        # Naive parse: split on commas (commas inside the tweet text split too)
        fields = line.rstrip('\n').split(',')
        if len(fields) < 4:
            # Fragment of a tweet broken by a newline: log it, don't parse it
            f.write(line)
            continue
        twitter_id, posted_at, screen_name = fields[:3]
        tweet_text = ','.join(fields[3:])  # rejoin text that contained commas
        print("START\n PARSED - ID", twitter_id, "Posted Time", posted_at,
              "Twitter Name", screen_name, "Tweet", tweet_text,
              "\n RAW", line, "\n END")
        # Write the raw line to the log file
        f.write(line)
```
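Splitting on commas breaks as soon as a tweet contains a comma or a newline. Judging by the debug output posted in this issue, `t stream timeline -c` appears to wrap the tweet text in double quotes, so Python's standard `csv` module should be able to reassemble records whose text spans several lines. A minimal sketch, assuming the four columns are ID, posted time, screen name and text (the order seen in the output here); `Tweet`, `parse_tweets` and `stream_timeline` are hypothetical names:

```python
import csv
import subprocess
from typing import Iterable, Iterator, NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    posted_at: str
    screen_name: str
    text: str


def parse_tweets(lines: Iterable[str]) -> Iterator[Tweet]:
    """Yield one Tweet per CSV record, even when the quoted text spans lines."""
    # csv.reader keeps consuming lines while a quoted field is open, so a
    # newline inside the tweet text stays part of the text field.
    for row in csv.reader(lines):
        # Assumption: real records have 4 columns and a numeric ID; this also
        # skips a header row if `t` prints one.
        if len(row) != 4 or not row[0].isdigit():
            continue
        yield Tweet(*row)


def stream_timeline() -> Iterator[Tweet]:
    """Run `t stream timeline -c` and parse its output as it arrives."""
    process = subprocess.Popen(
        ["t", "stream", "timeline", "-c"],
        stdout=subprocess.PIPE,
        text=True,  # decode to str; universal newlines turns \r\n into \n
    )
    yield from parse_tweets(process.stdout)
```

Usage would be something like `for tweet in stream_timeline(): print(tweet.screen_name, tweet.text)`. Keeping `parse_tweets` separate from the subprocess call makes it easy to test against saved output.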
You can get by with a command-line CSV parser like csvkit, e.g.:

```shell
# get the first four columns of the followers listing as CSV
t followers ev --csv | csvcut -c 1,2,3,4
```

`csvcut` and the other csvkit utilities handle embedded line breaks and similar quirks correctly (they use Python for the parsing).
Maybe it would be a useful feature to have a flag that makes `t` itself strip out trailing spaces and newlines?
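Until `t` offers such a flag, the same clean-up can be done on the parsed text field. A minimal Python sketch; `flatten_text` is a hypothetical helper, and it replaces each run of line breaks with a single space instead of deleting it, so the words on either side of a break don't get glued together:

```python
import re

# A run of CR/LF characters, with any spaces or tabs around it, is one break.
_LINE_BREAKS = re.compile(r"[ \t]*[\r\n]+[ \t]*")


def flatten_text(text: str) -> str:
    """Replace internal line breaks with single spaces and trim both ends."""
    return _LINE_BREAKS.sub(" ", text).strip()
```

For example, `flatten_text("a\r\nb  \n")` returns `"a b"`. It should be applied to the text field after CSV parsing rather than to raw output lines, since on the raw stream the line breaks are what separate records.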
The same problem affects the `followers` command when a user's profile text contains one or more `\n`s. For example, at the time of writing, Twitter user "AlgebraWinter" is listed on two lines of output instead of one:
```
$ t followers dicoim -l --profile ~/.config/t/personal
(...)
613514515 Jun 20 2012 Aug 17 2015 8737 587 17 742 314 @AlgebraWinter Matt Hart No No Software developer at 1E, Ealing. http://t.co/1zkihd68Ve Author of 'Algebra Winter'.
Robot avatar © Julien Tromeur.
(...)
```
This also makes the output not machine-readable, which defeats the purpose of the `-l` option in the first place!
There's no obvious fix, as `-l` is intended to output fixed-width fields, and changing its behaviour may break other people's scripts that rely on it. My suggestion is to add a `-c` option that outputs valid CSV, which can carry `\n`s inside quoted fields.
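CSV can indeed represent newlines: RFC 4180 allows line breaks inside double-quoted fields. A quick check with Python's standard `csv` module, using a shortened record built from the AlgebraWinter profile shown in this thread; `to_csv_line` and `parse_csv` are hypothetical helper names:

```python
import csv
import io


def to_csv_line(fields: list[str]) -> str:
    """Serialise one record; fields containing newlines get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


def parse_csv(text: str) -> list[list[str]]:
    """Parse CSV text back into records, honouring quoted newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


record = ["613514515", "@AlgebraWinter",
          "Author of 'Algebra Winter'.\nRobot avatar © Julien Tromeur."]
# The embedded newline survives the round trip as part of one field.
assert parse_csv(to_csv_line(record)) == [record]
```

The writer quotes the profile field because it contains a line-terminator character, and the reader keeps reading until that quote closes, so the record stays one row.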
@jbabbin Btw, you can use ``` to format code in your GitHub reply. Example:

```python
print('Hello World')
```
Agree, this needs to be fixed.
While parsing my streaming timeline, I've noticed that some tweets have newlines (`\n`) or carriage returns inside the tweet field. Could a flag or option be added to `t` to remove any newlines, carriage returns, or line breaks from the tweet text itself?
Here's an example (running `t stream timeline -c` piped into my script):
ORIGINAL TWEET: https://twitter.com/qualys/status/517738519104454656
I'm splitting each line on commas to get the fields. When a line yields only 1 field, I know the previous line (which should have 4 fields) was most likely cut short by a line break in the streaming data.
```
ERRORLINE DEBUG Size [5] DATA [517738519104454656,2014-10-02 18:10:56 +0000,qualys,"SANS @ RISK week 39, 2014: Consensus Security Vuln Alert w/ analysis of latest vulns & remediation advice https://t.co/tT3n25DIEb"] RAWDATA [517738519104454656,2014-10-02 18:10:56 +0000,qualys,"SANS @ RISK week 39, 2014: Consensus Security Vuln Alert w/ analysis of latest vulns & remediation advice https://t.co/tT3n25DIEb"]
ERRORLINE DEBUG Size [1] DATA [http://t.co/LYNdHgjCEm http://t.co/GSI63qI7tc"] RAWDATA [http://t.co/LYNdHgjCEm http://t.co/GSI63qI7tc"]
```
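Rather than inferring a broken record from the field count, another option is to keep appending lines while a double-quoted field is still open. In standard CSV a literal quote inside a quoted field is written as two quotes, so an odd number of `"` characters means the record continues on the next line. A minimal sketch, assuming `t`'s `-c` output follows that quoting convention; `join_broken_lines` is a hypothetical helper:

```python
from typing import Iterable, Iterator


def join_broken_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield complete CSV records, rejoining lines split inside quoted fields."""
    pending = ""
    for line in lines:
        pending += line
        # An odd quote count means a quoted field is still open.
        if pending.count('"') % 2 == 0:
            yield pending
            pending = ""
    if pending:
        yield pending  # unterminated record at end of input; pass it through
```

Each rejoined record still needs a CSV-aware split rather than a plain split on commas, because the text itself can contain commas ("week 39, 2014" in the log above is why the first line reports 5 fields instead of 4).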