Closed davepilbeam closed 10 years ago
It works just fine for me. When I print it, I get:
RT @PublicSectorCo: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…
I suspect that means your terminal does not support UTF-8.
Try this:
$ perl -e 'binmode STDOUT, ":utf8"; print "http://t.co/H4az\x{2026}\n"'
http://t.co/H4az…
Does your output match? I.e., is the final character in the URL an ellipsis?
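To see why a terminal that is not UTF-8 aware shows something other than an ellipsis, it can help to look at the bytes Perl actually writes. The following is a minimal sketch using the core Encode module; the variable names are illustrative only and do not come from the original report.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Tail of the status text: a truncated t.co URL followed by U+2026.
my $tail = "http://t.co/H4az\x{2026}";

# Perl treats the ellipsis as a single character ...
my $chars = length $tail;

# ... but UTF-8 output encodes it as three bytes.
my $bytes = encode('UTF-8', $tail);
my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, substr($bytes, -3);

print "characters: $chars\n";
print "bytes:      ", length($bytes), "\n";
print "last bytes: $hex\n";
```

A terminal that reads those three bytes in a single-byte encoding such as Windows-1252 will show three separate characters instead of one ellipsis, which is one way the output can look munged even though the data is intact.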
-Marc
On Tue, Feb 25, 2014 at 3:33 AM, Dave Pilbeam notifications@github.com wrote:
One of my sites is pulling in a link that turns partially into a wide character that I can't do anything about, although it will Dump correctly:
use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use Data::Dumper;
use Net::Twitter::Lite::WithAPIv1_1;

my $statusref;
my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
    ssl                 => 1,
    consumer_key        => $twitterkey,
    consumer_secret     => $twittersecret,
    access_token        => $authtoken,
    access_token_secret => $authsecret,
);
eval {
    $statusref = $nt->user_timeline({ screen_name => $twittername, count => 3, exclude_replies => 'true' });
};
1/ for my $status ( @{$statusref} ) { print $status->{'text'}; }
will print munged: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…
2/ for my $status ( @{$statusref} ) { print Dumper( $status->{'text'} ); }
will print munged: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az\x{2026}
3/ however print Dumper($statusref); will print correctly:
{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw #families2014'}
How can I get $statusref to print correctly, as it does in Dumper in example 3? binmode STDOUT,'utf8'; only removes the warning, and no decoding/encoding I can come up with seems to work.
The unruly feed is from Feb 18 at https://twitter.com/Barrachd
Thanks, Dave
No, I think it is printing wrong as well for you - the final character should not be 'ellipsis', '\x{2026}' or 'z…', it should be 's3r0Qw #families2014' and complete the url link and hashtag.
The link currently does not work because the end characters in ($nt->user_timeline) 'http://t.co/H4azs3r0Qw #families2014' have been turned into 'http://t.co/H4az\x{2026}'
Dumper($nt->user_timeline) shows that the characters ARE there initially:
Dumper($nt->user_timeline)
#produces
{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw #families2014' }
#link url is preserved
but any other output (ie print or JSON output) results in the end characters REPLACED with the ellipsis wide character and the link url destroyed:
for my $status( @{$nt->user_timeline} ){ print $status->{'text'}; }
#produces
{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…' }
#what happened to 's3r0Qw #families2014'?
I need the link url to be preserved - it's almost like there is a character count being applied somewhere?
perl v5.014002 Net::Twitter::Lite::WithAPIv1_1;
It is printing correctly for me. You’re just mistaken about the content of the status Twitter returns.
The status in question has ID 436116902038212608. Below is some output from a debugger session examining various aspects of the status.
Note that this status is a retweet. The original tweet is embedded. Its text includes the full t.co URL with no ellipsis (see the line beginning DB<14>). However, to display the status in 140 characters and make room for the “RT @PublicSectorCo:” prefix, Twitter truncated the text in the midst of the t.co URL and appended an ellipsis (see the line beginning DB<13>).
In Twitter’s web UI, you’ll see that accessing @Barrachd’s retweet:
http://twitter.com/Barrachd/status/436116902038212608
redirects to the original tweet:
https://twitter.com/PublicSectorCo/status/435757672927744000
You’ll notice the URL is displayed much differently. It uses the display_url from the {entities}{urls} array in the status returned by the Twitter API. And if you hover over the link, you’ll see it references the t.co URL in {entities}{urls}[0]{url}. See the section below beginning DB<15>.
So, I think your problem is two-fold. First, your terminal isn’t UTF-8 compatible, so when you print text with a Unicode ellipsis (U+2026), you’re getting unexpected output. Second, you’re printing the truncated text of the retweet rather than the full text of the original tweet.
Hope this helps.
  DB<12> $r = $nt->show_status(436116902038212608)

  DB<13> p $r->{text}
RT @PublicSectorCo: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…

  DB<14> p $r->{retweeted_status}{text}
Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw

  DB<15> x $r->{entities}{urls}
0  ARRAY(0x7fdab3cab840)
   0  HASH(0x7fdab1093190)
      'display_url' => '\x{2026}ubledfamilies.publicsectorconnect.org'
      'expanded_url' => 'http://troubledfamilies.publicsectorconnect.org/'
      'indices' => ARRAY(0x7fdab1092920)
         0  139
         1  140
      'url' => 'http://t.co/H4azs3r0Qw'
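The URL entities shown in the debugger session can also be read programmatically. Here is a rough sketch that walks {entities}{urls} and pairs each t.co link with its expanded form. The hash is a hand-built stand-in copied from the DB<15> output, not a live API response, and @links is a name chosen for illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for the decoded status; values copied from the
# debugger session. A real status would come from $nt->show_status(...).
my $r = {
    entities => {
        urls => [
            {
                display_url  => "\x{2026}ubledfamilies.publicsectorconnect.org",
                expanded_url => 'http://troubledfamilies.publicsectorconnect.org/',
                indices      => [ 139, 140 ],
                url          => 'http://t.co/H4azs3r0Qw',
            },
        ],
    },
};

# Pair the short t.co URL with the expanded URL for every URL entity.
# The "|| []" guards against a status that carries no urls array.
my @links = map { "$_->{url} -> $_->{expanded_url}" }
            @{ $r->{entities}{urls} || [] };

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @links;
```

Reading the link from the entity rather than from the text field avoids depending on where Twitter cut the text.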
I understand now: I am looping through $r->{text} - which is fine unless it is a retweet and there is truncation, then I need to check $r->{retweeted_status}{text} for the full version.
I did not understand that adding 'RT @PublicSectorCo:' would truncate the $r->{text} version, neither could I visualise the complex data structure properly! Thank you for your help and patience. Dave
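The check described above might look something like the following sketch. full_text is a hypothetical helper, not part of Net::Twitter::Lite, and the two statuses are abbreviated hand-built stand-ins rather than real API output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the untruncated text for a status hashref.
# For a retweet, the original tweet is embedded under retweeted_status,
# so prefer its text when it is present.
sub full_text {
    my ($status) = @_;
    my $original = $status->{retweeted_status};
    return ( $original && defined $original->{text} )
        ? $original->{text}
        : $status->{text};
}

# Abbreviated stand-ins for what user_timeline returns (assumed shape).
my $statusref = [
    {
        text             => "RT \@PublicSectorCo: Delighted to announce ... http://t.co/H4az\x{2026}",
        retweeted_status => { text => 'Delighted to announce ... http://t.co/H4azs3r0Qw' },
    },
    { text => 'An ordinary tweet with no retweeted_status' },
];

binmode STDOUT, ':encoding(UTF-8)';
print full_text($_), "\n" for @{$statusref};
```

Note that the text taken from retweeted_status does not carry the "RT @PublicSectorCo:" prefix, so that would need to be added back by hand if it is wanted in the output.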