semifor / net-twitter-lite

A lighter weight (non-Moose) Perl interface to the Twitter API

false wide character appearing in feed? #17

Closed davepilbeam closed 10 years ago

davepilbeam commented 10 years ago

One of my sites is pulling in a link that turns partially into a wide character that I can't do anything about, although it will Dump correctly:

use strict;
use warnings;
use CGI::Carp qw(fatalsToBrowser);
use Data::Dumper;
use Net::Twitter::Lite::WithAPIv1_1;

my $statusref;
my $nt = Net::Twitter::Lite::WithAPIv1_1 -> new( ssl => 1,consumer_key => $twitterkey,consumer_secret => $twittersecret,access_token => $authtoken,access_token_secret => $authsecret );

eval { $statusref = $nt->user_timeline({ screen_name => $twittername,count => 3,exclude_replies => 'true'}); };

1/ for my $status( @{$statusref} ){ print $status->{'text'}; } will print munged: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…

2/ for my $status( @{$statusref} ){ print Dumper( $status->{'text'} ); } will print munged: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az\x{2026}

3/ however print Dumper($statusref) ; will print correctly:

{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw #families2014'}

how can I get $statusref to print correctly, as it does in Dumper, in example 3? binmode STDOUT,'utf8'; only removes the warning and no decoding/encoding I can come up with seems to work..

The unruly feed is from Feb 18 at https://twitter.com/Barrachd

Thanks, Dave

semifor commented 10 years ago

It works just fine for me. When I print it, I get:

RT @PublicSectorCo: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az

I suspect that means your term does not support UTF8.

Try this:

$ perl -e 'binmode STDOUT, ":utf8"; print "http://t.co/H4az\x{2026}\n"'
http://t.co/H4az…

Does your output match? I.e., is the final character in the URL an ellipsis?

-Marc
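To separate the terminal question from anything Twitter returns, the same string can be encoded explicitly and its bytes inspected. This is only a minimal sketch using the core Encode module; the sample string is the truncated URL from this thread, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text from this thread: a truncated t.co URL ending in U+2026.
my $text = "http://t.co/H4az\x{2026}";

# As a character string, the ellipsis counts as a single character.
my $chars = length $text;

# Encoded as UTF-8, the ellipsis becomes the three bytes E2 80 A6.
my $bytes = encode('UTF-8', $text);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, substr($bytes, -3);

# Ask Perl to encode on output, which avoids the "Wide character" warning.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
print "characters: $chars, bytes: ", length($bytes), ", tail: $hex\n";
```

If the first line of output does not end in a visible ellipsis even though the tail bytes are E2 80 A6, the data is intact and the display (terminal, or the charset a web page declares) is the likelier culprit.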


davepilbeam commented 10 years ago

No, I think it is printing wrong as well for you - the final character should not be 'ellipsis', '\x{2026}' or 'z…', it should be 's3r0Qw #families2014' and complete the url link and hashtag.

The link currently does not work because the end characters in ($nt->user_timeline) 'http://t.co/H4azs3r0Qw #families2014' have been turned into 'http://t.co/H4az\x{2026}'

Dumper($nt->user_timeline) shows that the characters ARE there initially:

Dumper($nt->user_timeline)
#produces
{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw #families2014' } 
#link url is preserved

but any other output (ie print or JSON output) results in the end characters REPLACED with the ellipsis wide character and the link url destroyed:

for my $status( @{$nt->user_timeline} ){ print $status->{'text'}; }
#produces
{'text' => 'Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az…' } 
#what happened to 's3r0Qw #families2014'?

I need the link url to be preserved - it's almost like there is a character count being applied somewhere?

perl v5.014002 Net::Twitter::Lite::WithAPIv1_1;

semifor commented 10 years ago

It is printing correctly for me. You’re just mistaken about the content of the status Twitter returns.

The status in question has ID 436116902038212608. Below is some output from a debugger session examining various aspects of the status.

Note that this status is a retweet. The original tweet is embedded. Its text includes the full t.co URL with no ellipsis (see the line beginning DB<14>). However, to display the status in 140 characters and make room for the “RT @PublicSectorCo:” prefix, Twitter truncated the text in the midst of the t.co URL and appended an ellipsis (see the line beginning DB<13>).

In Twitter’s web UI, you’ll see that accessing @Barrachd’s retweet:

http://twitter.com/Barrachd/status/436116902038212608

redirects to the original tweet:

https://twitter.com/PublicSectorCo/status/435757672927744000

You’ll notice the URL is displayed much differently. It uses the display_url from the {entities}{urls} array in the status returned by the Twitter API. And if you hover over the link, you’ll see it references the t.co URL in {entities}{urls}[0]{url}. See the section below beginning DB<15>.

So, I think your problem is two-fold. First, your term isn’t UTF8 compatible, so when you print text with a unicode ellipsis (U+2026), you’re getting unexpected output. Second, you’re printing the truncated text of the retweet rather than the full text of the original tweet.

Hope this helps.

DB<12> $r = $nt->show_status(436116902038212608)

DB<13> p $r->{text}
RT @PublicSectorCo: Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4az

DB<14> p $r->{retweeted_status}{text}
Delighted to announce that we have newly confirmed speakers at the Troubled Families conference ...... http://t.co/H4azs3r0Qw #families2014

DB<15> x $r->{entities}{urls}
0  ARRAY(0x7fdab3cab840)
   0  HASH(0x7fdab1093190)
      'display_url' => '\x{2026}ubledfamilies.publicsectorconnect.org'
      'expanded_url' => 'http://troubledfamilies.publicsectorconnect.org/'
      'indices' => ARRAY(0x7fdab1092920)
         0  139
         1  140
      'url' => 'http://t.co/H4azs3r0Qw'
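The debugger output suggests that a working link can be recovered from the entities even when the text itself is truncated. The sketch below assumes a status hash shaped like the one in the session above; the hash is hand-built from that output (only the fields used are included), and link_urls is a hypothetical helper, not part of Net::Twitter::Lite.

```perl
use strict;
use warnings;

# Hand-built stand-in for the retweet shown in the debugger session;
# a real status would come from the API and carry many more fields.
my $status = {
    text => "RT \@PublicSectorCo: Delighted to announce that we have newly "
          . "confirmed speakers at the Troubled Families conference ...... "
          . "http://t.co/H4az\x{2026}",
    entities => {
        urls => [ {
            url          => 'http://t.co/H4azs3r0Qw',
            expanded_url => 'http://troubledfamilies.publicsectorconnect.org/',
            display_url  => "\x{2026}ubledfamilies.publicsectorconnect.org",
            indices      => [ 139, 140 ],
        } ],
    },
};

# Hypothetical helper: return the complete t.co URLs recorded in the
# entities, falling back to an empty list when none are present.
sub link_urls {
    my ($status) = @_;
    my $urls = ( $status->{entities} || {} )->{urls} || [];
    return map { $_->{url} } @$urls;
}

print "$_\n" for link_urls($status);
```

Reading the link from {entities}{urls} rather than parsing it out of {text} sidesteps the truncation, since the entity keeps the whole t.co URL.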


davepilbeam commented 10 years ago

I understand now: I am looping through $r->{text} - which is fine unless it is a retweet and there is truncation, then I need to check $r->{retweeted_status}{text} for the full version.

I did not understand that adding 'RT @PublicSectorCo:' would truncate the $r->{text} version, neither could I visualise the complex data structure properly! Thank you for your help and patience. Dave
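The check described here could be wrapped in a small helper. This is a sketch only: full_text is a hypothetical name, and the two statuses are hand-built stand-ins using the texts quoted in this thread; in real use the list would come from $nt->user_timeline(...) as in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the embedded original tweet's text when the
# status is a retweet, otherwise use the status's own text.
sub full_text {
    my ($status) = @_;
    my $original = $status->{retweeted_status};
    return $original ? $original->{text} : $status->{text};
}

# Hand-built stand-ins for timeline entries (assumed shape).
my @timeline = (
    {   # a retweet: the top-level text is truncated, the embedded one is not
        text => "RT \@PublicSectorCo: Delighted to announce that we have newly "
              . "confirmed speakers at the Troubled Families conference ...... "
              . "http://t.co/H4az\x{2026}",
        retweeted_status => {
            text => 'Delighted to announce that we have newly confirmed '
                  . 'speakers at the Troubled Families conference ...... '
                  . 'http://t.co/H4azs3r0Qw #families2014',
        },
    },
    { text => 'An ordinary status, not a retweet' },
);

binmode STDOUT, ':encoding(UTF-8)';
print full_text($_), "\n" for @timeline;
```

Note that this drops the "RT @PublicSectorCo:" prefix for retweets; if the attribution matters, it would have to be added back separately.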