kaihendry / greptweet

Sane twitter backup and search
https://greptweet.com/

Picture URLs don't get expanded #15

Closed JamieKitson closed 11 years ago

JamieKitson commented 12 years ago

Picture URLs are embedded elsewhere in the XML. Instead of:

statuses/status/entities/urls/url

They're in:

statuses/status/entities/media/creative

See for example:

http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=jamiekitson&count=1&include_rts=1&include_entities=1&max_id=214784069941207040

This example also suffers from the retweeted URL issue, so the urls are in:

statuses/status/retweeted_status/entities/media/creative

This could get complicated!
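As a rough illustration of the paths above, here is a minimal sketch that pulls link and picture URLs from both the tweet and its retweeted_status in one pass. The sample XML is a hypothetical, stripped-down stand-in for the real API response, and `list_urls` is a made-up helper name. It assumes xmlstarlet is installed under that name (some distributions ship it as `xml`).

```shell
#!/bin/sh
# Hypothetical, stripped-down sample; the real response has many more fields.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<statuses>
  <status>
    <id>2</id>
    <retweeted_status>
      <id>1</id>
      <entities>
        <media>
          <creative>
            <url>http://t.co/abc</url>
            <expanded_url>http://example.com/photo/1</expanded_url>
          </creative>
        </media>
      </entities>
    </retweeted_status>
    <entities>
      <urls>
        <url>
          <url>http://t.co/xyz</url>
          <expanded_url>http://example.com/page</expanded_url>
        </url>
      </urls>
    </entities>
  </status>
</statuses>
EOF

# Print "short_url expanded_url" for every link or picture, whether it sits
# directly under the status or under its retweeted_status.
list_urls() {
  xmlstarlet sel -t \
    -m "statuses/status" -m ". | retweeted_status" \
    -m "entities/urls/url | entities/media/creative" \
    -v "url" -o " " -v "expanded_url" -n "$1"
}

list_urls "$sample"
```

The XPath union (`|`) lets a single match cover both locations, which avoids repeating the output template for each one.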

kaihendry commented 11 years ago

I want to write a generic expander. I got started with https://github.com/kaihendry/Greptweet/blob/master/expand-urls.sh but it needs to know what URLs it has already expanded. Ideas?

JamieKitson commented 11 years ago

Text file, space delimited?

However, I don't think that's really relevant to this issue, the picture URL should be picked out of the XML, it will surely be quicker.
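For what it's worth, the space-delimited text file idea could look roughly like the sketch below. It is not what expand-urls.sh does; the file name `expanded-urls.txt` and the functions `lookup`, `expand` and `resolve` are all hypothetical, and `resolve` assumes curl is available. It also assumes neither URL contains a space.

```shell
#!/bin/sh
# Hypothetical cache file: one "short_url expanded_url" pair per line.
cache=${CACHE:-expanded-urls.txt}
touch "$cache"

# Print the cached expansion for a short URL; exit non-zero on a miss.
lookup() {
  awk -v u="$1" '$1 == u { print $2; found = 1; exit } END { exit !found }' "$cache"
}

# One possible resolver: follow redirects with HEAD requests and print the
# final URL. Any other resolver could be swapped in here.
resolve() {
  curl -sIL -o /dev/null -w '%{url_effective}' "$1"
}

# Print the expansion, resolving and recording it only when it is not cached.
expand() {
  lookup "$1" && return 0
  long=$(resolve "$1") || return 1
  printf '%s %s\n' "$1" "$long" >> "$cache"
  printf '%s\n' "$long"
}
```

Calling `expand http://t.co/abc` twice would hit the network only the first time; the second call is answered from the file.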

kaihendry commented 11 years ago

On 16 July 2012 11:40, Jamie Kitson reply@reply.github.com wrote:

> Text file, space delimited?

Not sure what you mean by that. Lacking context?

> However, I don't think that's really relevant to this issue, the picture URL should be picked out of the XML, it will surely be quicker.

You're right, though how do we keep track of tweets we've already expanded?

https://github.com/kaihendry/Greptweet/blob/master/expand-urls.sh

JamieKitson commented 11 years ago

It's a pity that xmlstarlet can't seem to apply templates multiple times, but as it is I think we have to add:

-m "entities/media/creative" -i "expanded_url != ''" -n -o "url " -v "url" -o " " -v "expanded_url" -b -b

So with #14 the whole thing becomes:

xmlstarlet sel -t -m "statuses/status" -m ".|retweeted_status" -i "(name() = 'status' and not(retweeted_status)) or name() = 'retweeted_status'" -n -o "text " -v "id" -o "|" -v "created_at" -o "|" -i "name() = 'retweeted_status'" -o "RT @" -v "user/screen_name" -o ": " -b -v "normalize-space(text)" -m "entities/urls/url" -i "expanded_url != ''" -n -o "url " -v "url" -o " " -v "expanded_url" -b -b -m "entities/media/creative" -i "expanded_url != ''" -n -o "url " -v "url" -o " " -v "expanded_url" -b -b

kaihendry commented 11 years ago

It would be better if this were done as a pull request. Or don't you have commit rights?

Copying and pasting out a text area = PITA

kaihendry commented 11 years ago

Not sure how to test this is working. Can you give me an example tweet?

JamieKitson commented 11 years ago

I don't think github understands commas. This tweet of mine shows both a truncated retweet and a shortened image url:

Sat Oct 27 15:56:42 +0000 2012 262221290075734016

JamieKitson commented 11 years ago

Should test this with multiple images in one tweet.
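A quick way to try that without a live tweet might be a hand-written fixture, as in the sketch below. It assumes multiple pictures show up as sibling creative elements under media, which is a guess rather than something confirmed against the API. The sample XML and the `picture_urls` name are hypothetical; the match fragment is the one proposed earlier in this thread.

```shell
#!/bin/sh
# Hypothetical fixture: one tweet carrying two pictures.
multi=$(mktemp)
cat > "$multi" <<'EOF'
<statuses>
  <status>
    <id>1</id>
    <entities>
      <media>
        <creative>
          <url>http://t.co/one</url>
          <expanded_url>http://example.com/photo/1</expanded_url>
        </creative>
        <creative>
          <url>http://t.co/two</url>
          <expanded_url>http://example.com/photo/2</expanded_url>
        </creative>
      </media>
    </entities>
  </status>
</statuses>
EOF

# Emit one "url <short> <expanded>" line per picture in each status.
picture_urls() {
  xmlstarlet sel -t -m "statuses/status" \
    -m "entities/media/creative" -i "expanded_url != ''" \
    -n -o "url " -v "url" -o " " -v "expanded_url" -b -b "$1"
}

# Count the emitted "url ..." lines; two are expected for this fixture.
picture_urls "$multi" | grep -c '^url '
```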

JamieKitson commented 11 years ago

Closed with eaa879d