Closed JamieKitson closed 11 years ago
I want to write a generic expander. I got started with https://github.com/kaihendry/Greptweet/blob/master/expand-urls.sh but it needs to know what URLs it has already expanded. Ideas?
Text file, space delimited?
However, I don't think that's really relevant to this issue, the picture URL should be picked out of the XML, it will surely be quicker.
On 16 July 2012 11:40, Jamie Kitson reply@reply.github.com wrote:
> Text file, space delimited?
Not sure what you mean by that. Lacking context?
> However, I don't think that's really relevant to this issue, the picture URL should be picked out of the XML, it will surely be quicker.
You're right, though how do we keep track of tweets we've already expanded?
https://github.com/kaihendry/Greptweet/blob/master/expand-urls.sh
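One way the space-delimited text file idea could answer the "what have we already expanded" question is sketched below. This is not how expand-urls.sh works. CACHE_FILE, cache_lookup, resolve_url and expand_cached are hypothetical names, and the one-pair-per-line layout ("SHORT EXPANDED") is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of a space-delimited cache: one "SHORT EXPANDED" pair per line.
# All names here are hypothetical and not taken from Greptweet.
CACHE_FILE=${CACHE_FILE:-expanded-urls.txt}

# Print the cached expansion for $1, or return 1 if it is not cached yet.
cache_lookup() {
  local short expanded
  [ -f "$CACHE_FILE" ] || return 1
  while read -r short expanded; do
    if [ "$short" = "$1" ]; then
      printf '%s\n' "$expanded"
      return 0
    fi
  done <"$CACHE_FILE"
  return 1
}

# Follow redirects with curl and print the final URL (needs network access).
resolve_url() {
  curl -sIL -o /dev/null -w '%{url_effective}' "$1"
}

# Print the expansion of $1, resolving and recording it only on a cache miss.
expand_cached() {
  local expanded
  if expanded=$(cache_lookup "$1"); then
    printf '%s\n' "$expanded"
    return 0
  fi
  expanded=$(resolve_url "$1") || return 1
  printf '%s %s\n' "$1" "$expanded" >>"$CACHE_FILE"
  printf '%s\n' "$expanded"
}
```

Calling expand_cached twice with the same short URL should hit the network only the first time. A linear scan of the file is fine for a few thousand URLs, and something indexed would only matter beyond that.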
It's a pity that xmlstarlet can't seem to apply templates multiple times, but as it is I think we have to add:
-m "entities/media/creative" -i "expanded_url != ''" -n -o "url " -v "url" -o " " -v "expanded_url" -b -b
So with #14 the whole thing becomes:
xmlstarlet sel -t \
  -m "statuses/status" -m ".|retweeted_status" \
  -i "(name() = 'status' and not(retweeted_status)) or name() = 'retweeted_status'" \
  -n -o "text " -v "id" -o "|" -v "created_at" -o "|" \
  -i "name() = 'retweeted_status'" -o "RT @" -v "user/screen_name" -o ": " -b \
  -v "normalize-space(text)" \
  -m "entities/urls/url" -i "expanded_url != ''" \
  -n -o "url " -v "url" -o " " -v "expanded_url" -b -b \
  -m "entities/media/creative" -i "expanded_url != ''" \
  -n -o "url " -v "url" -o " " -v "expanded_url" -b -b
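The command above emits a "text" line per tweet followed by zero or more "url SHORT EXPANDED" lines. The thread doesn't say how those lines are consumed, so the following is only a sketch of one possibility: folding the expansions back into the tweet text. expand_inline is a hypothetical helper name, and the sketch assumes tweet text never spans more than one line (normalize-space should guarantee that).

```bash
#!/usr/bin/env bash
# Sketch only: expand_inline is a hypothetical helper, not part of Greptweet.
# Reads "text ..." and "url SHORT EXPANDED" lines on stdin and prints each
# tweet line with its short URLs replaced by the expanded ones.
expand_inline() {
  local line text="" short expanded have_text=0
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "text "*)
        # A new tweet starts, so flush the previous one first.
        if [ "$have_text" -eq 1 ]; then printf '%s\n' "$text"; fi
        text=${line#"text "}
        have_text=1
        ;;
      "url "*)
        # Space delimited: the literal word "url", the short URL, the expansion.
        read -r _ short expanded <<<"$line"
        if [ "$have_text" -eq 1 ] && [ -n "$short" ] && [ -n "$expanded" ]; then
          # Quoting both sides keeps the match and the replacement literal.
          text=${text//"$short"/"$expanded"}
        fi
        ;;
    esac
  done
  # Flush the last tweet.
  if [ "$have_text" -eq 1 ]; then printf '%s\n' "$text"; fi
}
```

It would sit at the end of the pipeline, with the xmlstarlet output piped into expand_inline. Because media and plain URLs are both printed as "url" lines, the helper does not need to know which kind it is replacing.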
It would be better if this were done as a pull request. Or don't you have commit rights?
Copying and pasting out a text area = PITA
Not sure how to test this is working. Can you give me an example tweet?
I don't think GitHub understands commas. This tweet of mine shows both a truncated retweet and a shortened image URL:
Should test this with multiple images in one tweet.
Closed with eaa879d
Picture URLs are nested deeper in the XML. Instead of:
statuses/status/entities/urls/url
They're in:
statuses/status/entities/media/creative
See for example:
http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=jamiekitson&count=1&include_rts=1&include_entities=1&max_id=214784069941207040
This example also suffers from the retweeted URL issue, so the URLs are in:
statuses/status/retweeted_status/entities/media/creative
This could get complicated!