Mongo.formatter.py: The presence of an em dash (Unicode U+2014; E2 is only the first byte of its UTF-8 encoding, E2 80 94) in the first 32 or so characters of a story (particularly if the initial word is all-caps) usually signals a dateline, e.g.
"TAIPEI, Taiwan — "
and the text from the beginning of the story through the em dash (location + 1) could be eliminated. However, I haven't quite figured out the correct incantations to keep Python happy with such a check, though u"\u2014" is probably the way to designate the character (u"\xe2" would be the letter "â", U+00E2, not the dash). The check also needs the story as decoded Unicode text rather than raw UTF-8 bytes.
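
A minimal sketch of the check described above, not the actual Mongo.formatter.py code. The names `strip_dateline`, `EM_DASH` and `window` are hypothetical, and it assumes the story is already decoded text (a Python 3 `str`, or a `unicode` object in Python 2, e.g. via `raw.decode("utf-8")`). It treats the all-caps first word as a required condition, which is stricter than the "particularly if" heuristic in the note.

```python
# Hypothetical helper names; assumes `story` is decoded Unicode text, not bytes.
EM_DASH = u"\u2014"  # code point of the em dash; 0xE2 is just its first UTF-8 byte


def strip_dateline(story, window=32):
    """Return `story` without a leading dateline such as u"TAIPEI, Taiwan \u2014 "."""
    # Look for the em dash only within the first `window` characters.
    pos = story.find(EM_DASH, 0, window)
    if pos == -1:
        return story

    # Assumption: require an all-caps first word (ignoring trailing punctuation).
    first_word = story.split(None, 1)[0].strip(",.;:")
    if not first_word.isupper():
        return story

    # Drop everything through the em dash, then any whitespace that follows it.
    return story[pos + 1:].lstrip()
```

For example, `strip_dateline(u"TAIPEI, Taiwan \u2014 Officials said today.")` drops the dateline and keeps the sentence, while a story whose first word is not all-caps, or whose first em dash falls outside the window, comes back unchanged.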