tulul / tululbot

Telegram Bot for Tululness
Apache License 2.0

Remove citations from Wikipedia first paragraph #28

Open fushar opened 9 years ago

fushar commented 9 years ago

If you use /leli, it displays the first paragraph of a Wikipedia article. Sometimes that paragraph contains citation markers such as [1], [2], etc. It would be better to remove them.
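A minimal sketch of the requested cleanup, assuming the paragraph is already available as plain text. The function name `strip_citations` is hypothetical and not taken from the tululbot code; it only handles numeric markers like [1] and the literal [citation needed].

```python
import re

# Bracketed numeric markers such as [1] or [23], plus the literal
# "[citation needed]" note. Other bracketed text is left untouched.
_CITATION = re.compile(r"\[(?:\d+|citation needed)\]")


def strip_citations(paragraph: str) -> str:
    """Remove citation markers from a plain-text paragraph (hypothetical helper)."""
    return _CITATION.sub("", paragraph)


print(strip_citations("Tulul is a word.[1] It is used often.[2][13]"))
```

Restricting the pattern to digits keeps ordinary bracketed text, such as "[sic]", intact.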

kmkurn commented 9 years ago

Does Wikipedia have an API we can use? I think the wiki result would be much better if we used the API, and we would not need to parse the HTML either.
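One possible approach, sketched under assumptions: the MediaWiki Action API on Wikipedia supports `prop=extracts` (provided by the TextExtracts extension), which can return the lead section as plain text. The helper names `build_intro_url`, `extract_intro`, and `fetch_intro`, and the User-Agent string, are hypothetical and not part of tululbot.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title, api_url=API_URL):
    """Build a query URL asking for the plain-text lead section of a page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the section before the first heading
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # follow redirects to the target article
        "format": "json",
        "titles": title,
    }
    return api_url + "?" + urlencode(params)


def extract_intro(response):
    """Pull the first non-empty extract out of a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        extract = page.get("extract")
        if extract:
            return extract
    return None  # page missing or no extract returned


def fetch_intro(title):
    """Perform the HTTP request (needs network access)."""
    request = Request(build_intro_url(title),
                      headers={"User-Agent": "tululbot-example/0.1"})
    with urlopen(request, timeout=10) as resp:
        return extract_intro(json.load(resp))
```

The lead section may span several paragraphs, so a caller that wants only the first paragraph could split the result on newlines and keep the first piece.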

wazaundtechnik commented 9 years ago

We can simply retrieve the wikitext and parse the MediaWiki syntax to get only the text (without formatting, styles, citations, etc.).
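A rough sketch of that idea using only regular expressions. The function name `wikitext_to_plain` is hypothetical, and the patterns cover only common constructs (`<ref>` tags, templates, internal links, bold/italic quotes). Real wikitext has many more cases, so a dedicated wikitext parser would likely be more robust.

```python
import re


def wikitext_to_plain(wikitext: str) -> str:
    """Strip common MediaWiki markup from a snippet (simplified, hypothetical helper)."""
    text = wikitext
    # Self-closing citations first, e.g. <ref name="a" />, so the paired
    # pattern below cannot start matching at one of them.
    text = re.sub(r"<ref[^>]*/>", "", text)
    # Paired citations: <ref ...>...</ref>
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Templates {{...}}; repeat so nested templates are removed inside-out.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Internal links: [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # Bold and italic quote markup: ''...'' and '''...'''
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


sample = ("'''Python''' is a [[programming language|language]]."
          "<ref name=\"a\">Some source</ref> It is [[popular]]."
          "<ref name=\"a\" />{{citation needed}}")
print(wikitext_to_plain(sample))
```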