gruffaren closed this issue 1 year ago.
When I check the current status, I can see that the space is indeed missing in the API response. :/
The space is also missing in the plain format (https://en.wikipedia.org/w/api.php?action=query&explaintext=1&exsectionformat=plain&prop=extracts&titles=Planet&) and in the raw format (https://en.wikipedia.org/w/api.php?action=query&explaintext=1&exsectionformat=raw&prop=extracts&titles=Planet&).
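To check the raw API output without going through Wikipedia-API, a minimal sketch along these lines could send the same TextExtracts query directly. It adds `format=json`, which is not in the URLs above, and uses a placeholder User-Agent string; `build_extracts_url` and `fetch_extract` are hypothetical helper names, not part of any library.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent (assumption); Wikimedia asks API clients to identify themselves.
USER_AGENT = "extract-space-check/0.1 (you@example.com)"


def build_extracts_url(title: str, section_format: str = "plain") -> str:
    """Build the same TextExtracts query as the URLs above, plus format=json."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "exsectionformat": section_format,  # e.g. "plain" or "raw"
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_extract(title: str, section_format: str = "plain") -> str:
    """Download the plain-text extract for a single page title."""
    request = urllib.request.Request(
        build_extracts_url(title, section_format),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    # With the default formatversion=1, pages are keyed by page ID.
    pages = data["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")
```

For example, `"planetesimals.The" in fetch_extract("Planet", "raw")` tells you whether the API still glues the two sentences together at the time you run it.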
Looking at the API documentation (https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts), it does not seem that this can be resolved on our side.
Since it does not look fixable, I am closing this issue. If you figure out how to bypass the problem, please feel free to reopen it.
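One possible bypass, offered only as a heuristic sketch: post-process the summary and insert a space wherever sentence-ending punctuation that follows a lowercase letter, digit, closing bracket or quote is immediately followed by a capitalised word. The pattern and the name `add_missing_sentence_spaces` are illustrative assumptions, not part of Wikipedia-API. It restores a space, not the original paragraph break.

```python
import re

# Heuristic: a lowercase letter, digit, closing bracket or quote, then ".", "!" or "?",
# immediately followed by a capital letter and a lowercase letter (start of a word).
_GLUED = re.compile(r'(?<=[a-z0-9)\]"])([.!?])(?=[A-Z][a-z])')


def add_missing_sentence_spaces(text: str) -> str:
    """Insert a space where two sentences appear to have been glued together."""
    return _GLUED.sub(r"\1 ", text)
```

Because it is pattern-based, it can misfire: dotted identifiers such as `wiki_api.Wikipedia` would also be split, while uppercase abbreviations like "U.S.A." are left alone.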
The `.summary` attribute of a page does not include a newline or space after a sentence that, on the Wikipedia page, ends with square-bracket citation markers such as [1].
Example:
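```python
# Assumption: wiki_api is the Wikipedia-API package (module name: wikipediaapi)
import wikipediaapi as wiki_api

wiki = wiki_api.Wikipedia(language="en")
query = "planet"
page = wiki.page(query)
text = page.summary
print(text[:400])
```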
This queries the article https://en.wikipedia.org/wiki/Planet and prints:
A planet is an astronomical body orbiting a star or stellar remnant that is massive enough to be rounded by its own gravity, is not massive enough to cause thermonuclear fusion, and – according to the International Astronomical Union but not all planetary scientists – has cleared its neighbouring region of planetesimals.The term planet is ancient, with ties to history, astrology, science, mytholog
Note the missing space between "planetesimals." and "The" at the end of the first paragraph, which on the web page ends with "planetesimals.[b][1][2]". Later in the summary, however, `print(text[1200:1500])` shows a space between "discovered)." and "Ptolemy", as expected:
the scientific community are no longer viewed as such under the current definition. Some of the excluded objects include Ceres, Pallas, Juno, Vesta (all of which are objects in the solar asteroid belt), and Pluto (the first trans-Neptunian object discovered). Ptolemy thought that the planets orbite
Please let me know if any additional information is needed to fix this, or if there is a workaround.
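Another workaround that might be worth trying, not verified here: request the summary in HTML instead of plain text (for example `wiki_api.Wikipedia(language="en", extract_format=wiki_api.ExtractFormat.HTML)`) and convert it to text yourself. This assumes the intro paragraphs arrive as separate `<p>` elements, so each paragraph boundary can become a newline. `html_extract_to_text` is a hypothetical helper built only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Block-level tags that end a paragraph (assumption about the HTML extract's markup).
_BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _ParagraphText(HTMLParser):
    """Collect text content, closing a paragraph at the end of each block tag."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._current: list[str] = []

    def handle_data(self, data: str) -> None:
        self._current.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in _BLOCK_TAGS:
            self.flush()

    def flush(self) -> None:
        # Collapse internal whitespace and keep only non-empty paragraphs.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []


def html_extract_to_text(html: str) -> str:
    """Turn an HTML extract into plain text with one paragraph per line."""
    parser = _ParagraphText()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text that was not inside a closed block tag
    return "\n".join(parser.paragraphs)
```

Usage would then be `print(html_extract_to_text(wiki.page("Planet").summary))`. It may be worth first checking whether the HTML extract shows the same missing boundary.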