searope / jwpl

Automatically exported from code.google.com/p/jwpl

Section parts in links are not properly handled. #31

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I'm not sure whether this is intended or a bug, but when you access
all links in a page with, for example:
Map<String, Set<String>> anchorStrings = page.getOutlinkAnchors();
and iterate over them with:
for (String targetArticle : anchorStrings.keySet())

the target article is sometimes not found by
Wikipedia.existPage(String title) or Wikipedia.getPage(String title)
because the returned target contains a section link.
For example, "Geschichte_Norwegens#Der Bürgerkrieg" (German Wikipedia)
is returned as targetArticle. The article "Geschichte_Norwegens"
exists, but Wikipedia.existPage returns false and Wikipedia.getPage
throws an exception.
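
Until the lookup methods handle this themselves, a caller could split off the section part before looking up the title. The following is only a minimal sketch in plain Java: the class SectionLinks and its methods are hypothetical helpers, not part of JWPL, and it assumes the first "#" in a link target always starts the section part.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of JWPL.
final class SectionLinks {
    private SectionLinks() {}

    /** Returns the part before the first '#', or the input unchanged if there is none. */
    static String stripSection(String linkTarget) {
        int hash = linkTarget.indexOf('#');
        return hash < 0 ? linkTarget : linkTarget.substring(0, hash);
    }

    /** Returns the part after the first '#', or null if the target has no section part. */
    static String sectionOf(String linkTarget) {
        int hash = linkTarget.indexOf('#');
        return hash < 0 ? null : linkTarget.substring(hash + 1);
    }

    /** Collects the distinct article titles of the given link targets, in encounter order. */
    static Set<String> articleTitles(Iterable<String> linkTargets) {
        Set<String> titles = new LinkedHashSet<>();
        for (String target : linkTargets) {
            String title = stripSection(target);
            // A target such as "#Section" points into the current page and has no title.
            if (!title.isEmpty()) {
                titles.add(title);
            }
        }
        return titles;
    }
}
```

With this, SectionLinks.articleTitles(anchorStrings.keySet()) would yield titles that can be passed to the lookup methods, while sectionOf keeps the section information available for applications that need it.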

I think it is generally valid for the method to return the full target
string of a link, because some applications may need the extra
section information. However, the Javadoc of
Page.getOutlinkAnchors() should warn about this behavior, because
if you overlook it you might discard a lot of valid links.

It would also be a nice feature if the Wikipedia.existPage() and
Wikipedia.getPage() methods resolved section links automatically,
i.e. returned true or the Page if the string before the "#" is a valid
title.
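
The requested fallback could look roughly like the sketch below. It is an illustration only and not the actual JWPL change: SectionAwareLookup is a hypothetical class, and the Predicate stands in for whatever title lookup is available (for instance a wrapper around Wikipedia.existPage).

```java
import java.util.function.Predicate;

// Hypothetical sketch, not the JWPL implementation.
final class SectionAwareLookup {
    private SectionAwareLookup() {}

    /**
     * Returns true if the full link target is a known title, or if the part
     * before the first '#' is a known title.
     */
    static boolean existsIgnoringSection(String linkTarget, Predicate<String> titleExists) {
        // Try the target as given first, in case it is itself a valid title.
        if (titleExists.test(linkTarget)) {
            return true;
        }
        int hash = linkTarget.indexOf('#');
        // hash > 0 excludes targets without '#' and pure same-page links like "#Section".
        return hash > 0 && titleExists.test(linkTarget.substring(0, hash));
    }
}
```

Checking the unmodified target first means a title that itself matches the lookup is never shadowed by the fallback.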

Original issue reported on code.google.com by torsten....@gmail.com on 4 Jul 2011 at 12:08

GoogleCodeExporter commented 9 years ago
Thanks for reporting.
I have adapted the existPage(String) and getPage(String) methods to handle
this case correctly.
The Title object now also provides access to the section part.

Original comment by torsten....@gmail.com on 4 Jul 2011 at 12:10

GoogleCodeExporter commented 9 years ago

Original comment by oliver.ferschke on 16 Feb 2012 at 1:24