lovellliu / wikixmlj

Automatically exported from code.google.com/p/wikixmlj

getCategories() works only for English as of now #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This is due to the hardcoded "Category:" string. Wikipedias in other
languages use different surface forms for this prefix. This can be easily
resolved, but I am saving it for later.

Original issue reported on code.google.com by delip...@gmail.com on 11 Oct 2008 at 1:10

GoogleCodeExporter commented 9 years ago
Added this issue as a reminder for later.

Original comment by delip...@gmail.com on 11 Oct 2008 at 1:10

GoogleCodeExporter commented 9 years ago
I've come across this issue for the Portuguese language. I fixed it just by
changing the catPattern in WikiTextParser.java:
private static Pattern catPattern = 
Pattern.compile("\\[\\[Categor[iy]a?:(.*?)\\]\\]", Pattern.MULTILINE);

Original comment by felipehu...@gmail.com on 11 Nov 2010 at 3:25
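To make the scope of that workaround concrete, here is a minimal sketch that runs the pattern quoted in the comment above against a few category links. The class name CatPatternDemo and the helper categories() are made up for illustration and are not part of wikixmlj.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of wikixmlj.
class CatPatternDemo {
    // The pattern quoted in the comment above.
    static final Pattern CAT_PATTERN =
            Pattern.compile("\\[\\[Categor[iy]a?:(.*?)\\]\\]", Pattern.MULTILINE);

    // Collects the text after the prefix for every category link that matches.
    static List<String> categories(String wikiText) {
        List<String> found = new ArrayList<>();
        Matcher m = CAT_PATTERN.matcher(wikiText);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "[[Category:Physics]]\n"          // English prefix
                + "[[Categoria:F\u00edsica]]\n"         // Portuguese prefix
                + "[[Kategorie:Physik]]\n";             // German prefix
        System.out.println(categories(text));
    }
}
```

With this input the English and Portuguese links are picked up, while the German "Kategorie:" link is skipped because the pattern only allows "Categor" followed by "i" or "y" and an optional "a".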

GoogleCodeExporter commented 9 years ago
Thanks, but I think this would be a short-term solution that cannot be extended
to multiple languages. Think about Arabic, for example. I have a reasonable
idea in the works now and should be able to check it in before the holidays.

Original comment by delip...@gmail.com on 12 Nov 2010 at 1:47
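The thread does not describe the idea mentioned above, so the following is only one possible direction, not the maintainer's actual solution: build the pattern from a per-language prefix instead of hardcoding it. The class LocalizedCategoryParser, its constructor argument, and the small prefix table are all assumptions made for this sketch. A real table would need an entry for every language edition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of wikixmlj.
class LocalizedCategoryParser {
    // Illustrative prefix table keyed by language code (assumed, incomplete).
    private static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("en", "Category");
        PREFIXES.put("pt", "Categoria");
        PREFIXES.put("de", "Kategorie");
        PREFIXES.put("ar", "\u062a\u0635\u0646\u064a\u0641"); // Arabic category prefix
    }

    private final Pattern catPattern;

    LocalizedCategoryParser(String languageCode) {
        String prefix = PREFIXES.get(languageCode);
        if (prefix == null) {
            throw new IllegalArgumentException("No category prefix for: " + languageCode);
        }
        // Pattern.quote keeps the prefix literal, whatever characters it contains.
        catPattern = Pattern.compile(
                "\\[\\[" + Pattern.quote(prefix) + ":(.*?)\\]\\]", Pattern.MULTILINE);
    }

    // Returns the category names found for the configured language only.
    List<String> getCategories(String wikiText) {
        List<String> found = new ArrayList<>();
        Matcher m = catPattern.matcher(wikiText);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        LocalizedCategoryParser german = new LocalizedCategoryParser("de");
        System.out.println(german.getCategories("[[Kategorie:Physik]] [[Category:Physics]]"));
    }
}
```

Keeping the prefix as data rather than as part of the regular expression means that supporting another language is a table entry, and non-Latin prefixes such as the Arabic one need no special handling in the pattern.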