aadant opened this issue 9 years ago.
Sorry for the late response. The problem is that template expansion is not supported.
The raw wikitext for the incorrectly converted text is:
Le {{s|XVII|e}} est considéré comme l'[[âge d'or]] d'Amsterdam car elle devient à cette époque la ville la plus riche du monde
It uses the template "s". The French edition uses templates for common formatting far more often than, e.g., the English and Swedish editions do.
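To illustrate what is lost when the template is dropped, here is a minimal Java stopgap that rewrites only the two-argument form of the "s" template into plain text. It assumes, from the expected output quoted in this issue, that {{s|XVII|e}} should read "XVIIe siècle". The class name and regex are hypothetical and not part of wikiforia; other forms of the template are left untouched, and a real fix still needs general template expansion.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiecleTemplate {
    // Hypothetical pattern for the two-argument form seen in the Amsterdam article,
    // e.g. {{s|XVII|e}}: group 1 = Roman numeral, group 2 = ordinal suffix.
    // Assumption: any other form of the template is left as-is.
    private static final Pattern S_TEMPLATE =
            Pattern.compile("\\{\\{s\\|([IVXLC]+)\\|([a-z]+)\\}\\}");

    // Replaces each matching call with "<numeral><suffix> siècle".
    public static String expand(String wikitext) {
        Matcher m = S_TEMPLATE.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + m.group(2) + " siècle";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Le {{s|XVII|e}} est considéré comme l'[[âge d'or]] d'Amsterdam";
        System.out.println(expand(raw));
    }
}
```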
I plan to implement template expansion using a fast disk-based hashmap, but performance will depend on how much memory is available for caching, and it will require two passes over the data (one to collect the template definitions and one to expand them in the articles).
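A minimal sketch of the two-pass idea, assuming a plain in-memory HashMap in place of the disk-based hashmap and a simplified, hypothetical body for the "s" template (the real Modèle:S on frwiki is more elaborate). Pass 1 stores every page in the Modèle: namespace; pass 2 replaces each non-nested call in article text and substitutes {{{1}}}-style parameters. TwoPassTemplateExpander and its methods are illustrative names, not wikiforia or Sweble APIs, and nested templates, parser functions and Lua modules are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassTemplateExpander {
    // French Wikipedia's template namespace prefix.
    private static final String TEMPLATE_NS = "Modèle:";
    // A non-nested call such as {{s|XVII|e}}: group 1 = name, group 2 = "|arg|arg..." (may be empty).
    private static final Pattern CALL =
            Pattern.compile("\\{\\{([^{}|]+)((?:\\|[^{}]*)?)\\}\\}");
    // A parameter reference such as {{{1}}} or {{{2|default}}}.
    private static final Pattern PARAM =
            Pattern.compile("\\{\\{\\{([^{}|]+)(?:\\|([^{}]*))?\\}\\}\\}");

    // Assumption: in-memory stand-in for the disk-based hashmap (template name -> raw body).
    private final Map<String, String> templates = new HashMap<>();

    // Title normalization: first letter case-insensitive, '_' treated as ' '.
    private static String normalize(String name) {
        String n = name.trim().replace('_', ' ');
        return n.isEmpty() ? n : Character.toUpperCase(n.charAt(0)) + n.substring(1);
    }

    // Pass 1: remember every template page; article pages are ignored.
    public void collect(String title, String body) {
        if (title.startsWith(TEMPLATE_NS)) {
            templates.put(normalize(title.substring(TEMPLATE_NS.length())), body);
        }
    }

    // Pass 2: replace each known call with its body, parameters substituted.
    // Unknown templates are kept verbatim instead of being silently dropped.
    public String expand(String text) {
        Matcher call = CALL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (call.find()) {
            String body = templates.get(normalize(call.group(1)));
            String replacement = body == null
                    ? call.group()
                    : substitute(body, parseArgs(call.group(2)));
            call.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        call.appendTail(out);
        return out.toString();
    }

    // Splits "|XVII|e" into {1=XVII, 2=e}; "name=value" args are stored under their trimmed name.
    private static Map<String, String> parseArgs(String raw) {
        Map<String, String> args = new HashMap<>();
        if (raw.isEmpty()) return args;
        int position = 1;
        for (String part : raw.substring(1).split("\\|", -1)) {
            int eq = part.indexOf('=');
            if (eq >= 0) {
                args.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            } else {
                args.put(String.valueOf(position++), part);
            }
        }
        return args;
    }

    // Replaces {{{name}}} / {{{name|default}}}; a missing parameter without default stays literal.
    private static String substitute(String body, Map<String, String> args) {
        Matcher param = PARAM.matcher(body);
        StringBuffer out = new StringBuffer();
        while (param.find()) {
            String value = args.get(param.group(1).trim());
            if (value == null) value = param.group(2) != null ? param.group(2) : param.group();
            param.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        param.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        TwoPassTemplateExpander expander = new TwoPassTemplateExpander();
        // Pass 1 (hypothetical, simplified template body).
        expander.collect("Modèle:S", "{{{1}}}<sup>{{{2}}}</sup> siècle");
        // Pass 2.
        System.out.println(expander.expand(
                "Le {{s|XVII|e}} est considéré comme l'[[âge d'or]] d'Amsterdam"));
    }
}
```

Running main prints `Le XVII<sup>e</sup> siècle est considéré comme l'[[âge d'or]] d'Amsterdam`; a later markup-stripping step would then yield the expected plain text. Keeping unknown calls verbatim rather than deleting them avoids the silent gap seen in the Amsterdam output.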
I cannot give a timeline for this feature; all I can say is that it is on the TODO list and considered highly important.
Thank you for your feedback. It might be a Sweble issue. I will open a separate issue for the missing hyphen in Andalousie.
Hey Marcus, I was looking at this project: https://github.com/attardi/wikiextractor/issues/32#issuecomment-136178794
It looks like you will also need to support modules (and Lua!). Fortunately, there are Java implementations of Lua, so it can still be pure Java.
java -jar target/wikiforia-1.2.1.jar --pages ../frwiki-20150602-pages-articles-multistream.xml.bz2 -lang fr -o xml
Interrupt it after a couple of minutes, since the issue appears in the first pages.
Example: Amsterdam, id = 245
Le est considéré comme l'âge d'or d'Amsterdam car elle devient à cette époque la ville la plus riche du monde.
should be:
Le XVIIe siècle est considéré comme l'âge d'or d'Amsterdam car elle devient à cette époque la ville la plus riche du monde