sweble / sweble-wikitext

The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaWiki.
http://sweble.org/sites/swc-devel/develop-latest/tooling/sweble/sweble-wikitext

Conflicts within language configurator #72

Open tgalery opened 6 years ago

tgalery commented 6 years ago

When using the language configurator to generate a config, we usually get a stacktrace when trying to associate an alias which has already been set. Looking at the magic words config that generates the wikiconfig (https://ja.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=magicwords&format=xml), the bug can be described as follows:

  1. alias 名前空間 is registered for the NAMESPACE id
  2. alias 名前空間 appears again for the ns id (things are slightly more complicated because in the second case it appears followed by a :, but there's some suffix-modifying code that adds the colon under certain circumstances)
  3. Since we throw an exception when detecting the conflict in (2), and the 名前空間 magic word is the first alias of the ns id, the whole id and its associated aliases are not added to the maps.
  4. When adding parser functions, we explicitly expect one for ns. Since none can be found in the maps, an exception is thrown.
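The four steps above can be mimicked with a toy model. This is only a sketch: the class AliasConflictDemo, its maps, and its method bodies are hypothetical and are not the real WikiConfigImpl, and the alias lists used with it are abbreviated for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the failure. NOT the real WikiConfigImpl; every name here is hypothetical.
final class AliasConflictDemo {
    // localized name -> id that registered it first
    private final Map<String, String> nameToId = new HashMap<>();
    // id -> all names registered for it
    private final Map<String, List<String>> idToNames = new HashMap<>();

    // All-or-nothing registration: a single conflicting name rejects the whole id (step 3).
    void addAlias(String id, List<String> names) {
        for (String name : names) {
            String owner = nameToId.get(name);
            if (owner != null && !owner.equals(id)) {
                throw new IllegalArgumentException(
                        "The name `" + name + "' was already registered by `" + owner + "'");
            }
        }
        for (String name : names) {
            nameToId.put(name, id);
        }
        idToNames.put(id, new ArrayList<>(names));
    }

    // A parser function can only be added if its id made it into the maps (step 4).
    void addParserFunction(String id) {
        if (!idToNames.containsKey(id)) {
            throw new IllegalArgumentException(
                    "No alias registered for parser function `" + id + "'.");
        }
    }

    boolean hasId(String id) {
        return idToNames.containsKey(id);
    }
}
```

Registering 名前空間: for namespace and then 名前空間: plus ns: for ns makes the second addAlias call throw, so ns never reaches the maps and a later addParserFunction("ns") throws as well.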

After PR #73, we don't throw an exception when we find an alias already registered to an id. This means that the first ambiguous alias is set to the id it was first associated with, but ignored for the others. Thus, a WikiConfig object can be created, but it is not an optimal solution. If there is a possibility of two aliases being associated with the same id, we need the code to be able to handle that well.
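The "first registration wins" behaviour described here could look roughly like the sketch below. It is not the code from PR #73; FirstWinsAliasRegistry and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "first registration wins"; not the actual change in PR #73.
final class FirstWinsAliasRegistry {
    // localized name -> id that claimed it first
    private final Map<String, String> nameToId = new LinkedHashMap<>();
    // collected warnings instead of thrown exceptions
    private final List<String> warnings = new ArrayList<>();

    void addAlias(String id, List<String> names) {
        for (String name : names) {
            // putIfAbsent keeps the existing owner and returns it, or returns null if the name was free
            String owner = nameToId.putIfAbsent(name, id);
            if (owner != null && !owner.equals(id)) {
                warnings.add("Ignoring `" + name + "' for `" + id
                        + "': already registered by `" + owner + "'");
            }
        }
    }

    String idFor(String name) {
        return nameToId.get(name);
    }

    List<String> warnings() {
        return warnings;
    }
}
```

With this approach the remaining, non-conflicting names of ns are still registered, so a config can be built, but a lookup of 名前空間: only ever resolves to namespace.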

mawiesne commented 6 years ago

Referring to comment in https://github.com/dkpro/dkpro-jwpl/issues/159:

@tgalery Sure, it merely "prints the stacktrace" but it's kind of confusing, e.g. for one of my students. Proposal:

Cheers mawiesne

tgalery commented 6 years ago

To my mind a warning should suffice. I'm just wondering if there would be a way to allow a many-to-one relationship so these cases could be covered as well.

tgalery commented 6 years ago

So, digging more into this issue: confusing stacktraces aside, it seems that for some languages we are able to create a WikiConfig instance.

scala> import org.sweble.wikitext.engine.utils.LanguageConfigGenerator
scala> val deConfig = LanguageConfigGenerator.generateWikiConfig("de")
deConfig: org.sweble.wikitext.engine.config.WikiConfig = org.sweble.wikitext.engine.config.WikiConfigImpl@e3643ba4

This prints a bunch of stacktraces like the one above but still returns a config. If we do the same thing for "ja", pretty much the same stacktraces are produced, but no config value is created, which is kind of a problem.

tgalery commented 6 years ago

@hannesd I've been trying to build this locally, but I'm getting some issues with some of the deps of the project:

[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project org.sweble:sweble-parent:3.1.7-SNAPSHOT (/Users/thiago/code/tgalery/sweble-wikitext/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM for org.sweble:sweble-parent:3.1.7-SNAPSHOT: Could not transfer artifact de.fau.cs.osr:tooling:pom:3.0.9-SNAPSHOT from/to osr-public-repo (http://mojo-maven.cs.fau.de/content/repositories/public): sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: timestamp check failed and 'parent.relativePath' points at wrong local POM @ line 23, column 10: NotAfter: Wed Jul 25 13:29:37 BST 2018 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

Any pointers?

hannesd commented 6 years ago

I'm sorry but we screwed up and now our certificates have expired. If you turn off certificate validation it should work again, but this is of course not ideal: https://stackoverflow.com/questions/21252800/how-to-tell-maven-to-disregard-ssl-errors-and-trusting-all-certs

We'll try to renew our certificates as soon as possible.

hannesd commented 6 years ago

Concerning the actual problem with this issue: I'm quite busy at the moment and will not have the time to work on this. If you could provide a pull request I'd be more than happy to accept it. Simply turning the exception into a warning is fine with me, if this solves your problem.

hannesd commented 6 years ago

Certificates have been fixed. Sorry for the inconvenience :(

tgalery commented 6 years ago

Many thanks for this, will try to come up with a PR as soon as I have the time.


tgalery commented 6 years ago

Sorry @hannesd, I've actually tried building using mvn package install and I still get the errors. I've also tried the mvn CLI opts from the Stack Overflow link you mentioned, but without success. I'm wondering whether there's a problem deeper than the certificate updates. Could you try building this locally?

tgalery commented 6 years ago

fyi, I'm trying to build the dev branch, which I think is the base.

hannesd commented 6 years ago

Sorry again, the certificate chain had changed and I didn't notice. Should work now...

mawiesne commented 6 years ago

@hannesd

Simply turning the exception into a warning is fine with me, if this solves your problem.

^ This will at least solve spammed consoles which irritate most devs most of the time 👍. @tgalery Could you check LanguageConfigGenerator.java:209 and fix this odd behaviour?

Yet, this is merely an optical cure of the underlying problem, I guess.

sweble commented 6 years ago

I'm currently on vacation. I'll come back to you in two weeks.

Cheers, Hannes


tgalery commented 6 years ago

@mawiesne just back from a trip today, will have a look today

tgalery commented 6 years ago

OK, so I did some digging and kind of understand what's going on. Here is some debug output from a branch of mine. From the Scala console, we get something like this:

scala> val lang = "ja"
scala> val config = LanguageConfigGenerator.generateWikiConfig(lang)
Got: The name `名前空間:' was already registered by the alias `namespace' when trying to register it for alias `ns'. when adding alias ns
java.lang.IllegalArgumentException: No alias registered for parser function `ns'.
  at org.sweble.wikitext.engine.config.WikiConfigImpl.addParserFunction(WikiConfigImpl.java:449)

So in LanguageConfigGenerator.java, generateWikiConfig first runs addi18NAliases, which fails to register the ns alias because the Japanese magic word 名前空間: is already registered under namespace. addParserFunctions then expects something to be associated with ns, and since the resulting exception is not handled, no language config is created. One option is to associate multiple keywords with an alias, e.g. namespace to be associated with [ns, 名前空間]. Does anyone see a problem with that?
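One possible reading of that proposal is a multi-valued map, where a localized name may point at more than one id. The sketch below only illustrates the data structure; MultiAliasRegistry and its methods are hypothetical and do not exist in sweble-wikitext.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical multi-valued registry; an illustration only, not part of sweble-wikitext.
final class MultiAliasRegistry {
    // localized name -> every id that registered it, in registration order
    private final Map<String, Set<String>> nameToIds = new HashMap<>();
    // ids that have at least one registered name
    private final Set<String> registeredIds = new HashSet<>();

    void addAlias(String id, List<String> names) {
        for (String name : names) {
            nameToIds.computeIfAbsent(name, k -> new LinkedHashSet<>()).add(id);
            registeredIds.add(id);
        }
    }

    // Returns all candidate ids for a name; empty if the name is unknown.
    Set<String> idsFor(String name) {
        return nameToIds.getOrDefault(name, Collections.emptySet());
    }

    boolean isRegistered(String id) {
        return registeredIds.contains(id);
    }
}
```

Nothing is rejected here, so ns stays registered and the parser function step can find it. The trade-off is that a lookup of 名前空間: now returns two candidates, so the caller would still need some rule to pick between them.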

tgalery commented 6 years ago

Would be nice to have thoughts on the PR above ^