lum-ai / odinson

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
https://lum.ai/odinson/docs/
Apache License 2.0

How to index and query custom token fields? #331

Closed: victoryhb closed this issue 3 years ago

victoryhb commented 3 years ago

When using a custom annotation pipeline, I can index custom token fields (e.g., upos, for universal POS tags), but querying them (e.g., [upos=VERB]) produces a parse error like the following:

java.lang.Exception: Parse Error, Position 1:1, found "[upos=VERB"
    at fastparse.Parsed$Failure.get(Parsed.scala:54)
    at fastparse.Parsed$Failure.get(Parsed.scala:51)
    at ai.lum.odinson.compiler.QueryParser.parseBasicQuery(QueryParser.scala:14)
    at ai.lum.odinson.compiler.QueryCompiler.compile(QueryCompiler.scala:38)
    at ai.lum.odinson.compiler.QueryCompiler.mkQuery(QueryCompiler.scala:44)
    at controllers.OdinsonController.$anonfun$runQuery$2(OdinsonController.scala:947)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:56)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:93)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:93)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:48)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Strangely, I discovered through trial and error that if I assign the custom field to a predefined token field (e.g., posTagTokenField = upos) in the index section of application.conf, the query above works correctly in the Shell (but still not through the REST API).
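For reference, the workaround described above would look roughly like this in the application.conf the Shell reads. A plausible but unconfirmed explanation for why it helps is that the compiler's default list of queryable fields is built from the odinson.index.*TokenField settings (as in the compiler block quoted later in this thread), so pointing the POS field at upos puts upos on that list.

```hocon
# application.conf read by the Shell (sketch of the reported workaround).
# Reuse the predefined POS tag field for the custom upos annotation.
odinson.index {
  posTagTokenField = upos
}
```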

I couldn't find any relevant hints in the docs. What is the proper way to index and query custom token fields?

BeckySharp commented 3 years ago

I could be mistaken, but I think that if you add the field name to this list: https://github.com/lum-ai/odinson/blob/master/core/src/main/resources/reference.conf#L47

it will work. You can do that in this config or in an application.conf for your project.

Thanks for pointing out that this isn't in the docs!
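A minimal sketch of that suggestion, assuming Odinson reads your application.conf layered over its reference.conf in the usual Typesafe Config way. HOCON's += operator appends to the inherited list instead of replacing it, so the built-in fields stay queryable. This only makes the compiler accept the field names; the fields themselves still have to be present in the index.

```hocon
# Project-level application.conf, merged over Odinson's reference.conf (assumed).
# += appends to the default list at odinson.compiler.allTokenFields.
odinson.compiler.allTokenFields += "upos"
odinson.compiler.allTokenFields += "xpos"
```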

victoryhb commented 3 years ago

I added the upos field but still no luck :(

  compiler {
    # fields available per token
    allTokenFields = [
      ${odinson.index.rawTokenField},
      ${odinson.index.lemmaTokenField},
      ${odinson.index.uposTokenField},
      ${odinson.index.xposTokenField},
      ${odinson.index.incomingTokenField},
      ${odinson.index.outgoingTokenField},
    ]
}

odinson.index {
    rawTokenField = raw
    displayField = raw

    lemmaTokenField = lemma
    uposTokenField = upos
    xposTokenField = xpos
}

The same parse error occurred in both the Shell and the backend interface.

BeckySharp commented 3 years ago

FWIW, I used custom fields in a different project for a rather unusual use case. In that project, I have an application.conf and use Odinson core as a library dependency. Here's my application config; with it, I am able to write patterns against the custom fields:

https://github.com/BeckySharp/nsf_rules/blob/master/user_study/rulemaking/src/main/resources/application.conf

Here's where I build the patterns against my custom fields (node and color): https://github.com/BeckySharp/nsf_rules/blob/master/user_study/rulemaking/src/main/scala/org/clulab/rulemaking/Traversal.scala#L101-L109

Did you create your index with those fields in place in the conf?

victoryhb commented 3 years ago

Thanks a lot for providing your real-life examples. I adapted your config to my fields:

odinson.compiler.allTokenFields = ["raw", "lemma", "upos", "xpos"]
odinson.index.storedFields = ["raw", "lemma", "upos", "xpos"]
odinson.index.addToNormalizedField = ["raw"]

And it worked perfectly for the Shell. Unfortunately, the same parse error still occurs in the backend API. Is that because the backend uses a different config from the Shell?

BeckySharp commented 3 years ago

I am not super familiar with the backend API code, sorry! @danebell @myedibleenso do you know?

myedibleenso commented 3 years ago

Hi, @victoryhb. Can you please do the following:

  1. Create a gist with the output of /api/config?pretty=true. This will give us a clear picture of what is included in the config used by the REST API.
  2. Provide a link to a gist with the contents of the modified config file in the backend subproject.
  3. Describe the steps you followed to launch the REST API after modifying the config.

victoryhb commented 3 years ago

@myedibleenso Thanks. I followed your instructions and created the gist with the two files. To launch the REST API, I just run sbt "backend/run" in Odinson's root folder. For the Shell, I use sbt "extra/runMain ai.lum.odinson.extra.Shell". From the API config, I can see that dataDir, indexDir, etc. are not the same as what I defined in application.conf. It appears that, by default, the backend module is not loading the correct config. So how can I get it to do so?

victoryhb commented 3 years ago

Ah, I just realized it was silly of me to assume that the backend uses the same application.conf file as the Shell. In fact, the backend is configured through backend/conf/application.conf, not extra/src/main/resources/application.conf. I didn't even know the former config existed until now (only the latter is mentioned in the docs). After modifying the backend config, everything works perfectly! Sorry to have bothered you, and thanks a ton for your patient help! @BeckySharp @myedibleenso
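For anyone hitting the same problem, here is a sketch of the relevant part of backend/conf/application.conf, reusing the token-field settings that worked for the Shell. The odinson.dataDir and odinson.indexDir key paths are assumed from the dataDir and indexDir settings mentioned above, and the path is a placeholder; adjust both so the REST API reads the same index the Shell built.

```hocon
# backend/conf/application.conf (read by the REST API, not by the Shell).
# Placeholder path (assumption): point it at the data the Shell uses.
odinson.dataDir = "/path/to/odinson/data"
# Assumed layout: the index lives in an "index" subdirectory of dataDir.
odinson.indexDir = ${odinson.dataDir}/index

# Same token-field settings that worked for the Shell.
odinson.compiler.allTokenFields = ["raw", "lemma", "upos", "xpos"]
odinson.index.storedFields = ["raw", "lemma", "upos", "xpos"]
odinson.index.addToNormalizedField = ["raw"]
```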

myedibleenso commented 3 years ago

Glad to hear you got it working, @victoryhb !

No need to apologize. We welcome the questions. As you said, when and where things need to be re-configured isn't obvious from our documentation. If you have the time, we'd certainly welcome a PR that improves the documentation related to using and modifying the REST API:

Regarding merging configuration files, you may find the features described here useful to you.

victoryhb commented 3 years ago

@myedibleenso Thanks for the pointers. Sure, I'd love to help enrich the docs once I am more familiar with the system, so that other users can benefit. I will submit a PR later.