korpling / ANNIS

ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation.
http://corpus-tools.org/annis/
Apache License 2.0

Discontinuous span import fixing doesn't work #574

Closed sdruskat closed 4 years ago

sdruskat commented 6 years ago

What is the used ANNIS version? 3.5.0-preview5 (rev. 20ffc21416, built 2017-10-09 11:48:10)

What browser and operating system did you use? Ubuntu 16.04 LTS, Firefox 57.0.1

What steps will reproduce the problem?

  1. Import the attached test corpus (exported via ANNISExporter from Salt Project, also attached)
  2. Search for test
  3. Open the grid visualizer

What is the expected result? The grid visualizer shows a discontinuous span covering tokens 3, 7, and 8 ("example", "it", "appears").

What happens instead? The grid visualizer shows a continuous span covering tokens 3 through 8 ("example", "more", "complicated", "than", "it", "appears").
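To make the difference between the expected and the actual result concrete, here is a minimal sketch in Python. It is not ANNIS code: the helper names `contiguous_runs` and `bounding_range` are hypothetical, and the assumption that the span is reduced to its leftmost and rightmost token is only an illustration of the symptom, not a confirmed diagnosis.

```python
# Illustrative only: these helpers are hypothetical and not part of ANNIS or Salt.

def contiguous_runs(token_indices):
    """Group token indices into maximal runs of consecutive indices."""
    runs = []
    for idx in sorted(set(token_indices)):
        if runs and idx == runs[-1][1] + 1:
            # Extend the current run by one token.
            runs[-1] = (runs[-1][0], idx)
        else:
            # Start a new run at this token.
            runs.append((idx, idx))
    return runs


def bounding_range(token_indices):
    """Collapse a span to its leftmost and rightmost token (loses the gaps)."""
    return (min(token_indices), max(token_indices))


# Token positions and texts as given in this report.
tokens = {3: "example", 4: "more", 5: "complicated",
          6: "than", 7: "it", 8: "appears"}
span = [3, 7, 8]  # tokens the span is supposed to cover

# Expected: two separate pieces, tokens 3 and 7-8.
print(contiguous_runs(span))   # [(3, 3), (7, 8)]

# Observed: one piece from token 3 to token 8.
print(bounding_range(span))    # (3, 8)

left, right = bounding_range(span)
print([tokens[i] for i in range(left, right + 1)])
```

The last line prints all six token texts, which matches what the grid visualizer displays in this report.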

According to @thomaskrause, this might be because the link fixer is not called during import into ANNIS.

Session log (start ANNIS, import corpus with overwrite, open web app, search, display grid):

INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,019  a.s.i.AnnisServiceRunner - Starting up REST...
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,380  a.a.SchemeFixer - testing if fixing schema is necessary
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,395  a.a.SchemeFixer - finished schema test
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,536  a.s.i.QueryServiceImpl - ANNIS QueryService loaded.
WARN [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,596  a.s.i.AnnisServiceRunner - *NOT* using authentification, your ANNIS service *IS NOT SECURED*
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,605  o.e.j.s.Server - jetty-8.1.18.v20150929
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:29,623  o.e.j.s.h.ContextHandler$Context - Initializing Shiro environment
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:30,606  o.e.j.s.AbstractConnector - Started SelectChannelConnector@localhost:5711
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:30,618  o.e.j.s.Server - jetty-8.1.18.v20150929
INFO [SwingWorker-pool-1-thread-1] 2018-01-02 11:12:31,573  o.e.j.s.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8080
INFO [qtp665146680-36] 2018-01-02 11:12:33,903  c.v.e.o.s.i.JDK14LoggerAdapter - Installed AtmosphereHandler com.vaadin.server.communication.PushAtmosphereHandler mapped to context-path: /*
INFO [qtp665146680-36] 2018-01-02 11:12:33,908  c.v.e.o.s.i.JDK14LoggerAdapter - Installed the following AtmosphereInterceptor mapped to AtmosphereHandler com.vaadin.server.communication.PushAtmosphereHandler
INFO [qtp665146680-36] 2018-01-02 11:12:33,915  c.v.e.o.s.i.JDK14LoggerAdapter - META-INF/services/org.atmosphere.cpr.AtmosphereFramework not found in class loader
INFO [qtp665146680-36] 2018-01-02 11:12:33,940  c.v.e.o.s.i.JDK14LoggerAdapter - Atmosphere is using org.atmosphere.util.VoidAnnotationProcessor for processing annotation
INFO [qtp665146680-36] 2018-01-02 11:12:33,957  c.v.e.o.s.i.JDK14LoggerAdapter - Installed WebSocketProtocol org.atmosphere.websocket.protocol.SimpleHttpProtocol 
INFO [qtp665146680-36] 2018-01-02 11:12:33,977  c.v.e.o.s.i.JDK14LoggerAdapter - Installing Default AtmosphereInterceptors
INFO [qtp665146680-36] 2018-01-02 11:12:33,978  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.CorsInterceptor : CORS Interceptor Support
INFO [qtp665146680-36] 2018-01-02 11:12:33,979  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.CacheHeadersInterceptor : Default Response's Headers Interceptor
INFO [qtp665146680-36] 2018-01-02 11:12:33,980  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.PaddingAtmosphereInterceptor : Browser Padding Interceptor Support
INFO [qtp665146680-36] 2018-01-02 11:12:33,982  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.AndroidAtmosphereInterceptor : Android Interceptor Support
INFO [qtp665146680-36] 2018-01-02 11:12:33,982  c.v.e.o.s.i.JDK14LoggerAdapter - Dropping Interceptor org.atmosphere.interceptor.HeartbeatInterceptor
INFO [qtp665146680-36] 2018-01-02 11:12:33,983  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.SSEAtmosphereInterceptor : SSE Interceptor Support
INFO [qtp665146680-36] 2018-01-02 11:12:33,984  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.JSONPAtmosphereInterceptor : JSONP Interceptor Support
INFO [qtp665146680-36] 2018-01-02 11:12:33,986  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.JavaScriptProtocol : Atmosphere JavaScript Protocol
INFO [qtp665146680-36] 2018-01-02 11:12:33,987  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.WebSocketMessageSuspendInterceptor : org.atmosphere.interceptor.WebSocketMessageSuspendInterceptor
INFO [qtp665146680-36] 2018-01-02 11:12:33,988  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.OnDisconnectInterceptor : Browser disconnection detection
INFO [qtp665146680-36] 2018-01-02 11:12:33,988  c.v.e.o.s.i.JDK14LoggerAdapter -    org.atmosphere.interceptor.IdleResourceInterceptor : org.atmosphere.interceptor.IdleResourceInterceptor
INFO [qtp665146680-36] 2018-01-02 11:12:33,989  c.v.e.o.s.i.JDK14LoggerAdapter - Set org.atmosphere.cpr.AtmosphereInterceptor.disableDefaults to disable them.
INFO [qtp665146680-36] 2018-01-02 11:12:33,993  c.v.e.o.s.i.JDK14LoggerAdapter - Using EndpointMapper class org.atmosphere.util.DefaultEndpointMapper
INFO [qtp665146680-36] 2018-01-02 11:12:33,993  c.v.e.o.s.i.JDK14LoggerAdapter - Using BroadcasterCache: org.atmosphere.cache.UUIDBroadcasterCache
INFO [qtp665146680-36] 2018-01-02 11:12:33,993  c.v.e.o.s.i.JDK14LoggerAdapter - Default Broadcaster Class: org.atmosphere.cpr.DefaultBroadcaster
INFO [qtp665146680-36] 2018-01-02 11:12:33,994  c.v.e.o.s.i.JDK14LoggerAdapter - Broadcaster Polling Wait Time 100
INFO [qtp665146680-36] 2018-01-02 11:12:33,994  c.v.e.o.s.i.JDK14LoggerAdapter - Shared ExecutorService supported: true
INFO [qtp665146680-36] 2018-01-02 11:12:33,995  c.v.e.o.s.i.JDK14LoggerAdapter - Messaging Thread Pool Size: Unlimited
INFO [qtp665146680-36] 2018-01-02 11:12:33,995  c.v.e.o.s.i.JDK14LoggerAdapter - Async I/O Thread Pool Size: 200
INFO [qtp665146680-36] 2018-01-02 11:12:33,996  c.v.e.o.s.i.JDK14LoggerAdapter - Using BroadcasterFactory: org.atmosphere.cpr.DefaultBroadcasterFactory
INFO [qtp665146680-36] 2018-01-02 11:12:33,996  c.v.e.o.s.i.JDK14LoggerAdapter - Using WebSocketProcessor: org.atmosphere.websocket.DefaultWebSocketProcessor
INFO [qtp665146680-36] 2018-01-02 11:12:33,996  c.v.e.o.s.i.JDK14LoggerAdapter - Invoke AtmosphereInterceptor on WebSocket message true
INFO [qtp665146680-36] 2018-01-02 11:12:33,997  c.v.e.o.s.i.JDK14LoggerAdapter - HttpSession supported: true
INFO [qtp665146680-36] 2018-01-02 11:12:33,997  c.v.e.o.s.i.JDK14LoggerAdapter - Atmosphere is using DefaultAtmosphereObjectFactory for dependency injection and object creation
INFO [qtp665146680-36] 2018-01-02 11:12:33,997  c.v.e.o.s.i.JDK14LoggerAdapter - Atmosphere is using async support: org.atmosphere.container.JettyServlet30AsyncSupportWithWebSocket running under container: jetty/8.1.18.v20150929 with WebSocket enabled.
INFO [qtp665146680-36] 2018-01-02 11:12:33,998  c.v.e.o.s.i.JDK14LoggerAdapter - Atmosphere Framework 2.2.9.vaadin2 started.
INFO [qtp665146680-36] 2018-01-02 11:12:34,001  c.v.e.o.s.i.JDK14LoggerAdapter - Installed AtmosphereInterceptor  Track Message Size Interceptor using | with priority BEFORE_DEFAULT 
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,555  a.a.SchemeFixer - testing if fixing schema is necessary
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,594  a.a.SchemeFixer - finished schema test
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,613  a.a.CorpusAdministration - Importing corpus from: /home/user/annis-export
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,627  a.a.AbstractAdminstrationDao - Locking repository_metadata table to ensure no other import is running
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,637  a.a.AdministrationDao - creating staging area for import format version 3.3
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,660  a.a.AdministrationDao - bulk-loading data
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,672  a.a.AdministrationDao - example_queries.annis file not found
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,695  a.a.DeleteCorpusDao - delete conflicting corpus: test-corpus
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,700  a.a.DeleteCorpusDao - deleting external data files
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,711  a.a.DeleteCorpusDao - dropping tables
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,721  a.a.DeleteCorpusDao - recursivly deleting corpora: [1]
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,734  a.a.AdministrationDao - activating relational constraints
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,753  a.a.AdministrationDao - creating indexes for staging area
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,759  a.a.AdministrationDao - checking resolver_vis_map for errors
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,762  a.a.AdministrationDao - analyzing staging area
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,775  a.a.AdministrationDao - add the document name as metadata
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,780  a.a.AdministrationDao - querying ID offsets
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,785  a.a.AdministrationDao - query for the new corpus ID
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,788  a.a.AdministrationDao - new corpus ID is 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,791  a.a.AdministrationDao - creating node ID mapping (for properly sorted IDs)
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,811  a.a.AdministrationDao - importing all binary data from ExtData
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,813  a.a.AdministrationDao - extending _text
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,819  a.a.AdministrationDao - extending _example_queries
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,823  a.a.AdministrationDao - computing statistics for top-level corpus
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,835  a.a.AdministrationDao - analyzing staging area
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,841  a.a.AdministrationDao - moving corpus from staging area to main db
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,852  a.a.AdministrationDao - computing path information of the corpus tree for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,857  a.a.AdministrationDao - creating annotations table for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,869  a.a.AdministrationDao - indexing annotations table for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,888  a.a.AdministrationDao - creating annotation category table for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,891  a.a.AdministrationDao - creating materialized facts table for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,909  a.a.AdministrationDao - indexing the new facts table (general indexes)
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:36,998  a.a.AdministrationDao - indexing the new facts table (edge related indexes)
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,080  a.a.AdministrationDao - adjusting statistical information for left_token and right_token columns
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,087  a.a.AdministrationDao - dropping staging area
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,117  a.a.AdministrationDao - creating new corpus.properties file
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,124  a.d.QueryDaoImpl - write config file: /home/user/.annis/data/corpus_test-corpus_3423ca03-6858-4037-84d4-a9563b7941ec.properties
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,129  a.a.AdministrationDao - analyzing facts table for corpus with ID 1
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,394  a.d.a.QueriesGenerator - generated example query: "be"
WARN [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,444  a.d.a.QueriesGenerator - could not generating auto query with annis.dao.autogenqueries.AutoSimpleRegexQuery
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,451  a.a.CorpusAdministration - Finished import from: /home/user/annis-export
INFO [SwingWorker-pool-1-thread-2] 2018-01-02 11:12:37,452  a.a.AdministrationDao - analyzing parent facts table
INFO [pool-2-thread-65] 2018-01-02 11:12:52,196  a.s.i.QueryServiceImpl - function: COUNT, query: test, corpus: [test-corpus], runtime: 19 ms
INFO [pool-2-thread-64] 2018-01-02 11:12:52,213  a.s.i.QueryServiceImpl - function: FIND, query: test, corpus: [test-corpus], runtime: 39 ms
INFO [pool-2-thread-67] 2018-01-02 11:12:52,279  a.s.i.QueryServiceImpl - function: SUBGRAPH, corpus: [test-corpus], runtime: 20 ms, matches: default_ns::test::salt:/test-corpus/test-corpus#sSpan1, seg: null, left: 5, right: 5, filter: all

Screenshot:

[image]

Attachments: annis-export.zip, salt-project.zip

thomaskrause commented 4 years ago

This seems to be fixed in the ANNIS 4 development version.

[image]