oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

PlainXref.yylex() taking long time to process long lines #3447

Open · ChristopheBordieu opened 3 years ago

ChristopheBordieu commented 3 years ago

Describe the bug
OpenGrok indexing does not finish. It seems to be totally stuck.

Java JDK 11.0.9, Apache Tomcat 9.0.41, OpenGrok 1.5.11, VM: RHEL 8.3 / 32 GB / 10 CPUs / 3.5 TB, Git 2.27

To Reproduce
The indexer is started with (from ps -ef f www):

opengrok 4067614 4067609  0 04:47 ?        Sl     0:17          |   \_ /usr/bin/python3.6 /home/opengrok/.local/bin/opengrok-indexer -C -J=-server -J=-Xmx28g -J=-Djava.util.logging.config.file=/srv/opengrok-var/log/logging.properties -a /srv/opengrok/current/lib/opengrok.jar -- -m 256 -v -c /srv/universal-ctags/bin/ctags -s /srv/opengrok-src --depth 2 -d /srv/opengrok-var -i *.bin -i *.gz -i *.rpt -i *.tar -i *.zip -i *.class -i *.bz2 -i *.7z -i *.tgz -i *.exe -i *.jar -i *.war -i *.a -i *.xlsx -i *.docx -i *.pptx -i *.doc -i *.ppt -i *.xls -P -S -R /srv/opengrok-var/etc/configuration.xml -W /srv/opengrok-var/etc/configuration.xml -U http://localhost:8080/grok
opengrok 4067618 4067614 99 04:47 ?        Sl   1109:01          |       \_ /srv/java/current/bin/java -server -Dorg.opengrok.indexer.history.git=/usr/bin/git -Xmx28g -Djava.util.logging.config.file=/srv/opengrok-var/log/logging.properties -jar /srv/opengrok/current/lib/opengrok.jar -m 256 -v -c /srv/universal-ctags/bin/ctags -s /srv/opengrok-src --depth 2 -d /srv/opengrok-var -i *.bin -i *.gz -i *.rpt -i *.tar -i *.zip -i *.class -i *.bz2 -i *.7z -i *.tgz -i *.exe -i *.jar -i *.war -i *.a -i *.xlsx -i *.docx -i *.pptx -i *.doc -i *.ppt -i *.xls -P -S -R /srv/opengrok-var/etc/configuration.xml -W /srv/opengrok-var/etc/configuration.xml -U http://localhost:8080/grok

All 10 ctags processes are still there. A single thread, LWP 235964, is running at 100% CPU:

235964 4067614 opengrok   20   0 35.8G 11.2G  3624 R 99.2 35.9  7h44:53 │        │     │  ├─ ForkJoinPool-1-

Thread dump in attachment: threaddump.opengrok.1.5.11.txt

root:~$ strace -p 4067614
strace: Process 4067614 attached
wait4(4067618, ^Cstrace: Process 4067614 detached
 <detached ...>
root:~$ strace -p 4067618
strace: Process 4067618 attached
futex(0x7fb5ae6b09d0, FUTEX_WAIT, 4067619, NULL^Cstrace: Process 4067618 detached
 <detached ...>
root:~$ ps -ef f | grep [4]067619
root:~$ strace -p 235964
strace: Process 235964 attached
strace: [ Process PID=235964 runs in x32 mode. ]

Expected behavior
I expect the process to finish and upload the new configuration data to the OpenGrok web service.

ChristopheBordieu commented 3 years ago

I think I am going to switch to OpenGrok 1.3.16 for some trials. Before doing that, tell me if you need any more data regarding the processes/threads that are still running/stuck.

ChristopheBordieu commented 3 years ago
[opengrok log]$ tail -f opengrok0.0.log
2021-03-02 07:40:39.281+0000 FINER t241250 DefaultIndexChangedListener.fileAdded: Added: /PROJ/repo1/file1.gsv (PlainAnalyzer)
2021-03-02 07:40:39.281+0000 FINEST t241250 Statistics.logIt: file '/srv/opengrok-src/PROJ/repo1/file1.gsv' indexed (took 4 ms) 
2021-03-02 07:40:39.283+0000 FINEST t241250 AnalyzerGuru.findForStream: /PROJ/repo1/file1.play: chosen by imprecise magic: PlainAnalyzerFactory
2021-03-02 07:40:39.283+0000 FINE t241250 DefaultIndexChangedListener.fileAdd: Add: /PROJ/repo1/file1.play (PlainAnalyzer)
2021-03-02 07:40:39.285+0000 FINER t241250 DefaultIndexChangedListener.fileAdded: Added: /PROJ/repo1/file1.play (PlainAnalyzer)
2021-03-02 07:40:39.285+0000 FINEST t241250 Statistics.logIt: file '/srv/opengrok-src/PROJ/repo1/file1.play' indexed (took 3 ms) 
2021-03-02 07:40:39.287+0000 FINEST t241250 AnalyzerGuru.findForStream: /PROJ/repo1/file1.gsv: chosen by imprecise magic: PlainAnalyzerFactory
2021-03-02 07:40:39.287+0000 FINE t241250 DefaultIndexChangedListener.fileAdd: Add: /PROJ/repo1/file1.gsv (PlainAnalyzer)
2021-03-02 07:40:39.291+0000 FINER t241250 DefaultIndexChangedListener.fileAdded: Added: /PROJ/repo1/file1.gsv (PlainAnalyzer)
2021-03-02 07:40:39.291+0000 FINEST t241250 Statistics.logIt: file '/srv/opengrok-src/PROJ/repo1/file1.gsv' indexed (took 6 ms) 

Timestamps are UTC. There has been no change since early in the morning.

ChristopheBordieu commented 3 years ago

To be complete: further to the issue with temporary ctags files that I already described in another ticket, I have another script that, every 60 s, looks in /tmp for tags.* files, sorts them by time, and keeps only the 10 newest ones (since there are 10 ctags processes).
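
For reference, a minimal sketch of such a cleanup script, assuming only the /tmp location and tags.* pattern described above (the keep count mirrors the 10 ctags processes):

#!/usr/bin/env python3
# Keep only the N newest /tmp/tags.* files and delete the rest,
# as described in the comment above.
import glob
import os

KEEP = 10  # one temporary tags file per ctags process

files = sorted(glob.glob('/tmp/tags.*'), key=os.path.getmtime, reverse=True)
for stale in files[KEEP:]:
    try:
        os.remove(stale)
    except OSError:
        pass  # the file may already be gone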

vladak commented 3 years ago

I see one thread in the dump that could be the thread pegging the CPU (the Java thread IDs do not match the LWP ID):

"ForkJoinPool-1-worker-31" #241254 daemon prio=5 os_prio=0 cpu=26217242.07ms elapsed=30699.58s tid=0x00007fb42401f800 nid=0x399bc runnable  [0x00007fb5181e6000]
   java.lang.Thread.State: RUNNABLE
        at org.opengrok.indexer.analysis.plain.PlainXref.yylex(PlainXref.java:846)
        at org.opengrok.indexer.analysis.JFlexXref.write(JFlexXref.java:489)
        at org.opengrok.indexer.analysis.TextAnalyzer.writeXref(TextAnalyzer.java:82)
        at org.opengrok.indexer.analysis.plain.PlainAnalyzer.analyze(PlainAnalyzer.java:151)
        at org.opengrok.indexer.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:632)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:753)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$2(IndexDatabase.java:1279)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$438/0x000000080035f440.apply(Unknown Source)
        at java.util.stream.Collectors.lambda$groupingByConcurrent$59(java.base@11.0.9/Collectors.java:1297)
        at java.util.stream.Collectors$$Lambda$440/0x000000080035e040.accept(java.base@11.0.9/Unknown Source)
        at java.util.stream.ReferencePipeline.lambda$collect$1(java.base@11.0.9/ReferencePipeline.java:575)
        at java.util.stream.ReferencePipeline$$Lambda$441/0x000000080035e440.accept(java.base@11.0.9/Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@11.0.9/ForEachOps.java:183)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@11.0.9/ArrayList.java:1655)
        at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.9/AbstractPipeline.java:484)
        at java.util.stream.ForEachOps$ForEachTask.compute(java.base@11.0.9/ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(java.base@11.0.9/CountedCompleter.java:746)
        at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.9/ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.9/ForkJoinPool.java:1020)
        at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.9/ForkJoinPool.java:1656)
        at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.9/ForkJoinPool.java:1594)
        at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.9/ForkJoinWorkerThread.java:183)

vladak commented 3 years ago

Unfortunately in the description/output the thread name is cut off after ForkJoinPool-1-.

vladak commented 3 years ago

Perhaps this was some big file being processed? It would help to gather the list of open files with lsof at the point this happens.

ChristopheBordieu commented 3 years ago

I deleted all index data and will reindex. After some googling, it seems the length of thread names cannot be changed. If the issue reproduces, I may dig into /proc, and I will also run lsof as you suggest.
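
A minimal sketch of the /proc approach, which reports roughly what lsof -p <pid> shows for open files (assumes a Linux /proc layout and permission to read the target's fd table):

#!/usr/bin/env python3
# List the files a process has open by resolving /proc/<pid>/fd symlinks.
import os
import sys

pid = sys.argv[1]
fd_dir = f'/proc/{pid}/fd'
for fd in sorted(os.listdir(fd_dir), key=int):
    try:
        target = os.readlink(os.path.join(fd_dir, fd))
    except OSError:
        continue  # the fd was closed between listdir and readlink
    print(f'{fd}\t{target}')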

vladak commented 3 years ago

Actually:

"ForkJoinPool-1-worker-31" #241254 daemon prio=5 os_prio=0 cpu=26217242.07ms elapsed=30699.58s tid=0x00007fb42401f800 nid=0x399bc runnable  [0x00007fb5181e6000]

nid=0x399bc is 235964 in decimal, which matches the LWP ID in the description.
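
The conversion is plain hexadecimal to decimal, e.g. in a Python shell:

>>> int('399bc', 16)   # nid from the thread dump, without the 0x prefix
235964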

vladak commented 3 years ago

I wonder if the lexer has entered an infinite loop and what file caused this.

ChristopheBordieu commented 3 years ago

Would the lines of opengrok0.0.log preceding the ones I provided allow us to identify that? The log has been removed; I am asking just to know for next time.

vladak commented 3 years ago

It could be one of those files, yes: one that has a fileAdd but no corresponding fileAdded log entry.
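
A quick sketch of that check against the log format shown in the excerpts above (the log path is passed as an argument):

#!/usr/bin/env python3
# Print files that were logged with "Add:" but never "Added:" -- the
# candidates still being processed when the indexer appears stuck.
import re
import sys

added, pending = set(), set()
pat = re.compile(r'fileAdd(?:ed)?: (Add(?:ed)?): (\S+)')
with open(sys.argv[1], errors='replace') as log:
    for line in log:
        m = pat.search(line)
        if m:
            (added if m.group(1) == 'Added' else pending).add(m.group(2))
for path in sorted(pending - added):
    print(path)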

ChristopheBordieu commented 3 years ago

The issue reproduced with a fresh indexing from scratch. Exactly the same symptom: thread ForkJoinPool-1-worker-21 is RUNNABLE.

"ForkJoinPool-1-worker-21" #241257 daemon prio=5 os_prio=0 cpu=56820245.72ms elapsed=80000.82s tid=0x00007fb86000a800 nid=0x95097 runnable  [0x00007fb8e6f47000]
   java.lang.Thread.State: RUNNABLE
        at org.opengrok.indexer.analysis.plain.PlainXref.yylex(PlainXref.java:846)
        at org.opengrok.indexer.analysis.JFlexXref.write(JFlexXref.java:489)
        at org.opengrok.indexer.analysis.TextAnalyzer.writeXref(TextAnalyzer.java:82)
        at org.opengrok.indexer.analysis.plain.PlainAnalyzer.analyze(PlainAnalyzer.java:151)
        at org.opengrok.indexer.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:632)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:753)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$2(IndexDatabase.java:1279)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$458/0x000000080032b440.apply(Unknown Source)
        at java.util.stream.Collectors.lambda$groupingByConcurrent$59(java.base@11.0.9/Collectors.java:1297)
        at java.util.stream.Collectors$$Lambda$463/0x000000080032cc40.accept(java.base@11.0.9/Unknown Source)
        at java.util.stream.ReferencePipeline.lambda$collect$1(java.base@11.0.9/ReferencePipeline.java:575)
        at java.util.stream.ReferencePipeline$$Lambda$466/0x000000080032d840.accept(java.base@11.0.9/Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@11.0.9/ForEachOps.java:183)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@11.0.9/ArrayList.java:1655)
        at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.9/AbstractPipeline.java:484)
        at java.util.stream.ForEachOps$ForEachTask.compute(java.base@11.0.9/ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(java.base@11.0.9/CountedCompleter.java:746)
        at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.9/ForkJoinTask.java:290)
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.9/ForkJoinPool.java:1020)
        at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.9/ForkJoinPool.java:1656)
        at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.9/ForkJoinPool.java:1594)
        at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.9/ForkJoinWorkerThread.java:183)

lsof gave me the related file:

root@opengrok:$ lsof -p 247974 | egrep '/srv/opengrok-(var|src)'
java    247974 opengrok   70wW  REG              253,1         0  536871049 /srv/opengrok-var/log/opengrok0.0.log.lck
java    247974 opengrok   72w   REG              253,1  11769481  172898062 /srv/opengrok-var/index/SSCI/_5.fdt
java    247974 opengrok   74w   REG              253,1  13308600  172856656 /srv/opengrok-var/index/SSCI/_6.fdt
java    247974 opengrok   79w   REG              253,1         0  172856672 /srv/opengrok-var/index/SSCI/_6_Lucene85FieldsIndex-doc_ids_o.tmp
java    247974 opengrok   80w   REG              253,1   4621778  544268545 /srv/opengrok-var/log/opengrok0.0.log
java    247974 opengrok   82w   REG              253,1         0  173236922 /srv/opengrok-var/index/SSCI/_5_Lucene85FieldsIndex-doc_ids_k.tmp
java    247974 opengrok   83w   REG              253,1    556461  172856676 /srv/opengrok-var/index/SSCI/_6.tvd
java    247974 opengrok   90w   REG              253,1   3341565  172967344 /srv/opengrok-var/index/SSCI/_7.fdt
java    247974 opengrok   91w   REG              253,1         0  172856674 /srv/opengrok-var/index/SSCI/_6_Lucene85FieldsIndexfile_pointers_p.tmp
java    247974 opengrok   93w   REG              253,1    237336  172967348 /srv/opengrok-var/index/SSCI/_7.tvd
java    247974 opengrok   94w   REG              253,1         0  173237526 /srv/opengrok-var/index/SSCI/_5_Lucene85FieldsIndexfile_pointers_l.tmp
java    247974 opengrok   96w   REG              253,1         0  172967345 /srv/opengrok-var/index/SSCI/_7_Lucene85FieldsIndex-doc_ids_s.tmp
java    247974 opengrok  100w   REG              253,1         0  172967346 /srv/opengrok-var/index/SSCI/_7_Lucene85FieldsIndexfile_pointers_t.tmp
java    247974 opengrok  102w   REG              253,1         0  172856682 /srv/opengrok-var/index/SSCI/_6_Lucene85TermVectorsIndex-doc_ids_q.tmp
java    247974 opengrok  105w   REG              253,1         0  172856689 /srv/opengrok-var/index/SSCI/_6_Lucene85TermVectorsIndexfile_pointers_r.tmp
java    247974 opengrok  111w   REG              253,1         0  172967349 /srv/opengrok-var/index/SSCI/_7_Lucene85TermVectorsIndex-doc_ids_u.tmp
java    247974 opengrok  112w   REG              253,1         0  172967352 /srv/opengrok-var/index/SSCI/_7_Lucene85TermVectorsIndexfile_pointers_v.tmp
java    247974 opengrok  145w   REG              253,1    605540  173238355 /srv/opengrok-var/index/SSCI/_5.tvd
java    247974 opengrok  155w   REG              253,1      6675 1741811808 /srv/opengrok-var/xref/SSCI/.../APIS.gsv.gz.org_opengrok
java    247974 opengrok  160w   REG              253,1         0  173238356 /srv/opengrok-var/index/SSCI/_5_Lucene85TermVectorsIndex-doc_ids_m.tmp
java    247974 opengrok  162r   REG              253,0  16811439 3017232457 /srv/opengrok-src/SSCI/.../APIS.gsv
java    247974 opengrok  170r   REG              253,0  16811439 3017232457 /srv/opengrok-src/SSCI/.../APIS.gsv
java    247974 opengrok  172w   REG              253,1         0  173238357 /srv/opengrok-var/index/SSCI/_5_Lucene85TermVectorsIndexfile_pointers_n.tmp
java    247974 opengrok  192r   REG              253,0  16811439 3017232457 /srv/opengrok-src/SSCI/.../APIS.gsv
root@opengrok:$ ll /srv/opengrok-var/xref/SSCI/.../APIS.gsv.gz.org_opengrok
-rw-rw-r--. 1 opengrok opengrok 6675 Mar  3 15:49 /srv/opengrok-var/xref/SSCI/.../APIS.gsv.gz.org_opengrok
root@opengrok:$ ll /srv/opengrok-src/SSCI/.../APIS.gsv
-rw-r--r--. 1 opengrok opengrok 16811439 Feb 28 07:51 /srv/opengrok-src/SSCI/.../APIS.gsv

There are plenty of other .gsv files that are indexed OK (several thousand). On our current production instance, running older tool versions, this APIS.gsv file does not cause any trouble: it is available in the xref directory.

@vladak Is there a way to dig into the file to see why the PlainAnalyzer has trouble with this .gsv file? Is there a way to kill the thread without killing the Java process, so that it goes on to push the new configuration?

vladak commented 3 years ago

These are good questions. I don't have the answers right now. The workaround would be to ignore this file using the -i option.
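
For example (illustrative only; the -i option takes file name patterns, as the indexer command in the description shows), the option list could be extended with:

opengrok-indexer ... -i APIS.gsv ...

so that files with this particular name are skipped.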

What do the contents of the file look like? Could you possibly redact it somehow and publish it?

PlainXref.lex (the source from which the lexer is generated) does not really do much. It contains one interesting tidbit, though: https://github.com/oracle/opengrok/blob/ea52985050d66cb32f7198addc401bcc9f2ba4e6/opengrok-indexer/src/main/resources/analysis/plain/PlainXref.lex#L64-L69 which references #108, which I think describes the same symptoms, so it seems that the lexer rule does not help anymore.

vladak commented 3 years ago

Side comment: the lexer is possibly not stuck; it may just be very slow, as described in #108. Possibly there are some very long lines in the file.
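
One quick way to check is to scan the suspect file for its longest lines, e.g. with a small Python helper (it reads bytes to sidestep decoding errors in corrupt data):

#!/usr/bin/env python3
# Report the five longest lines of a file -- a quick way to spot the
# kind of pathological input described in #108.
import sys

lengths = []
with open(sys.argv[1], 'rb') as f:
    for lineno, line in enumerate(f, 1):
        lengths.append((len(line.rstrip(b'\r\n')), lineno))
for length, lineno in sorted(lengths, reverse=True)[:5]:
    print(f'line {lineno}: {length} bytes')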

ChristopheBordieu commented 3 years ago

OK... I dug into this file... It seems totally corrupted... I reported this to the devs. And indeed, the corrupted data show up as extremely long lines. Good catch! Thanks, it helped.

I will rm this corrupted file and then restart indexing.

vladak commented 3 years ago

How long were the lines?

vladak commented 3 years ago

I will keep this open; evidently the rule is not enough. Also, there is probably some history of issues/changes related to long lines that is worth exploring.

vladak commented 3 years ago

For the record, I salvaged the attachment to #108 that caused the original problem; it has 981448 characters (in fact there is no newline in the file at all). It might be worthwhile to turn it into a unit test.

ChristopheBordieu commented 3 years ago

How long were the lines?

There are 4 lines with 4194402, 4194402, 4194524, and 4194624 characters in the APIS.gsv file!

I found other corrupted .gsv files in the repo. I removed them and reran the indexing.

ChristopheBordieu commented 3 years ago

Indexing has finished :-) The final logs seem to show some trouble but, in the end, the webapp is serving the data.

2021-03-03 22:26:17.433+0000 INFO t1 Statistics.logIt: Done indexing data of all repositories (took 2:10:26)
2021-03-03 22:26:17.495+0000 INFO t1 Indexer.writeConfigToFile: Writing configuration to /srv/opengrok-var/etc/configuration.xml
2021-03-03 22:26:24.038+0000 INFO t1 Indexer.writeConfigToFile: Done...
2021-03-03 22:26:24.038+0000 INFO t1 Indexer.sendToConfigHost: Sending configuration to: http://localhost:8080/grok
2021-03-03 22:26:29.668+0000 FINEST t1 ServiceFinder$LazyObjectIterator.next: Loading next object: org.glassfish.jersey.inject.hk2.Hk2InjectionManagerFactory
2021-03-03 22:26:29.668+0000 FINER t1 Logger.debug: Created ServiceLocator ServiceLocatorImpl(__HK2_Generated_1893,1893,1427107974)
2021-03-03 22:26:29.669+0000 FINE t1 AbstractHk2InjectionManager.<init>: Clearing Jersey HK2 caches. Service cache size: 3, reflection cache size: 20.
2021-03-03 22:26:29.677+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.logging.LoggingFeatureAutoDiscoverable
2021-03-03 22:26:29.677+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.internal.config.ExternalPropertiesAutoDiscoverable
2021-03-03 22:26:29.677+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.server.filter.internal.ServerFiltersAutoDiscoverable
2021-03-03 22:26:29.677+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.jackson.internal.JacksonAutoDiscoverable
2021-03-03 22:26:29.678+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.server.validation.internal.ValidationAutoDiscoverable
2021-03-03 22:26:29.678+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.server.wadl.internal.WadlAutoDiscoverable
2021-03-03 22:26:29.678+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.server.internal.monitoring.MonitoringAutodiscoverable
2021-03-03 22:26:29.678+0000 FINEST t1 ServiceFinder$LazyClassIterator.next: Loading next class: org.glassfish.jersey.jaxb.internal.JaxbAutoDiscoverable
2021-03-03 22:26:29.679+0000 FINER t1 SystemPropertiesConfigurationModel.getProperties: System properties configuration provider not allowed
2021-03-03 22:26:29.679+0000 FINE t1 ClientExecutorProvidersConfigurator.lookupManagedScheduledExecutorService: null
java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.glassfish.jersey.client.ClientExecutorProvidersConfigurator.lookupManagedScheduledExecutorService(ClientExecutorProvidersConfigurator.java:174)
    at org.glassfish.jersey.client.ClientExecutorProvidersConfigurator.init(ClientExecutorProvidersConfigurator.java:124)
    at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:447)
    at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:317)
    at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:807)
    at org.glassfish.jersey.client.ClientRequest.getClientRuntime(ClientRequest.java:219)
    at org.glassfish.jersey.client.ClientRequest.getInjectionManager(ClientRequest.java:610)
    at org.glassfish.jersey.client.JerseyWebTarget.onBuilder(JerseyWebTarget.java:364)
    at org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:192)
    at org.glassfish.jersey.client.JerseyWebTarget.request(JerseyWebTarget.java:36)
    at org.opengrok.indexer.configuration.RuntimeEnvironment.writeConfiguration(RuntimeEnvironment.java:1420)
    at org.opengrok.indexer.index.Indexer.sendToConfigHost(Indexer.java:1134)
    at org.opengrok.indexer.index.Indexer.main(Indexer.java:379)
Caused by: javax.naming.NoInitialContextException: Need to specify class name in environment or system property, or in an application resource file: java.naming.factory.initial
    at java.naming/javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:691)
    at java.naming/javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:305)
    at java.naming/javax.naming.InitialContext.getURLOrDefaultInitCtx(InitialContext.java:342)
    at java.naming/javax.naming.InitialContext.lookup(InitialContext.java:409)
    ... 16 more

2021-03-03 22:26:29.680+0000 CONFIG t1 ExtendedLogger.config: Selected ExecutorServiceProvider implementation [org.glassfish.jersey.client.DefaultClientAsyncExecutorProvider] to be used for injection of executor qualified by [org.glassfish.jersey.client.ClientAsyncExecutor] annotation.
2021-03-03 22:26:29.680+0000 CONFIG t1 ExtendedLogger.config: Selected ScheduledExecutorServiceProvider implementation [org.glassfish.jersey.client.DefaultClientBackgroundSchedulerProvider] to be used for injection of scheduler qualified by [org.glassfish.jersey.client.ClientBackgroundScheduler] annotation.
2021-03-03 22:26:29.683+0000 CONFIG t1 HttpUrlConnector.<init>: Restricted headers are not enabled using [sun.net.http.allowRestrictedHeaders] system property (setting only takes effect on connections created after the property has been set/changed).
2021-03-03 22:26:29.719+0000 FINEST t1 HttpURLConnection.plainConnect0: ProxySelector Request for http://localhost:8080/grok/api/v1/configuration?reindex=true
2021-03-03 22:26:29.730+0000 FINEST t1 HttpURLConnection.plainConnect0: Proxy used: DIRECT
2021-03-03 22:26:29.784+0000 FINE t1 HttpURLConnection.writeRequests: sun.net.www.MessageHeader@4d96609c7 pairs: {PUT /grok/api/v1/configuration?reindex=true HTTP/1.1: null}{Content-Type: application/xml}{User-Agent: Jersey/2.30.1 (HttpUrlConnection 11.0.9)}{Host: localhost:8080}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive}{Content-Length: 27269089}
2021-03-03 22:40:05.592+0000 FINE t1 HttpURLConnection.getInputStream0: sun.net.www.MessageHeader@615215054 pairs: {null: HTTP/1.1 204}{Date: Wed, 03 Mar 2021 22:40:05 GMT}{Keep-Alive: timeout=20}{Connection: keep-alive}
2021-03-03 22:40:05.602+0000 INFO t1 Indexer.sendToConfigHost: Configuration update routine done, check log output for errors.
2021-03-03 22:40:05.619+0000 FINE t1 Ctags.close: Destroying ctags command
2021-03-03 22:40:05.619+0000 FINE t1 Ctags.close: Destroying ctags command
2021-03-03 22:40:05.619+0000 FINE t1 Ctags.close: Destroying ctags command
2021-03-03 22:40:05.619+0000 FINE t1 Ctags.close: Destroying ctags command
2021-03-03 22:40:05.620+0000 INFO t1 Statistics.logIt: Indexer finished (took 2:37:26)

vladak commented 3 years ago

I think the java.naming.factory.initial exception is innocuous - see #2211.

vladak commented 3 years ago

How long were the lines?

There are 4 lines with 4194402, 4194402, 4194524, and 4194624 characters in the APIS.gsv file!

Okay, that's an order of magnitude bigger than the original file that reproduced the issue. It looks like a different solution to the long-lines problem is needed.

vladak commented 3 years ago

For the 3225.txt file from #108, with a "line" of 981448 characters, the indexer finishes fine. I also tried a .txt file with 4194304 characters on a single line (no EOL); the xref file is successfully created. So there must be something else about the file contents that causes this.
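
A sketch of how such a reproduction file can be generated (the size is taken from the comment above; the content is a plain repeated ASCII character, which per this experiment is apparently not enough to trigger the problem):

#!/usr/bin/env python3
# Generate a single-line test file with no trailing newline ("noeol"),
# 4194304 characters long, as used in the experiment above.
with open('longline.txt', 'w') as f:
    f.write('a' * 4194304)  # one "line", no EOL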

ChristopheBordieu commented 3 years ago

GSV files contain mainly ASCII characters, but also Unicode characters such as U+2020, as well as non-European characters.
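
If it helps narrow this down, a minimal sketch that summarizes which non-ASCII code points a file contains (assumes UTF-8; undecodable bytes become U+FFFD):

#!/usr/bin/env python3
# Count the non-ASCII code points in a file, to correlate file contents
# with the lexer behavior.
import collections
import sys

counts = collections.Counter()
with open(sys.argv[1], encoding='utf-8', errors='replace') as f:
    for line in f:
        counts.update(ch for ch in line if ord(ch) > 127)
for ch, n in counts.most_common():
    print(f'U+{ord(ch):04X} x{n}')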