nextcloud / fulltextsearch_elasticsearch

🔍 Use Elasticsearch to index the content of your Nextcloud
https://apps.nextcloud.com/apps/fulltextsearch_elasticsearch
GNU Affero General Public License v3.0

Groupfolder indexing killed process (need help here) #135

Open ZoXx opened 3 years ago

ZoXx commented 3 years ago

Hello all, we work exclusively with group folders. Indexing does run, but after a few hours it is always killed by process monitoring (600 MB limit). We run the indexing via SSH with php73 and occ.

There are group folders that sometimes hold 250-300 GB of data, and these are re-read for every user.

With the second user, however, the process gets killed. If we then start the indexing again via occ, it starts from the beginning instead of continuing where it left off.

Is there a solution for this? Can we tweak any settings to make indexing run more smoothly?

Any tips are appreciated!
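A rough sketch of one way to break the run into smaller pieces, assuming the fulltextsearch occ command accepts a JSON options argument with a `user` key (as its documentation describes) and that the 600 MB watchdog applies to the PHP CLI process. The username and memory limit below are placeholders:

```bash
# Sketch only: index a single account with a higher PHP CLI memory limit.
# "someuser" and 2G are placeholders; check `occ fulltextsearch:index --help`
# for the option syntax supported by your app version.
php73 -d memory_limit=2G ./occ fulltextsearch:index "{\"user\": \"someuser\"}"

# Inspect the configured platform and providers before re-running a full index:
php73 ./occ fulltextsearch:check
```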

ZoXx commented 3 years ago

Does nobody have any idea?

martin-77 commented 3 years ago

For me, elasticsearch.service crashed because of a lack of RAM. After upgrading the RAM, Elasticsearch keeps running. Please check `systemctl status elasticsearch.service`. Second idea: find out which file crashes the search; for me, a very large PDF killed the service.
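A minimal sketch of how to check that, assuming a systemd-based install; it only looks for memory-related kills:

```bash
# Check the service state and look for memory-related kills in its journal.
systemctl status elasticsearch.service
journalctl -u elasticsearch.service --since "24 hours ago" | grep -iE "out of memory|killed|fatal"

# The kernel OOM killer also logs which process it terminated:
dmesg -T | grep -i "killed process"
```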

ZoXx commented 3 years ago

Hey Martin, thanks for the reply.

I already checked the PDFs; it's not always the same one. The process gets killed after roughly 100,000 files. It's a managed server from Hetzner with 32 GB RAM.

My settings elasticsearch.yml:

```yaml
# ======================== Elasticsearch Configuration =========================
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: elasticsearch_cloudr

# ------------------------------------ Node ------------------------------------
# Use a descriptive name for the node:
node.name: node-1
# Add custom attributes to the node:
node.attr.rack: r1

# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: XXXXXXXXX
# Path to log files:
path.logs: XXXXXXXXX

# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
bootstrap.memory_lock: true
# Make sure that the heap size is set to about half the memory available on the
# system and that the owner of the process is allowed to use this limit.
# Elasticsearch performs poorly when the system is swapping the memory.

# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 192.168.0.1
# Set a custom port for HTTP:
http.port: 9200
# For more information, consult the network module documentation.

# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
discovery.seed_hosts: ["host1", "host2"]
# Bootstrap the cluster using an initial set of master-eligible nodes:
cluster.initial_master_nodes: ["node-1", "node-2"]
# For more information, consult the discovery and cluster formation module documentation.

# ---------------------------------- Gateway -----------------------------------
# Block initial recovery after a full cluster restart until N nodes are started:
gateway.recover_after_nodes: 3
# For more information, consult the gateway module documentation.

# ---------------------------------- Various -----------------------------------
# Require explicit names when deleting indices:
action.destructive_requires_name: true

node.max_local_storage_nodes: 2
```
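The yml above sets `bootstrap.memory_lock: true`, which only helps if the process is actually allowed to lock that much memory (on systemd installs this usually means `LimitMEMLOCK=infinity` in a unit override). A small sketch for verifying it and for watching heap pressure during indexing, assuming the node listens on 192.168.0.1:9200 as configured:

```bash
# Verify that memory locking actually took effect on the node:
curl -s "http://192.168.0.1:9200/_nodes?filter_path=**.mlockall&pretty"

# Watch heap and RAM usage while the Nextcloud indexing runs:
curl -s "http://192.168.0.1:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent"
```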

My settings jvm.options:

```
## JVM configuration

## IMPORTANT: JVM heap size
## You should always set the min and max JVM heap size to the same value.
## For example, to set the heap to 4 GB, set:
## -Xms4g
## -Xmx4g
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html for more information

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms8g
-Xmx8g

## Expert settings
## All settings below this section are considered expert settings.
## Don't tamper with them unless you understand what you are doing

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
## NOTE: G1 GC is only supported on JDK version 10 or later;
## to use G1GC, uncomment the next two lines and update the version on the
## following three lines to your version of the JDK
10-13:-XX:-UseConcMarkSweepGC
10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps
## generate a heap dump when an allocation from the Java heap fails;
## heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
## specify an alternative path for heap dumps; ensure the directory exists and has sufficient space
-XX:HeapDumpPath=data
## specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:XXXXXXXXXXXXXXXXXXXXXX
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

## JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/usr/home/cloudr/.linuxbrew/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
```
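A quick sketch to confirm the running node actually picked up the 8 GB heap set above (same assumed address as in the earlier sketch):

```bash
# Show the heap sizes the JVM actually started with:
curl -s "http://192.168.0.1:9200/_nodes/jvm?filter_path=**.heap_init_in_bytes,**.heap_max_in_bytes&pretty"
```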