linuxserver / docker-diskover

A Docker container for the Diskover space mapping application
GNU General Public License v3.0

Diskover not crawling after latest image update #28

Closed mike391 closed 5 years ago

mike391 commented 5 years ago

After updating the image to the latest, in the webui I keep getting the error: No diskover indices found in Elasticsearch. Please run a crawl and come back.

tronyx commented 5 years ago

Ok, looks like it's not starting the worker bots:

Starting 8 worker bots in background... ERROR starting bot, check redis and ES are running and diskover.cfg settings.

Appears to be caused by this:

python: can't open file './diskover_worker_bot.py': [Errno 2] No such file or directory

Almost like paths inside the container are broken or something similar.

Doing some more digging.
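
A quick way to confirm the path theory from the host (a rough sketch; assuming the container is named diskover, as in the docker exec examples further down):

docker exec -it diskover ls -l /app/diskover/diskover_worker_bot.py
docker exec -it diskover sh -c 'ls /app/diskover'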

tronyx commented 5 years ago

Ok, this is due to changes in Diskover with the 1.5.0.3 release. A check was added to the diskover-bot-launcher.sh file, which is throwing the error seen in my last post:

        # check if bot started
        if [ $i -eq 1 ]; then
            sleep 1
            ps -p $! > /dev/null 2>&1
            if [ $? -gt 0 ]; then
                echo "ERROR starting bot, check redis and ES are running and diskover.cfg settings."
                exit 1
            fi
        fi

Maintaining all of the changes on the LSIO side, but hardcoding the 1.5.0.2 release of Diskover, and then building my own image, seems to resolve the issue. I left the Diskover-web release stuff the same, as it does not contain any changes that seem to have broken this.

My image is tronyx/diskover if you'd like to test it @mike391.
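
Swapping it in for a test is just a case of pulling the image and recreating the container with your existing settings (a rough sketch; keep your current volumes and environment variables, only the image name changes):

docker stop diskover && docker rm diskover
docker pull tronyx/diskover
# then recreate the container exactly as before, substituting tronyx/diskover for linuxserver/diskover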

tronyx commented 5 years ago

I've confirmed that it is just the above check. I updated my Docker image to use the latest release of Diskover again, but commented out the above lines, and after doing that everything runs as expected. So none of the other changes made in the 1.5.0.3 release appear to be a factor in this issue.

mike391 commented 5 years ago

Thanks for looking into this! Sorry I didn't include details; I've been gone for a few days and when I found the bug I wanted to mention it ASAP since a new build was released.

Using your new image works great! It's finally indexing again. I'll let you know if any hiccups occur, but so far it looks good.

shirosaidev commented 5 years ago

this error

python: can't open file './diskover_worker_bot.py': [Errno 2] No such file or directory

is not caused by the bot process check you mentioned above; it would be from a path configuration issue on line 17 of diskover-bot-launcher.sh. In v1.6.1 of diskover-bot-launcher.sh I added a check for whether the bot starts, and it outputs this error if it doesn't:

ERROR starting bot, check redis and ES are running and diskover.cfg settings.

shirosaidev commented 5 years ago

can you start the bots manually with this?

cd /app/diskover
python ./diskover_worker_bot.py

Before you run diskover-bot-launcher.sh you have to change into the diskover directory or update that path on line 17. Not sure how this broke in the latest lsio image v1.5.0.3... maybe from me updating diskover-bot-launcher.sh?
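
In practice that means either of the following (a sketch; the DISKOVERBOT variable name is the one set at the top of the launcher, as seen in the bash -x trace further down):

# option 1: run the launcher from the diskover directory
cd /app/diskover
./diskover-bot-launcher.sh

# option 2: point the path variable at the top of diskover-bot-launcher.sh at an absolute path instead, e.g.
# DISKOVERBOT=/app/diskover/diskover_worker_bot.py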

shirosaidev commented 5 years ago

I've updated diskover-bot-launcher.sh to output an error if it is unable to find the .py files. Please download the latest and see what error you get: https://github.com/shirosaidev/diskover/blob/master/diskover-bot-launcher.sh
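
For reference, the new check is roughly of this shape (a sketch only; see the linked script for the exact version):

# bail out early with a clear message if the bot script is not where the launcher expects it
if [ ! -f "$DISKOVERBOT" ]; then
    echo "ERROR: cannot find $DISKOVERBOT, cd into the diskover directory or fix the path at the top of this script."
    exit 1
fi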

tronyx commented 5 years ago

I am able to start a worker with the following:

cd /app/diskover
python ./diskover_worker_bot.py

However, when I do the following I get the error ERROR starting bot, check redis and ES are running and diskover.cfg settings.:

cd /app/diskover
./diskover-bot-launcher.sh

I see the following in the bot log:

14:12:41 Registering birth of worker 09d0ae94efaa.1219
14:12:41 RQ worker 'rq:worker:09d0ae94efaa.1219' started, version 0.13.0
14:12:41 *** Listening on diskover, diskover_crawl, diskover_calcdir...
14:12:41 Sent heartbeat to prevent worker timeout. Next one should arrive within 420 seconds.
14:12:41 Cleaning registries for queue: diskover
14:12:41 Cleaning registries for queue: diskover_crawl
14:12:41 Cleaning registries for queue: diskover_calcdir
14:12:41 *** Listening on diskover,diskover_crawl,diskover_calcdir...
14:12:41 Sent heartbeat to prevent worker timeout. Next one should arrive within 420 seconds.

I then see a bot in the RQ Dashboard.

If I run the diskover-bot-launcher.sh script with bash -x I see the following:

root@09d0ae94efaa:/app# bash -x diskover/diskover-bot-launcher.sh
+ PYTHON=python
+ DISKOVERBOT=./diskover_worker_bot.py
+ KILLREDISCONN=./killredisconn.py
+ WORKERBOTS=8
+ BURST=FALSE
+ BOTLOG=/config/bot.log
+ LOGLEVEL=3
+ BOTPIDS=/tmp/diskover_bot_pids
+ VERSION=1.6.1
+ KILLBOTS=FALSE
+ RESTARTBOTS=FALSE
+ REMOVEBOTS=FALSE
+ FORCEREMOVEBOTS=FALSE
+ SHOWBOTS=FALSE
+ getopts ':h?w:bskrRfl:V' opt
+ banner
++ tput setaf 1
++ tput sgr 0
+ '[' FALSE == TRUE ']'
+ '[' FALSE == TRUE ']'
+ '[' FALSE == TRUE ']'
+ '[' FALSE == TRUE ']'
+ startbots
+ echo 'Starting 8 worker bots in background...'
Starting 8 worker bots in background...
+ ARGS=
+ '[' FALSE == TRUE ']'
+ '[' 3 == 0 ']'
+ '[' 3 == 1 ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
+ ARGS+='-l DEBUG'
+ (( i = 1 ))
+ (( i <= 8 ))
+ '[' '!' /config/bot.log ']'
+ '[' 1 -eq 1 ']'
+ python ./diskover_worker_bot.py -l DEBUG
+ sleep 1
+ ps -p 2039
+ '[' 1 -gt 0 ']'
+ echo 'ERROR starting bot, check redis and ES are running and diskover.cfg settings.'
ERROR starting bot, check redis and ES are running and diskover.cfg settings.
+ exit 1

It's killing it after starting one bot. Looks like the if [ $? -gt 0 ]; then portion of the check should read if [ $? -gt "${WORKERBOTS}" ]; then, because if I make that change it successfully creates 8 bots and I can see 8 PIDs in the PID file, but I do not see the bots in the RQ Dashboard.

Now, if I run the dispatcher.sh script with the above change in place, I see the following:

root@09d0ae94efaa:/app# ./dispatcher.sh
killing existing workers...
emptying current redis queues...
0 jobs removed from diskover_crawl queue
0 jobs removed from diskover queue
0 jobs removed from diskover_calcdir queue
0 jobs removed from failed queue
killing dangling workers...
starting workers with following options:

Starting 8 worker bots in background...
09d0ae94efaa.2756 (pid 2756) (botnum 1)
09d0ae94efaa.2763 (pid 2763) (botnum 2)
09d0ae94efaa.2765 (pid 2765) (botnum 3)
09d0ae94efaa.2767 (pid 2767) (botnum 4)
09d0ae94efaa.2769 (pid 2769) (botnum 5)
09d0ae94efaa.2771 (pid 2771) (botnum 6)
09d0ae94efaa.2773 (pid 2773) (botnum 7)
09d0ae94efaa.2775 (pid 2775) (botnum 8)
DONE!
All worker bots have started
Worker bot output is getting logged to /config/bot.log.botnum
Worker pids have been stored in /tmp/diskover_bot_pids, use -k flag to shutdown workers or -r to restart
Exiting, sayonara!
starting crawler with following options: --autotag -d /data -a -i diskover-2019-07-23

Now I see 8 bots in the RQ Dashboard, so I believe you need to make the above change within Diskover for the new check you put in place.

shirosaidev commented 5 years ago

Try the latest diskover-bot-launcher.sh from the diskover GitHub. I've added a new check for the paths to the .py files, which get set at the top of the .sh file. I think the issue is that you are not running diskover-bot-launcher.sh from within the /app/diskover directory.

tronyx commented 5 years ago

Ok, @shirosaidev and I have found the actual culprit:

root@5ce6acc0f6c2:/app/diskover# ps -p 801
ps: unrecognized option: p

We're working on a solution that will work for the container as well as everything else.
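
For context: BusyBox ps inside the container does not support -p, so the launcher's liveness check always reports a failure there. One portable alternative is kill -0, which sends no signal and only tests that the PID is still alive (a sketch, not necessarily the fix that gets merged):

# check the last backgrounded bot without relying on ps -p
if ! kill -0 "$!" 2>/dev/null; then
    echo "ERROR starting bot, check redis and ES are running and diskover.cfg settings."
    exit 1
fi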

shirosaidev commented 5 years ago

I've pushed a fix for this in v1.6.2 of diskover-bot-launcher.sh to remove the -p from any ps command in the .sh script. Thanks @christronyxyocum. I've rebuilt v1.5.0.3 of the diskover release on the diskover GitHub with this update.

tronyx commented 5 years ago

@mike391 A new official image has been built with the fixes. I have a PR waiting with fixes for the Redis cleanup script not working as well, so there should be another new image in the near future.

mike391 commented 5 years ago

Once in a while I'm still unable to crawl. Recreating the containers 1-2 times sometimes fixes it. It usually happens more frequently once I add DISKOVER_OPTS=-D -A and add an autotag rule in my diskover.cfg. My redis and elasticsearch containers don't show any errors in the logs, and I'm able to do a ping redis and a ping elasticsearch from within my diskover container.


killing existing workers...
emptying current redis queues...
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 493, in connect
    sock = self._connect()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 550, in _connect
    raise err
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 538, in _connect
    sock.connect(socket_address)
OSError: [Errno 99] Address not available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 754, in execute_command
    connection.send_command(*args)
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 619, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 594, in send_packed_command
    self.connect()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 498, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 99 connecting to None:6379. Address not available.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 493, in connect
    sock = self._connect()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 550, in _connect
    raise err
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 538, in _connect
    sock.connect(socket_address)
OSError: [Errno 99] Address not available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/rq", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/rq/cli/cli.py", line 76, in wrapper
    return ctx.invoke(func, cli_config, *args[1:], **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/rq/cli/cli.py", line 109, in empty
    num_jobs = queue.empty()
  File "/usr/lib/python3.6/site-packages/rq/queue.py", line 117, in empty
    return script(keys=[self.key])
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 3498, in __call__
    return client.evalsha(self.sha, len(keys), *args)
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 2704, in evalsha
    return self.execute_command('EVALSHA', sha, numkeys, *keys_and_args)
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 760, in execute_command
    connection.send_command(*args)
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 619, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 594, in send_packed_command
    self.connect()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 498, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 99 connecting to None:6379. Address not available.
killing dangling workers...
starting workers with following options:

  ________  .__        __
  \______ \ |__| _____|  | _________  __ ___________
   |    |  \|  |/  ___/  |/ /  _ \  \/ // __ \_  __ \ /)___(\
   |    `   \  |\___ \|    <  <_> )   /\  ___/|  | \/ (='.'=)
  /_______  /__/____  >__|_ \____/ \_/  \___  >__|   ("\)_("\)
          \/        \/     \/               \/
                Worker Bot Launcher v1.6.2
                https://github.com/shirosaidev/diskover
                "Crawling all your stuff, core melting time"

Starting 8 worker bots in background...
e0349ba02c6d.471 (pid 471) (botnum 1)
e0349ba02c6d.477 (pid 477) (botnum 2)
e0349ba02c6d.479 (pid 479) (botnum 3)
e0349ba02c6d.481 (pid 481) (botnum 4)
e0349ba02c6d.483 (pid 483) (botnum 5)
e0349ba02c6d.485 (pid 485) (botnum 6)
e0349ba02c6d.487 (pid 487) (botnum 7)
e0349ba02c6d.489 (pid 489) (botnum 8)
DONE!
All worker bots have started
Worker pids have been stored in /tmp/diskover_bot_pids, use -k flag to shutdown workers or -r to restart
Exiting, sayonara!
starting crawler with following options: -D -d /data -a -i diskover-2019-07-24
/usr/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.3) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)

   ___       ___       ___       ___       ___       ___       ___       ___
  /\  \     /\  \     /\  \     /\__\     /\  \     /\__\     /\  \     /\  \
 /::\  \   _\:\  \   /::\  \   /:/ _/_   /::\  \   /:/ _/_   /::\  \   /::\  \
/:/\:\__\ /\/::\__\ /\:\:\__\ /::-"\__\ /:/\:\__\ |::L/\__\ /::\:\__\ /::\:\__\
\:\/:/  / \::/\/__/ \:\:\/__/ \;:;-",-" \:\/:/  / |::::/  / \:\:\/  / \;:::/  /
 \::/  /   \:\__\    \::/  /   |:|  |    \::/  /   L;;/__/   \:\/  /   |:\/__/
  \/__/     \/__/     \/__/     \|__|     \/__/               \/__/     \|__|
                                      v1.5.0.3
                                      https://shirosaidev.github.io/diskover
                                      Bringing light to the darkness.
                                      Support diskover on Patreon or PayPal :)

2019-07-24 10:39:38,991 [INFO][diskover] Using config file: /app/diskover/diskover.cfg
2019-07-24 10:39:39,005 [INFO][diskover] Found 15 diskover RQ worker bots
2019-07-24 10:39:39,005 [INFO][diskover] Searching diskover-2019-07-24 for duplicate file hashes...
2019-07-24 10:39:39,009 [WARNING][elasticsearch] POST http://elasticsearch:9200/diskover-2019-07-24/_refresh [status:404 request:0.003s]
Traceback (most recent call last):
  File "./diskover.py", line 2045, in <module>
    dupes_finder(es, q, cliargs, logger)
  File "/app/diskover/diskover_dupes.py", line 304, in dupes_finder
    es.indices.refresh(index=cliargs['index'])
  File "/usr/lib/python3.6/site-packages/elasticsearch5/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python3.6/site-packages/elasticsearch5/client/indices.py", line 56, in refresh
    '_refresh'), params=params)
  File "/usr/lib/python3.6/site-packages/elasticsearch5/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python3.6/site-packages/elasticsearch5/connection/http_urllib3.py", line 129, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python3.6/site-packages/elasticsearch5/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch5.exceptions.NotFoundError: TransportError(404, 'index_not_found_exception', 'no such index')

tronyx commented 5 years ago

If you want to use auto-tagging, the correct option in your container config would just be:

"DISKOVER_OPTS=--autotag"

The first big error is due to the broken Redis cleanup script, which is fixed in my current PR that's waiting to be merged, but it is pretty much ignorable.
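
If you need to empty the queues by hand in the meantime, the rq CLI can be pointed at the Redis host explicitly (a sketch, assuming your Redis container is reachable as redis on port 6379):

docker exec -it diskover rq empty -u redis://redis:6379 diskover diskover_crawl diskover_calcdir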

The second error is saying that the index is not found within the Elasticsearch cluster; it seems that you are trying to run Diskover with the --finddupes option before a normal crawl has been run to create the index.
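
An easy way to confirm the index exists before running --finddupes (a sketch, using the same Elasticsearch host that appears in your log; run it from any host that can reach Elasticsearch if curl is not available inside the container):

docker exec -it diskover curl -s 'http://elasticsearch:9200/_cat/indices/diskover-*?v'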

mike391 commented 5 years ago

Ah, I see. I'll try that for autotagging. Also, I assumed that diskover would run a normal crawl before finding dupes. This all makes sense now, thanks for your patience!

tronyx commented 5 years ago

No problem. You can kick off a crawl with the following:

docker exec -it diskover /app/dispatcher.sh

Replace diskover with your container name. After that finishes, which it now should, you can run the following to find any duplicate files:

docker exec -it diskover /usr/bin/python /app/diskover/diskover.py -i diskover-2019-07-23 --finddupes

tronyx commented 5 years ago

@mike391 Have things been working okay for you?

thelamer commented 5 years ago

All changes merged, the new image will be available shortly.