henrywhitaker3 / Speedtest-Tracker

Continuously track your internet speed
GNU General Public License v3.0

[BUG] speedtest-tracker now only works sporadically for me #512

Closed bcurran3 closed 3 years ago

bcurran3 commented 3 years ago

"MOVED FROM:" https://github.com/henrywhitaker3/Speedtest-Tracker/issues/495

Describe the bug: Automatic and manual testing yields no results.

To Reproduce: Just running speedtest-tracker in my environment.

Expected behavior: Multiple speedtest-tracker results daily.

Screenshots s/b

Context

All tests stopped on 02/15 and wouldn't update for almost two weeks, even when I triggered them manually. I may or may not have rebooted the server in that time frame (probably not), but I bounced the container two days ago and it started working again.

With that said, I think I've got the problem again as testing stopped again yesterday. :(

I'll be happy to provide any info to help solve the problem.

image

Found this in the logs:

nginx: [alert] detected a LuaJIT version which is not OpenResty's; many optimizations will be disabled and performance will be compromised (see https://github.com/openresty/luajit2 for OpenResty's LuaJIT or, even better, consider using the OpenResty releases from https://openresty.org/en/download.html)
{"type":"log","timestamp":"2021-03-01T09:00:07Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-03-01T09:00:14Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-03-01T09:00:20Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
wget: bad address 'geolite.maxmind.com'
wget: bad address 'geolite.maxmind.com'
wget: bad address 'geolite.maxmind.com'
wget: bad address 'geolite.maxmind.com'
wget: bad address 'geolite.maxmind.com'
wget: bad address 'geolite.maxmind.com'
run-parts: /etc/periodic/monthly/geoip: exit status 1

I'm not sure what geolite.maxmind.com is, but I tried to go to the website and it's down; pinging it gives the same result. maxmind.com doesn't resolve at the moment either. I'm guessing this is the problem, or part of it.
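
A quick way to check whether DNS inside the container is part of it (a sketch, assuming the container is named speedtest and nslookup is available, e.g. via busybox):

docker exec -t speedtest nslookup geolite.maxmind.com
docker exec -t speedtest nslookup google.com   # a name that should resolve, for comparison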

TomokoK commented 3 years ago

Having the same issue, I find that bulk deleting the data tends to make it work again.

henrywhitaker3 commented 3 years ago

Hmm, strange. When this happens again, could you guys save your sqlite DBs before deleting stuff? It's in /config/www/database/.
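
Something like this should take a copy without stopping the container (a sketch, assuming the container is named speedtest; since /config is bind-mounted, the copy also lands on the host):

docker exec -t speedtest cp -r /config/www/database /config/database-backup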

Also, is there anything relevant in /config/www/storage/logs/laravel.log?

bcurran3 commented 3 years ago

Haven't cleared my database. My /config/www/storage/logs/laravel.log has nothing after 2020-11-29.

Current status: working but flaky. Working so far today (ignore the two-hour Internet outage) and yesterday, but not so much the two days before. No reboots or any changes that I'm aware of.

image

I've got some errors in the logs: DNS resolution errors, most likely during my 2-hour outage; no clue about the ConfigurationError messages.

nginx: [alert] detected a LuaJIT version which is not OpenResty's; many optimizations will be disabled and performance will be compromised (see https://github.com/openresty/luajit2 for OpenResty's LuaJIT or, even better, consider using the OpenResty releases from https://openresty.org/en/download.html)
{"type":"log","timestamp":"2021-03-04T03:00:10Z","message":"Error: [11] Cannot read from socket: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-03-04T05:00:13Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
[2021-03-04 01:00:07.176] [error] Trying to get interface information on non-initialized socket.
[2021-03-04 01:00:12.689] [error] Configuration - Couldn't resolve host name (HostNotFoundException)
[2021-03-04 01:00:12.689] [error] Configuration - Cannot retrieve configuration document (0)
[2021-03-04 01:00:12.689] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
[2021-03-04 01:00:12.689] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
{"type":"log","timestamp":"2021-03-04T09:00:12Z","message":"Configuration - Could not retrieve or read configuration (ConfigurationError)","level":"error"}
[2021-03-04 02:00:06.653] [error] Trying to get interface information on non-initialized socket.
[2021-03-04 02:00:12.166] [error] Configuration - Couldn't resolve host name (HostNotFoundException)
[2021-03-04 02:00:12.166] [error] Configuration - Cannot retrieve configuration document (0)
[2021-03-04 02:00:12.166] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
[2021-03-04 02:00:12.166] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
{"type":"log","timestamp":"2021-03-04T10:00:12Z","message":"Configuration - Could not retrieve or read configuration (ConfigurationError)","level":"error"}
TomokoK commented 3 years ago

Cleared the database again around midnight last night without restoring. Failed again at 9:03am this morning and hasn't run another test since. This is my entire log for 2021-03-03:

[2021-03-03 04:18:27] local.ERROR: Undefined index: ping
[2021-03-03 14:03:22] local.ERROR: Undefined index: ping

I have attached the second log I have (2021-03-01) to this comment. laravel-2021-03-01.txt

henrywhitaker3 commented 3 years ago

Is there anything in /config/log/speedtest/queue.log, or the same for schedule.log? It might be worth deleting the volume and images, then pulling the images again and starting fresh; it looks like some dependencies aren't there.
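
Something along these lines should give a clean slate (a sketch, assuming a docker-compose setup with the config directory bind-mounted at /path/to/data; adjust paths and the image tag to your setup):

docker-compose down
rm -rf /path/to/data
docker image rm henrywhitaker3/speedtest-tracker:latest
docker-compose pull
docker-compose up -d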

TomokoK commented 3 years ago

I don't have a schedule.log, but I do have a cron.log. I've attached them both. queue.txt cron.txt

dra2885 commented 3 years ago

Linking this up with #461 which is the same issue. I also don't have a schedule.log file.

henrywhitaker3 commented 3 years ago

Yeah, it probably should be cron.log; not sure why it's hanging though. Stuff like this is pretty tough, I can't be much help without being able to replicate it.

dra2885 commented 3 years ago

I just did a complete data wipe and container re-build. Tests started failing on the third manual test.

The contents of cron.log don't show anything unusual:

Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1

Similarly with queue.log, though it obviously doesn't show a corresponding "Processed" entry for the test that is hung.

[2021-03-06 20:36:51][1] Processing: App\Jobs\SpeedtestJob
[2021-03-06 20:37:04][1] Processed: App\Jobs\SpeedtestJob
[2021-03-06 20:39:10][2] Processing: App\Jobs\SpeedtestJob
[2021-03-06 20:39:32][2] Processed: App\Jobs\SpeedtestJob
[2021-03-06 20:47:26][3] Processing: App\Jobs\SpeedtestJob

Is there a setting to populate the logs with more detail?

bcurran3 commented 3 years ago

my speedtest.cron.log:

The failed job [6] has been pushed back onto the queue!
The failed job [5] has been pushed back onto the queue!
The failed job [4] has been pushed back onto the queue!
The failed job [3] has been pushed back onto the queue!
The failed job [2] has been pushed back onto the queue!
The failed job [1] has been pushed back onto the queue!
TomokoK commented 3 years ago

Is there anything in /config/log/speedtest/queue.log, or the same for schedule.log? It might be worth deleting the volume and images, then pulling the images again and starting fresh; it looks like some dependencies aren't there.

Finally had time to test this. Deleted the old images and containers, repulled, scheduled to run at the same frequency as before. Didn't even last 24 hours this time. All error logs are the same as before.

henrywhitaker3 commented 3 years ago

Sorry it's been so long, I'll have a look into this in more detail today or tomorrow. Might not get far as I haven't been able to reproduce it so far

eJonny commented 3 years ago

Sorry it's been so long, I'll have a look into this in more detail today or tomorrow. Might not get far as I haven't been able to reproduce it so far

Hi @henrywhitaker3, first, let me express gratitude for the fantastic app! It's very useful for showing my partner that the problem is her VPN, not the home WiFi! You have saved many arguments.

I am having the same "pause" issue others have reported. I'm running on QNAP QTS.

henrywhitaker3 commented 3 years ago

Next time this happens to one of you, could you get a shell in the container and run php /config/www/artisan queue:restart, then give it 10 or so seconds, queue a test again through the UI, and let me know if that job works?
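
For example, from the docker host (assuming the container is named speedtest):

docker exec -it speedtest bash
php /config/www/artisan queue:restart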

dra2885 commented 3 years ago

Recreated the container today, reproduced the bug by running a couple of manual tests, and then executed the command you requested in a shell inside the container. Didn't have any luck successfully running a manual test after executing the command; tried the process a few different times as well.

dra2885 commented 3 years ago

Poking around more on this... it may be obvious, but I guess this is an error with the speedtest package. Some interesting reading here. I'm not savvy enough yet to figure out how to play around with this, but I thought I'd share in case it's helpful.

eJonny commented 3 years ago

@dra2885 Nice research on this... based on my reading of this maybe there are two issues:

Error: [11] Cannot read: Resource temporarily unavailable

and

[error] ConfigurationError - Could not retrieve or read configuration (Configuration)

It appears that openHAB's use of the Speedtest CLI, like speedtest-tracker's, runs into the second error. But I didn't see anything in that openHAB thread that would match the 'cannot read' error.

stevietv commented 3 years ago

run php /config/www/artisan queue:restart, then give it 10 or so seconds, queue a test again through the UI, and let me know if that job works

Hey, I tried this and it didn't produce any positive result.

stevietv commented 3 years ago

I used ps to identify which task was hanging. Based on my cron schedule of every 59 minutes and the fact that my last successful test was at 17:59, it was easy to identify that the failed process was 3079.

root@39172993c1cf:/# ps -eo pid,cmd,lstart
    PID CMD                                          STARTED
      1 s6-svscan -t0 /var/run/s6/s Wed Apr 14 16:38:33 2021
     38 s6-supervise s6-fdholderd   Wed Apr 14 16:38:35 2021
    403 s6-supervise speedtest      Wed Apr 14 16:38:51 2021
    405 s6-supervise php-fpm        Wed Apr 14 16:38:51 2021
    406 s6-supervise nginx          Wed Apr 14 16:38:51 2021
    407 s6-supervise cron           Wed Apr 14 16:38:51 2021
    408 php-fpm: master process (/e Wed Apr 14 16:38:51 2021
    409 nginx: master process /usr/ Wed Apr 14 16:38:51 2021
    410 php /config/www/artisan que Wed Apr 14 16:38:51 2021
    412 /usr/sbin/crond -f -S -l 5  Wed Apr 14 16:38:51 2021
    433 php-fpm: pool www           Wed Apr 14 16:38:51 2021
    434 php-fpm: pool www           Wed Apr 14 16:38:51 2021
    435 nginx: worker process       Wed Apr 14 16:38:51 2021
    436 nginx: worker process       Wed Apr 14 16:38:51 2021
    437 nginx: worker process       Wed Apr 14 16:38:51 2021
    438 nginx: worker process       Wed Apr 14 16:38:51 2021
   1016 php-fpm: pool www           Wed Apr 14 16:53:07 2021
   3079 /config/www/app/Bin/speedte Wed Apr 14 18:59:01 2021
  14744 bash                        Thu Apr 15 08:33:22 2021
  15130 ps -eo pid,cmd,lstart       Thu Apr 15 08:39:24 2021
root@39172993c1cf:/# kill -9 3079

Killing that process finally caused a failed test to register, and the entire queue of tests that should have happened overnight (plus the times I clicked the manual test button) got worked through:

image
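
The same manual kill can be done in one step, without reading through ps (assuming pkill is available in the container, e.g. from busybox):

docker exec -t speedtest pkill -9 -f /config/www/app/Bin/speedtest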

stevietv commented 3 years ago

It seems like there would need to be some handling of the error. When this error is entered into the log:

{"type":"log","timestamp":"2021-04-15T06:42:48Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}

then the process should be killed to ensure the queue doesn't hang.

dra2885 commented 3 years ago

@dra2885 Nice research on this... based on my reading of this maybe there are two issues:

Error: [11] Cannot read: Resource temporarily unavailable

and

[error] ConfigurationError - Could not retrieve or read configuration (Configuration)

It appears that openHAB's use of the Speedtest CLI, like speedtest-tracker's, runs into the second error. But I didn't see anything in that openHAB thread that would match the 'cannot read' error.

One guy in that thread was getting the 'cannot read' error. Here's the direct link to his post. He seemed to have some success when changing the error handling to:

if (speedtestCliOutput.startsWith("{\"type\":") && speedtestCliOutput.endsWith("}}"))

henrywhitaker3 commented 3 years ago

Can one of you guys spin up a redis container and add the following env vars (you'll need to pull the newest dev image):

QUEUE_CONNECTION=redis
REDIS_HOST=127.0.0.1
REDIS_PASSWORD=null
REDIS_PORT=6379

Obviously you may need to change those values so it actually points to the redis container. Then see if it hangs again after a few tests?
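
If you don't want to touch your compose file yet, a throwaway redis container along these lines should do (a sketch; set REDIS_HOST to an address the speedtest container can actually reach):

docker run -d --name redis -p 6379:6379 redis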

stevietv commented 3 years ago

got a redis container up and running and will let you know

eJonny commented 3 years ago

Thanks @stevietv !!!! looking forward to hearing how this turns out...

dra2885 commented 3 years ago

I've also got a redis container spun up. I've run ~5 manual tests over the last 10 minutes with no errors (previously, I'd usually run into a hung test). So far so good, but I'll report back tomorrow on whether it starts hanging during the course of the day.

ETA: Errors popping up again.

stevietv commented 3 years ago

I've been running the dev branch with a redis container connected for two days now; no errors as yet. If there's any data I can provide you, let me know.

dra2885 commented 3 years ago

@stevietv can you share your configuration (e.g., maybe your docker-compose)? I'm still running into errors even with the redis container, but it's possible I haven't set it up correctly...

henrywhitaker3 commented 3 years ago

Haven't tried this, but something like this should work:

version: '3'
services:
    speedtest:
        container_name: speedtest
        image: henrywhitaker3/speedtest-tracker:dev
        ports:
            - 8765:80
        volumes:
            - /path/to/data:/config
        environment:
            - OOKLA_EULA_GDPR=true
            - QUEUE_CONNECTION=redis
            - REDIS_HOST=redis
            - REDIS_PASSWORD=null
            - REDIS_PORT=6379
        restart: unless-stopped
        networks:
            - speedtest
    redis:
        container_name: redis
        image: redis
        restart: unless-stopped
        networks:
            - speedtest
networks:
    speedtest:
stevietv commented 3 years ago

Sure, here's my compose for those two services (note: my network t2_proxy is defined elsewhere, but in essence it's just a bridge):

version: "3.7"

  speedtest:
    container_name: speedtest
    image: henrywhitaker3/speedtest-tracker:dev
    restart: unless-stopped
    networks:
      - t2_proxy
    security_opt:
      - no-new-privileges:true
    ports:
      - $SPEEDTEST_PORT:80
    volumes:
      - $USERDIR/docker/speedtest/:/config
    environment:
      - TZ=$TZ
      - PGID=$PGID
      - PUID=$PUID
      - OOKLA_EULA_GDPR=true
      - QUEUE_CONNECTION=redis
      - REDIS_HOST=192.168.90.249
      - REDIS_PASSWORD=null
      - REDIS_PORT=6379
    logging:
      driver: "json-file"
      options:
        max-file: "10"
        max-size: "200k"

  redis:
    image: bitnami/redis:latest
    container_name: redis
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - PUID=$PUID
      - PGID=$PGID
    networks:
      t2_proxy:
        ipv4_address: 192.168.90.249 
    ports:
      - "6379:6379"
    volumes:
      - $USERDIR/docker/redis:/bitnami/redis/data
dra2885 commented 3 years ago

Thanks for posting... I have a virtually identical setup but I'm still running into the same issues. Weird!

version: '3'

services:
   speedtest:
     container_name: speedtest
     image: henrywhitaker3/speedtest-tracker:dev
     restart: unless-stopped
     ports:
       - 11987:80
     volumes:
       - /volume1/apps/configs/speedtestv2:/config:rw
       - /etc/localtime:/etc/localtime:ro
     environment:
       - PGID=101
       - PUID=1026
       - OOKLA_EULA_GDPR=true
       - QUEUE_CONNECTION=redis
       - REDIS_HOST=172.16.11.111
       - REDIS_PASSWORD=null
       - REDIS_PORT=6379
     logging:
       driver: "json-file"
       options:
         max-file: "10"
         max-size: "200k"
     labels:
       - com.centurylinklabs.watchtower.enable=true
     networks:
       - admin_internal
     links:
       - speedtest_redis

   speedtest_redis:
     container_name: speedtest_redis
     image: bitnami/redis:latest
     ports:
       - 6379:6379
     environment:
       - ALLOW_EMPTY_PASSWORD=yes
       - PGID=101
       - PUID=1026
     volumes:
       - /etc/localtime:/etc/localtime:ro     
     labels:
       - com.centurylinklabs.watchtower.enable=true
     networks:
       admin_internal:
         ipv4_address: 172.16.11.111
stevietv commented 3 years ago

Following up on this: after almost a week running smoothly, the error appeared in my log again yesterday at 16:59:07:

{"type":"log","timestamp":"2021-04-30T14:59:07Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}

and no test has executed since.

dra2885 commented 3 years ago

Curious if anyone ever had any luck resolving this issue...

jiriteach commented 3 years ago

I’m not seeing this issue. Working fine for me. Just checked and speed tests running solid with no problems.

henrywhitaker3 commented 3 years ago

Going to close this as it's old - ping me if you are still seeing it

dra2885 commented 3 years ago

Yes, still an issue here. Given few others are able to re-create it, it may just be something unique to my setup on Docker + Synology...

lowne commented 3 years ago

@henrywhitaker3 it's still a problem, either when the speedtest-cli process itself hangs for whatever reason or, possibly, when it's so slow to finish that php times out or something. I say this because in my case the queue reliably hangs every evening, when speedtests are < 1 Mbps and connectivity is completely unreliable. I don't know anything about php, but after a quick look around I'd suggest properly monitoring the child pid (as opposed to blindly trusting that it terminates) here by using proc_open or similar, and/or ensuring it's run in some async way so it doesn't block an nginx thread it's not supposed to block, and/or adjusting the timeouts somewhere so it doesn't end up stuck, or at least failing the test and ensuring proper cleanup after the given timeout.
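
As a stopgap outside the php code, a hard timeout could also be enforced at the shell level by wrapping the bundled binary. A rough sketch (assuming the binary path from the ps output above, and that a timeout utility is available in the container; busybox and coreutils syntax differ slightly):

docker exec -it speedtest sh
mv /config/www/app/Bin/speedtest /config/www/app/Bin/speedtest.real
cat > /config/www/app/Bin/speedtest <<'EOF'
#!/bin/sh
# kill the real binary if it runs longer than 5 minutes so a hung test can't block the queue
exec timeout -s KILL 300 /config/www/app/Bin/speedtest.real "$@"
EOF
chmod +x /config/www/app/Bin/speedtest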

lowne commented 3 years ago

A way to "reproduce" the problem, I guess, is to replace speedtest -f json in the shellout with /bin/sleep 9999 and see if it still hangs the queue.

jamesonuk commented 2 years ago

Just to say that I am now experiencing what appears to be the same issue. This has been running for quite a while but is now suddenly stopping at some point during the day. I have a job to restart my container every day and it works again, but sometimes it fails after a few runs, sometimes after quite a lot of runs (I have it set up to run every 20 minutes).

I have deleted the mounted volume and pulled a new copy of the image, and it's still the same.

As with others, I found a process that was just hanging (/config/www/app/Bin/speedtest -f json), and after killing that, it started trying to run through the backlog of tests.

In fact it failed on the 8:20 run. I killed that and it ran 13 more tests within about two minutes, and then the 14th test hung again.

I have just tried to run speedtest manually from inside the container and it popped up asking me to accept the license. It then failed with:

Do you accept the license? [type YES to accept]: YES
License acceptance recorded. Continuing.

{"type":"log","timestamp":"2021-10-31T22:34:08Z","message":"Error: [110] Cannot open socket: Timeout occurred in connect.","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:10Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:13Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:13Z","message":"Server Selection - Failed to find a working test server. (NoServers)","level":"error"}

but a subsequent manual run returned valid JSON.
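
For reference, the manual run can also be done non-interactively from the host (a sketch, assuming the container is named speedtest and the bundled Ookla CLI supports the usual --accept-license/--accept-gdpr flags):

docker exec -it speedtest /config/www/app/Bin/speedtest --accept-license --accept-gdpr -f json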

jeritiana commented 2 years ago

I used ps to identify which task was hanging. Based on my cron schedule of every 59 minutes and the fact that my last successful test was at 17:59, it was easy to identify that the failed process was 3079.

root@39172993c1cf:/# ps -eo pid,cmd,lstart
    PID CMD                                          STARTED
      1 s6-svscan -t0 /var/run/s6/s Wed Apr 14 16:38:33 2021
     38 s6-supervise s6-fdholderd   Wed Apr 14 16:38:35 2021
    403 s6-supervise speedtest      Wed Apr 14 16:38:51 2021
    405 s6-supervise php-fpm        Wed Apr 14 16:38:51 2021
    406 s6-supervise nginx          Wed Apr 14 16:38:51 2021
    407 s6-supervise cron           Wed Apr 14 16:38:51 2021
    408 php-fpm: master process (/e Wed Apr 14 16:38:51 2021
    409 nginx: master process /usr/ Wed Apr 14 16:38:51 2021
    410 php /config/www/artisan que Wed Apr 14 16:38:51 2021
    412 /usr/sbin/crond -f -S -l 5  Wed Apr 14 16:38:51 2021
    433 php-fpm: pool www           Wed Apr 14 16:38:51 2021
    434 php-fpm: pool www           Wed Apr 14 16:38:51 2021
    435 nginx: worker process       Wed Apr 14 16:38:51 2021
    436 nginx: worker process       Wed Apr 14 16:38:51 2021
    437 nginx: worker process       Wed Apr 14 16:38:51 2021
    438 nginx: worker process       Wed Apr 14 16:38:51 2021
   1016 php-fpm: pool www           Wed Apr 14 16:53:07 2021
   3079 /config/www/app/Bin/speedte Wed Apr 14 18:59:01 2021
  14744 bash                        Thu Apr 15 08:33:22 2021
  15130 ps -eo pid,cmd,lstart       Thu Apr 15 08:39:24 2021
root@39172993c1cf:/# kill -9 3079

Killing that process finally caused a failed test to register, and the entire queue of tests that should have happened overnight (plus the times I clicked the manual test button) got worked through:

Well, the issue described here happens to me from time to time, and killing the offending process unlocks all previously scheduled tests and resumes normal behavior. Thanks @stevietv for the hint!

The current workaround I use: create a cron job on the OS and execute killall speedtest just before the regularly scheduled tests. Example: I run a speedtest every 10 minutes (0, 10, 20, etc.), so I kill any blocking speedtest process right before, at every minute ending in 8 (58, 8, 18, etc.).

Killing all speedtest processes won't break speedtest tracker itself, as it is run under s6-supervise.

For those using a docker image like me: schedule docker exec -t speedtest killall speedtest.
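
For example, a host crontab entry along these lines (assuming the container is named speedtest and tests are scheduled on minutes 0, 10, 20, ...):

8,18,28,38,48,58 * * * * docker exec -t speedtest killall speedtest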

@henrywhitaker3 Would it be relevant to reopen this issue, as it is still very much happening nowadays? (My docker image is built from the currently latest tag, v1.12.2.)