Having the same issue, I find that bulk deleting the data tends to make it work again.
Hmm, strange. When this happens again, could you guys save your sqlite DBs before deleting stuff? It's in /config/www/database/
Also, is there anything relevant in /config/www/storage/logs/laravel.log?
Haven't cleared my database. My /config/www/storage/logs/laravel.log has nothing after 2020-11-29.
Current status: working but flakey. Working so far today (ignore the two hour Internet outage) and yesterday, but not so much the two days before. No reboots or any changes that I'm aware of.
I've got some errors in the logs: DNS resolution errors, most likely during my 2-hour outage; no clue about the ConfigurationError messages.
nginx: [alert] detected a LuaJIT version which is not OpenResty's; many optimizations will be disabled and performance will be compromised (see https://github.com/openresty/luajit2 for OpenResty's LuaJIT or, even better, consider using the OpenResty releases from https://openresty.org/en/download.html)
{"type":"log","timestamp":"2021-03-04T03:00:10Z","message":"Error: [11] Cannot read from socket: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-03-04T05:00:13Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
[2021-03-04 01:00:07.176] [error] Trying to get interface information on non-initialized socket.
[2021-03-04 01:00:12.689] [error] Configuration - Couldn't resolve host name (HostNotFoundException)
[2021-03-04 01:00:12.689] [error] Configuration - Cannot retrieve configuration document (0)
[2021-03-04 01:00:12.689] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
[2021-03-04 01:00:12.689] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
{"type":"log","timestamp":"2021-03-04T09:00:12Z","message":"Configuration - Could not retrieve or read configuration (ConfigurationError)","level":"error"}
[2021-03-04 02:00:06.653] [error] Trying to get interface information on non-initialized socket.
[2021-03-04 02:00:12.166] [error] Configuration - Couldn't resolve host name (HostNotFoundException)
[2021-03-04 02:00:12.166] [error] Configuration - Cannot retrieve configuration document (0)
[2021-03-04 02:00:12.166] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
[2021-03-04 02:00:12.166] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
{"type":"log","timestamp":"2021-03-04T10:00:12Z","message":"Configuration - Could not retrieve or read configuration (ConfigurationError)","level":"error"}
Cleared the database again around midnight last night without restoring. Failed again at 9:03am this morning and hasn't run another test since. This is my entire log for 2021-03-03:
[2021-03-03 04:18:27] local.ERROR: Undefined index: ping
[2021-03-03 14:03:22] local.ERROR: Undefined index: ping
I have attached the second log I have (2021-03-01) to this comment. laravel-2021-03-01.txt
Is there anything in /config/log/speedtest/queue.log, or the same for schedule.log? Might be worth deleting the volume and images, then pulling the images again and starting fresh; it looks like some dependencies aren't there.
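For a docker-compose setup, that fresh start could look something like the following (a minimal sketch, assuming the service is called speedtest in your compose file and that you're happy to lose the existing data):

# stop the stack and remove its named volumes (a bind-mounted /config dir has to be deleted by hand)
docker-compose down -v
# pull a fresh copy of the image and start again
docker-compose pull speedtest
docker-compose up -d speedtest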
Linking this up with #461, which is the same issue. I also don't have a schedule.log file.
Yeah probably should be cron.log, not sure why it's hanging though. Pretty tough with stuff like this, can't be that much help without being able to replicate it
I just did a complete data wipe and container re-build. Tests started failing on the third manual test.
The contents of cron.log don't show anything unusual:
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Running scheduled command: '/usr/bin/php7' 'artisan' speedtest:clear-sessions > '/dev/null' 2>&1
Similarly with queue.log, though it obviously doesn't show a corresponding "Processed" entry for the test that is hung:
[2021-03-06 20:36:51][1] Processing: App\Jobs\SpeedtestJob
[2021-03-06 20:37:04][1] Processed: App\Jobs\SpeedtestJob
[2021-03-06 20:39:10][2] Processing: App\Jobs\SpeedtestJob
[2021-03-06 20:39:32][2] Processed: App\Jobs\SpeedtestJob
[2021-03-06 20:47:26][3] Processing: App\Jobs\SpeedtestJob
Is there a setting to populate the logs with more detail?
my speedtest.cron.log:
The failed job [6] has been pushed back onto the queue!
The failed job [5] has been pushed back onto the queue!
The failed job [4] has been pushed back onto the queue!
The failed job [3] has been pushed back onto the queue!
The failed job [2] has been pushed back onto the queue!
The failed job [1] has been pushed back onto the queue!
Is there anything in /config/log/speedtest/queue.log, or the same for schedule.log? Might be worth deleting the volume and images, then pulling the images again and starting fresh; it looks like some dependencies aren't there.
Finally had time to test this. Deleted the old images and containers, repulled, scheduled to run at the same frequency as before. Didn't even last 24 hours this time. All error logs are the same as before.
Sorry it's been so long, I'll have a look into this in more detail today or tomorrow. Might not get far as I haven't been able to reproduce it so far
Hi @henrywhitaker3, first, let me express gratitude for the fantastic app! It's very useful to show my partner that the problem is her VPN, not the home WiFi! You have saved many arguments.
I am having the same "pause" issue others have reported. I'm running on QNAP QTS.
Next time this happens to one of you, could you get a shell in the container and run php /config/www/artisan queue:restart, then give it 10 or so seconds, queue a test again through the UI, and let me know if that job works.
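For a docker install that would be something along these lines (the container name speedtest is an assumption, adjust to yours):

# open a shell inside the running container
docker exec -it speedtest bash
# restart the Laravel queue worker from inside the container
php /config/www/artisan queue:restart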
Recreated the container today, reproduced the bug by running a couple of manual tests, and then tried executing the command you requested in a shell inside the container. Didn't have any luck successfully running a manual test after executing the command; tried the process a few different times as well.
Poking around more on this: it may be obvious, but I guess this is an error with the speedtest package. Some interesting reading here. I'm not savvy enough yet to figure out how to play around with this, but thought I'd share in case it's helpful.
@dra2885 Nice research on this... based on my reading of this maybe there are two issues:
Error: [11] Cannot read: Resource temporarily unavailable
and
[error] ConfigurationError - Could not retrieve or read configuration (Configuration)
It appears that openHAB's use of the speedtest CLI, like speedtest-tracker's, does run into the second error. But I didn't see anything in that openHAB thread that would match the 'cannot read' error.
Run php /config/www/artisan queue:restart, then give it 10 or so seconds, queue a test again through the UI, and let me know if that job works.
Hey, I tried this and it didn't produce any positive result.
I used ps to identify which task was hanging. Based on my cron schedule of every 59 minutes and the fact that my last successful test was at 17:59, it was easy to identify that the failed process was 3079.
root@39172993c1cf:/# ps -eo pid,cmd,lstart
PID CMD STARTED
1 s6-svscan -t0 /var/run/s6/s Wed Apr 14 16:38:33 2021
38 s6-supervise s6-fdholderd Wed Apr 14 16:38:35 2021
403 s6-supervise speedtest Wed Apr 14 16:38:51 2021
405 s6-supervise php-fpm Wed Apr 14 16:38:51 2021
406 s6-supervise nginx Wed Apr 14 16:38:51 2021
407 s6-supervise cron Wed Apr 14 16:38:51 2021
408 php-fpm: master process (/e Wed Apr 14 16:38:51 2021
409 nginx: master process /usr/ Wed Apr 14 16:38:51 2021
410 php /config/www/artisan que Wed Apr 14 16:38:51 2021
412 /usr/sbin/crond -f -S -l 5 Wed Apr 14 16:38:51 2021
433 php-fpm: pool www Wed Apr 14 16:38:51 2021
434 php-fpm: pool www Wed Apr 14 16:38:51 2021
435 nginx: worker process Wed Apr 14 16:38:51 2021
436 nginx: worker process Wed Apr 14 16:38:51 2021
437 nginx: worker process Wed Apr 14 16:38:51 2021
438 nginx: worker process Wed Apr 14 16:38:51 2021
1016 php-fpm: pool www Wed Apr 14 16:53:07 2021
3079 /config/www/app/Bin/speedte Wed Apr 14 18:59:01 2021
14744 bash Thu Apr 15 08:33:22 2021
15130 ps -eo pid,cmd,lstart Thu Apr 15 08:39:24 2021
root@39172993c1cf:/# kill -9 3079
Killing that process finally caused a failed test to register, and the entire queue of tests that should have happened overnight (plus the times I clicked the manual test button) got worked through.
It seems like there would need to be some handling of the error. When this error is entered into the log:
{"type":"log","timestamp":"2021-04-15T06:42:48Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
the process should be killed to ensure the queue doesn't hang.
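Until the app handles that itself, a crude host-side watchdog could approximate this (pure sketch: the log path is a placeholder, the container name speedtest is an assumption, and killall inside the container is the same trick suggested further down this thread):

#!/bin/sh
# If the most recent speedtest log line is the hang-inducing error, kill the
# stuck CLI process inside the container so the queue worker can move on.
LOG=/path/to/speedtest/log   # placeholder: wherever these JSON error lines are written
if tail -n 1 "$LOG" | grep -q 'Resource temporarily unavailable'; then
    docker exec speedtest killall -9 speedtest
fi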
@dra2885 Nice research on this... based on my reading of this maybe there are two issues:
Error: [11] Cannot read: Resource temporarily unavailable
and
[error] ConfigurationError - Could not retrieve or read configuration (Configuration)
It appears that openHAB's use of the speedtest CLI, like speedtest-tracker's, does run into the second error. But I didn't see anything in that openHAB thread that would match the 'cannot read' error.
One guy in that thread was getting the 'cannot read' error. Here's the direct link to his post. He seemed to have some success when changing the error handling to:
if (speedtestCliOutput.startsWith("{\"type\":") && speedtestCliOutput.endsWith("}}"))
Can one of you guys spin up a redis container and add the following env vars (you'll need to pull the newest dev image):
QUEUE_CONNECTION=redis
REDIS_HOST=127.0.0.1
REDIS_PASSWORD=null
REDIS_PORT=6379
Obviously may need to change those values so it actually points to the redis container. Then see if it hangs after a few tests again?
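If you're not running compose, a throwaway redis for this test could be as simple as the following (image tag and network name are assumptions; whatever you use, REDIS_HOST above has to resolve to it):

# disposable redis on the same user-defined docker network as speedtest-tracker
docker run -d --name redis --network speedtest redis:6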
got a redis container up and running and will let you know
Thanks @stevietv !!!! looking forward to hearing how this turns out...
I've also got a redis container spun up. I've run ~5 manual tests over the last 10 minutes with no errors (previously, I'd usually run into a hung test). So far so good, but I'll report back tomorrow on whether it starts hanging during the course of the day.
ETA: Errors popping up again.
Been running the dev branch with a redis container connected for two days now. No errors as yet. If there's any data I can provide, let me know.
@stevietv can you share your configuration (e.g., maybe your docker-compose)? I'm still running into errors even with the redis container, but it's possible I haven't set it up correctly...
Haven't tried this, but something like this should work:
version: '3'
services:
  speedtest:
    container_name: speedtest
    image: henrywhitaker3/speedtest-tracker:dev
    ports:
      - 8765:80
    volumes:
      - /path/to/data:/config
    environment:
      - OOKLA_EULA_GDPR=true
      - QUEUE_CONNECTION=redis
      - REDIS_HOST=redis
      - REDIS_PASSWORD=null
      - REDIS_PORT=6379
    restart: unless-stopped
    networks:
      - speedtest
  redis:
    container_name: redis
    image: redis
    restart: unless-stopped
    networks:
      - speedtest
networks:
  speedtest:
Sure, here's my compose for those two services (note: my network t2_proxy is defined elsewhere, but in essence is just a bridge):
version: "3.7"
speedtest:
container_name: speedtest
image: henrywhitaker3/speedtest-tracker:dev
restart: unless-stopped
networks:
- t2_proxy
security_opt:
- no-new-privileges:true
ports:
- $SPEEDTEST_PORT:80
volumes:
- $USERDIR/docker/speedtest/:/config
environment:
- TZ=$TZ
- PGID=$PGID
- PUID=$PUID
- OOKLA_EULA_GDPR=true
- QUEUE_CONNECTION=redis
- REDIS_HOST=192.168.90.249
- REDIS_PASSWORD=null
- REDIS_PORT=6379
logging:
driver: "json-file"
options:
max-file: "10"
max-size: "200k"
redis:
image: bitnami/redis:latest
container_name: redis
environment:
- ALLOW_EMPTY_PASSWORD=yes
- PUID=$PUID
- PGID=$PGID
networks:
t2_proxy:
ipv4_address: 192.168.90.249
ports:
- "6379:6379"
volumes:
- $USERDIR/docker/redis:/bitnami/redis/data
Thanks for posting... I have virtually an identical setup, but I'm still running into the same issues. Weird!
version: '3'
services:
  speedtest:
    container_name: speedtest
    image: henrywhitaker3/speedtest-tracker:dev
    restart: unless-stopped
    ports:
      - 11987:80
    volumes:
      - /volume1/apps/configs/speedtestv2:/config:rw
      - /etc/localtime:/etc/localtime:ro
    environment:
      - PGID=101
      - PUID=1026
      - OOKLA_EULA_GDPR=true
      - QUEUE_CONNECTION=redis
      - REDIS_HOST=172.16.11.111
      - REDIS_PASSWORD=null
      - REDIS_PORT=6379
    logging:
      driver: "json-file"
      options:
        max-file: "10"
        max-size: "200k"
    labels:
      - com.centurylinklabs.watchtower.enable=true
    networks:
      - admin_internal
    links:
      - speedtest_redis
  speedtest_redis:
    container_name: speedtest_redis
    image: bitnami/redis:latest
    ports:
      - 6379:6379
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - PGID=101
      - PUID=1026
    volumes:
      - /etc/localtime:/etc/localtime:ro
    labels:
      - com.centurylinklabs.watchtower.enable=true
    networks:
      admin_internal:
        ipv4_address: 172.16.11.111
Following up on this: after almost a week running smoothly, the error appeared in my log again yesterday at 16:59:07:
{"type":"log","timestamp":"2021-04-30T14:59:07Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
and no test has been executed since.
Curious if anyone ever had any luck resolving this issue...
I’m not seeing this issue. Working fine for me. Just checked and speed tests running solid with no problems.
Going to close this as it's old - ping me if you are still seeing it
Yes, still an issue here. Given that few others are able to re-create it, it may just be something unique to my setup on Docker + Synology...
@henrywhitaker3 it's still a problem.
It happens either when the speedtest-cli process itself hangs for whatever reason, or, possibly, when it's so slow to finish that PHP times out or something; I say this because in my case the queue reliably hangs every evening, when speedtests are <1 Mbps and connectivity is completely unreliable.
I don't know anything about PHP, but after a quick look around I'd suggest properly monitoring the child pid (as opposed to blindly trusting that it terminates) here by using proc_open or similar, and/or ensuring it's run in some async way so it doesn't block an nginx thread it's not supposed to block, and/or adjusting the timeouts somewhere so it doesn't end up stuck, or at least failing the test and ensuring proper cleanup after the given timeout.
A way to "reproduce" the problem, I guess, is to replace speedtest -f json
in the shellout with /bin/sleep 9999
and see if it still hangs the queue.
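As a stopgap along the lines of the timeout suggestion above, wrapping the shellout in coreutils timeout (not something the app does today, and assuming GNU timeout is available in the image) would at least guarantee the child dies:

# hard-kill the bundled CLI if it hasn't finished within 5 minutes, so a hung
# test fails instead of blocking the queue forever
timeout --kill-after=30 300 /config/www/app/Bin/speedtest -f json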
Just to say that I am now experiencing what appears to be the same. This has been running for quite a while but is suddenly just stopping at some point during the day. I have a job to restart my container every day and it works again, but sometimes it fails after a few runs, sometimes after quite a lot of runs (I have it set up to run every 20 minutes).
I have deleted the mounted volume and pulled a new copy of the image, and still the same.
As with others, I found a process that was just hanging, /config/www/app/Bin/speedtest -f json, and after killing it, it started trying to run through the backlog of tests.
In fact it failed on the 8:20 run. I killed that and it ran 13 more tests within about two minutes, and then the 14th test hung again.
I have just tried to run speedtest manually from inside the container and it popped up asking me to accept the license. It then failed with:
Do you accept the license? [type YES to accept]: YES
License acceptance recorded. Continuing.
{"type":"log","timestamp":"2021-10-31T22:34:08Z","message":"Error: [110] Cannot open socket: Timeout occurred in connect.","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:10Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:13Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}
{"type":"log","timestamp":"2021-10-31T22:34:13Z","message":"Server Selection - Failed to find a working test server. (NoServers)","level":"error"}
but a subsequent manual run returned valid JSON.
I used ps to identify which task was hanging. Based on my cron schedule of every 59 minutes and the fact that my last successful test was at 17:59, it was easy to identify that the failed process was 3079.
root@39172993c1cf:/# ps -eo pid,cmd,lstart
PID CMD STARTED
1 s6-svscan -t0 /var/run/s6/s Wed Apr 14 16:38:33 2021
38 s6-supervise s6-fdholderd Wed Apr 14 16:38:35 2021
403 s6-supervise speedtest Wed Apr 14 16:38:51 2021
405 s6-supervise php-fpm Wed Apr 14 16:38:51 2021
406 s6-supervise nginx Wed Apr 14 16:38:51 2021
407 s6-supervise cron Wed Apr 14 16:38:51 2021
408 php-fpm: master process (/e Wed Apr 14 16:38:51 2021
409 nginx: master process /usr/ Wed Apr 14 16:38:51 2021
410 php /config/www/artisan que Wed Apr 14 16:38:51 2021
412 /usr/sbin/crond -f -S -l 5 Wed Apr 14 16:38:51 2021
433 php-fpm: pool www Wed Apr 14 16:38:51 2021
434 php-fpm: pool www Wed Apr 14 16:38:51 2021
435 nginx: worker process Wed Apr 14 16:38:51 2021
436 nginx: worker process Wed Apr 14 16:38:51 2021
437 nginx: worker process Wed Apr 14 16:38:51 2021
438 nginx: worker process Wed Apr 14 16:38:51 2021
1016 php-fpm: pool www Wed Apr 14 16:53:07 2021
3079 /config/www/app/Bin/speedte Wed Apr 14 18:59:01 2021
14744 bash Thu Apr 15 08:33:22 2021
15130 ps -eo pid,cmd,lstart Thu Apr 15 08:39:24 2021
root@39172993c1cf:/# kill -9 3079
Killing that process finally caused a failed test to register, and the entire queue of tests that should have happened overnight (plus the times I clicked the manual test button) got worked through.
Well, the issue described here happens to me from time to time, and killing the offending process unlocks all previously scheduled tests and resumes normal behavior. Thanks @stevietv for the hint!
The current workaround I use: create a cron job on the OS and execute killall speedtest just before the regularly scheduled tests. Example: I do a speedtest every 10 minutes (0, 10, 20, etc.), so I kill any blocking speedtest process just before that, at every *8th minute (58, 8, 18, etc.).
Killing all speedtest processes won't stop the main purpose of speedtest tracker, as it is called under s6-supervise.
For those using a docker image like me: schedule docker exec -t speedtest killall speedtest.
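On a standard crontab that schedule would look something like this (container name as above; shift the minutes so they sit just ahead of your own test schedule):

# run two minutes before every 10-minute test slot
8,18,28,38,48,58 * * * * docker exec -t speedtest killall speedtest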
@henrywhitaker3 Would it be relevant to reopen this issue, as it is still very much happening nowadays? (My docker image is built from the currently latest tag, v1.12.2.)
"MOVED FROM:" https://github.com/henrywhitaker3/Speedtest-Tracker/issues/495
Describe the bug Automatic and manual testing yields no results.
To Reproduce Just running speedtest-tracker in my enviornment.
Expected behavior Multiple speedtest-tracker results daily.
Screenshots s/b
Context
All tests stopped 02/15 and wouldn't update even when I manually told it to for almost two weeks. I may or may not have rebooted the server in that time frame (probably not), but I bounced the container two days ago and it started working again.
With that said, I think I've got the problem again as testing stopped again yesterday. :(
I'll be happy to provide any info to help solve the problem.
Found this in the logs:
I'm not sure who/what geolite.maxmind.com is, but I tried to go to the website and it's down; pinged it and got the same result. maxmind.com doesn't exist at the moment. I'm guessing this is the problem, or part of it.