Closed: dustinbird closed this issue 6 months ago.
@dustinbird
That file should exist in the container. Try:
docker exec <openvas container name> ls -l /var/run/ospd/
Or ... use interactive exec to enter the container:
docker exec -it <openvas container name> bash
Then navigate to /var/run/ospd as if you were on another Linux server.
BUT ... from what you describe, it sounds like ospd-openvas has died.
Could you attach the full contents of the log from the container ..
docker logs <container name> > container.log
Then attach the "container.log" file to this issue.
Also, I'm guessing that if you check "docker ps", the container does not show as healthy?
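For a bit more detail than the status column, docker inspect will show the health state and the output of the last healthcheck, e.g.:
docker inspect --format '{{json .State.Health}}' <openvas container name>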
-Scott
Thank you for coming back to me on this.
Yes, I can now see the file when I navigate that way, so it is in the Docker container.
When I look at docker ps, I can see the container has a status of unhealthy. I think I have got into the habit of seeing it running lately and being able to access the web interface, so I ignored that important piece of information.
The log you requested is below; I hope this is what you need:
cp: cannot stat '/usr/local/var/lib/': No such file or directory
chown: invalid user: ‘gvm:gvm’
cp: cannot stat '/var/lib/gvm/': No such file or directory
cp: cannot stat '/var/lib/notus/': No such file or directory
cp: cannot stat '/var/lib/openvas/': No such file or directory
cp: cannot stat '/etc/gvm/': No such file or directory
cp: cannot stat '/usr/local/etc/openvas/': No such file or directory
gpg: keybox '/etc/openvas-gnupg/pubring.kbx' created
gpg: /etc/openvas-gnupg/trustdb.gpg: trustdb created
gpg: key 9823FAA60ED1E580: public key "Greenbone Community Feed integrity key" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: inserting ownertrust of 6
Traceback (most recent call last):
File "/usr/local/bin/ospd-openvas", line 8, in
Can you please share how you are starting the container? The full command or the docker-compose.yml.
Also, make sure there are no issues with the underlying storage. At first glance, that's all I can think it could be, though that usually shows as postgresql failing out because it can't write to the DB. For me, that has happened when the disc gets full, but I have seen it from other issues as well.
Just noticed ... all the "cannot stat" errors point to the persistent storage. Are you using a local directory or a Docker volume for your persistent storage? Make sure, wherever it is, that there are no issues with the underlying discs.
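If it is a named volume, this will show where it actually lives on the host, so you can check that filesystem directly:
docker volume inspect <volume name>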
-Scott
The command the container is started with is as follows:
docker run --detach --publish 8443:9392 -e PASSWORD="REDACTED" --volume openvas:/data -e HTTPS=true --name openvas immauss/openvas:latest
The system has the following space, so this does not appear to be an issue, and the device appears to be healthy. I could try restoring an old volume backup to see whether that points to a disk/DB issue.
Filesystem      Size  Used  Avail  Use%  Mounted on
devtmpfs        4.0M  8.0K  4.0M     1%  /dev
tmpfs           3.8G  4.0K  3.8G     1%  /dev/shm
tmpfs           1.6G  9.0M  1.6G     1%  /run
tmpfs           4.0M     0  4.0M     0%  /sys/fs/cgroup
/dev/vda2        48G  6.4G   42G    14%  /
/dev/vdb1        20G   14G  6.8G    67%  /var/lib/docker
overlay          20G   14G  6.8G    67%  /var/lib/docker/overlay2/0c5e2c0ca570a75ec2fb7b7ea838a2a19dcff926f7fc139bad45d18eb6928b90/merged
tmpfs           3.8G     0  3.8G     0%  /var/lib/docker/containers/93415de6a729efae58bf0ed3181fd23914d71f8243416500d5f95b678f10be2c/mounts/secrets
tmpfs           775M   40K  775M     1%  /run/user/0
Is there an easy way to do a health check on the volume or more likely the DB?
It's not the DB. You aren't getting errors from postgresql.
You could fsck the disc that holds /var/lib/docker
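Roughly (a sketch only; fsck needs the filesystem unmounted, and from your df output /var/lib/docker is on /dev/vdb1, so adjust to your setup and this assumes the mount is in /etc/fstab):
systemctl stop docker
umount /var/lib/docker
fsck -f /dev/vdb1
mount /var/lib/docker
systemctl start docker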
-Scott
We have checked the disk and there were no errors shown and no space issues. I have rolled the mounted drive back to an old snapshot and it is now working as it should. It is frustrating not to see an obvious cause of this, but I am relieved it is working.
Thank you for the support as always.
I spoke too soon. The old working snapshot has once again given me the same issues.
I think we will build a new box from the ground up. I can see how to export tasks, hosts, etc. to XML, but not how to import them. Is there a simple article about transferring the data from one box to another?
Before you do that ....
Start by creating a new container (new name and new volume name):
docker run --detach --publish 8443:9392 -e PASSWORD="REDACTED" --volume openvas-new:/data -e HTTPS=true --name openvas-new immauss/openvas:latest
Then take a look at the docs and follow the steps for database backup and restore.
Restore from the old container to the new one.
Much easier than exporting everything to XML and then importing. This will also get all of your results from previous scans.
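For reference, a rough sketch of the idea (the docs have the exact, supported commands; this just assumes a standard PostgreSQL dump and restore between the two containers):
docker exec openvas bash -c "su -c 'pg_dumpall' postgres" > gvm-backup.sql
docker exec -i openvas-new bash -c "su -c 'psql -d postgres' postgres" < gvm-backup.sql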
-Scott
Thank you.
Exactly the same issue here, and I just did a restart of my compose :( Same errors with Python 3.11 in the container.
Same on a new box with this simple compose:
version: '3.8'
services:
  openvas:
    image: immauss/openvas
    container_name: openvas
    restart: unless-stopped
    environment:
      PASSWORD: REDACTED
      HTTPS: false
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./data:/data
    ports:
      - 8080:9392
New Docker setup, new bind volume.
Tested with a new Docker volume (instead of a bind volume), same :'(
You're not alone, @dustinbird :'(
@grandaor Can you give me the full scenario? How long has it been running?
When was the last time you updated the container?
Can you try the solution I suggested to @dustinbird?
Thanks, -Scott
Could you also send the results of the following command?
docker exec -it openvas bash -c "ls -l /data/var* /data/local*"
Thanks, -Scott
I had this issue today when I did a restart of my compose in prod (including Traefik, etc.). The container was updated and had been working since 30/04/2024 21:39 ... until today. So I tried on a new box with the simple compose above and a fresh deploy (without data), and got exactly the same errors...
I noticed one thing:
root@4311fa403057:/var/run/ospd# ls -l
total 0
lrwxrwxrwx 1 root root 31 May 3 21:44 ospd.sock -> /var/run/ospd/ospd-openvas.sock
ospd.sock seems to be a symbolic link to a socket file that doesn't appear in the same directory.
openvas | md manage:WARNING:2024-05-03 22h05.47 CEST:1308: osp_scanner_feed_version: failed to connect to /var/run/ospd/ospd-openvas.sock
bob@marley:~/docker/openvas$ docker exec -it openvas bash -c "ls -l /data/var* /data/local*"
/data/local-etc:
total 8
drwxr-xr-x 2 root root 4096 May 3 21:42 gvm
drwxr-xr-x 3 root root 4096 May 3 21:44 openvas
/data/local-share:
total 36
drwxr-xr-x 2 root root 4096 Apr 24 21:15 ca-certificates
drwxr-xr-x 3 root root 4096 Apr 24 22:04 doc
drwxrwsr-x 2 root staff 4096 Apr 24 21:15 fonts
drwxr-xr-x 6 gvm gvm 4096 Apr 24 23:00 gvm
drwxr-xr-x 4 root root 4096 Apr 24 22:04 man
drwxr-xr-x 2 root root 4096 Apr 24 22:05 openvas
drwxr-xr-x 7 root root 4096 Apr 24 21:15 sgml
drwxr-xr-x 2 root root 4096 Apr 24 21:15 texmf
drwxr-xr-x 6 root root 4096 Apr 24 22:03 xml
/data/var-lib:
total 24
-rw-r--r-- 1 gvm gvm 11 May 3 21:43 FeedDate
drwxr-xr-x 8 gvm gvm 4096 Aug 13 2023 gvm
drwxr-xr-x 2 gvm gvm 4096 Aug 13 2023 mosquitto
drwxrwxr-x 4 gvm gvm 4096 May 3 21:43 notus
drwxr-xr-x 3 gvm gvm 4096 Aug 13 2023 openvas
-rw-r--r-- 1 gvm gvm 29 Apr 25 04:23 update.ts
/data/var-log:
total 32
-rw-r--r-- 1 postgres root 19193 May 3 21:43 db-restore.log
drwxr-xr-x 2 gvm gvm 4096 May 3 21:44 gvm
drwxr----- 2 mosquitto root 4096 May 3 21:42 mosquitto
drwxr-xr-x 2 postgres postgres 4096 May 3 21:42 postgresql
Note: I destroyed the container each time, and the issue still appeared even with the solution suggested to @dustinbird. I think his container was not destroyed when he restored his snapshot.
You destroyed the container, but not your "./data"
I think I know what might be the problem, but need to test a few things ....
And the soft link is correct for the socket.
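For what it's worth, a quick way to double check whether ospd-openvas is actually listening on that socket (assuming ss is available in the image):
docker exec openvas bash -c "ss -xl | grep ospd"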
Deleting my data was my first test :'( and I tested it on a new box (totally fresh) with the same issue.
What's your idea, Scott? :angel:
If you get the same issue with a clean /data ... then my first thought is not correct.
Strange ... I've tested on multiple machines and have indications of > 200 others using that version as well ....
but ... as I'm writing this, I just got the same error.
Now that I have reproduced it, I should be able to work it out.
Sorry ...
You might need to roll back to the previous version while I work this out. It's pretty late here and I have an early day tomorrow. So if I don't work this out really soon, it might be a day or two before I can resolve it.
-Scott
NVM ... Found the issue. Will take me a few to fix it.
don't worry :)
You were already my hero when you resolved https://github.com/immauss/openvas/issues/234
Have a good night :)
What is it? I'm curious :)
The cron job that generates the notus files blew up. You're getting bad data from my rsync server.
Ugh .... OK ... if you restart the container now, it should pull good data, and work. Sorry about that .... That's the first time that cron has run, so naturally there was something unanticipated .... which I still need to figure out.
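For example, whichever matches your setup (using the container/service name from the compose above):
docker restart openvas
docker compose restart openvas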
@grandaor and @dustinbird Thank you for bringing this to my attention.
I'll test it immediately :)
You rock, dude <3
openvas | md main:MESSAGE:2024-05-03 20h44.40 utc:826: Greenbone Vulnerability Manager version 23.5.2 (DB revision 255)
openvas | md manage:   INFO:2024-05-03 20h44.40 utc:826: Getting scanners.
openvas | md main:MESSAGE:2024-05-03 20h44.43 utc:833: Greenbone Vulnerability Manager version 23.5.2 (DB revision 255)
openvas | md manage:   INFO:2024-05-03 20h44.43 utc:833: Verifying scanner.
openvas |
openvas | ==> /usr/local/var/log/gvm/healthchecks.log <==
openvas | Healthchecks completed with no issues.
You have nothing to be sorry about, you were so prompt to resolve it! I'm loving you, Scott <3
@grandaor @dustinbird Thanks again for bringing this to my attention.
The problem was an issue with the cron job which creates the new notus feeds for Rocky. This has been resolved. I've also added error checking to validate the newly created notus feeds BEFORE the existing working feeds are replaced. SO ... hopefully ... this won't happen again. (But I'm sure you'll let me know if it does. 😉)
Thanks, Scott
Thank you so much for sorting this. I was away for the weekend, and when I logged in to look at your fix I found that a second box of ours had failed too, one week after the first. I have updated the Docker image on both, and now they seem to be working as they should.
Thank you so much for all the work on this.
Can we open this issue again, please, or have another look at this? Since the latest Docker image was applied to the servers, fixing the interrupted-at-the-start issue, I now have some scans coming back as interrupted at 100%, which I never had before. Have you or anyone else encountered this since the latest image was applied?
We have a scan that takes a minute or so, where one scan box scans the other box to make sure it is OK, but this is giving two error messages after taking an incredibly long time rather than the expected one minute:
"Error exit, setting running task to Interrupted."
I have a new issue on one of our scanner boxes, which had been working without errors up until now (this happened while I was away). All the currently scheduled scans (and manually triggered ones) are failing with a status of interrupted. When I look at the task, the only error message I see is "Cannot connect to scanner", with no detailed log.
As mentioned, this was working until Friday, when I took annual leave, but scans seem to have started failing from Saturday onwards. Last Thursday I updated the Docker image to get the latest version, and the same was done on another box the following day, but that one is unaffected. Today I have updated the Docker image again, applied all available server patches, and rebooted the server (openSUSE Leap 15.5).
When I look at the container log I see the following error:
osp_scanner_feed_version: failed to connect to /var/run/ospd/ospd-openvas.sock\n","stream":"stdout","time":"2024-05-02T12:55:27.127335941Z"}
This file does not exist on this server, but nor does it on another scanning server set up the same way, running the same openSUSE build and GVM versions with no errors.
What can I do to correct this?