WesSec opened 1 year ago
For now I got it fixed by clearing the queue table and requeueing some stuff, but I would be interested in figuring out the why.
Hi!
Nice catch; I will take a look at it. Can you tell me a little about the case: when does it happen, and how can I reproduce it?
I binned the broken instance so I cannot investigate it at this time, but I'll try to reproduce and update once I know more:
What I had was several scopes; one of them had a domain with a service where each user got some sort of subdomain, resulting in a huge list of resources (3k+). What I think happened is that some tasks got stuck and new ones weren't added because of that. It happened between the asset/subdomain discovery step and resource discovery, as for some resources the ports had already been scanned (default scan template).
I had a default Docker install as posted in the README.
What I do have is the app container routed over a WireGuard connection via gluetun. I don't think it had any impact, as the connection is pretty reliable, but mentioning it just to be sure.
I wrote a small SQL query to monitor the queued and running tasks:
SELECT q.updated_at, q.object_type, q.status, s2.`type`, s2.class AS jobtype,
       r.name AS resource, s.name AS scope
FROM queues q
JOIN resources r      ON q.object_id = r.id
JOIN scope_entries se ON se.id = r.scope_entry_id
JOIN scopes s         ON s.id = se.scope_id
JOIN scanners s2      ON s2.id = q.scanner_entry_id
WHERE q.status IN ('running', 'queued')
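To sanity-check the joins in that query, here is a self-contained sketch that runs it against an in-memory SQLite database. The table and column layout is inferred from the query itself (the real app uses MySQL/MariaDB), so treat the schema and sample rows as assumptions, not the actual migrations:

```python
import sqlite3

# Stand-in schema inferred from the monitoring query; column names are
# assumptions based on the joins, not the project's real migrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scopes        (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scope_entries (id INTEGER PRIMARY KEY, scope_id INTEGER);
CREATE TABLE resources     (id INTEGER PRIMARY KEY, name TEXT, scope_entry_id INTEGER);
CREATE TABLE scanners      (id INTEGER PRIMARY KEY, type TEXT, class TEXT);
CREATE TABLE queues (
    id INTEGER PRIMARY KEY, updated_at TEXT, object_type TEXT,
    object_id INTEGER, status TEXT, scanner_entry_id INTEGER
);
INSERT INTO scopes        VALUES (1, 'example-scope');
INSERT INTO scope_entries VALUES (1, 1);
INSERT INTO resources     VALUES (1, 'app.example.com', 1);
INSERT INTO scanners      VALUES (1, 'resource', 'NmapResourceScan');
-- one task still running, one already finished
INSERT INTO queues VALUES (1, '2023-01-20 08:00:00', 'resource', 1, 'running', 1);
INSERT INTO queues VALUES (2, '2023-01-20 08:01:00', 'resource', 1, 'done', 1);
""")

rows = conn.execute("""
SELECT q.updated_at, q.object_type, q.status, s2.type, s2.class AS jobtype,
       r.name AS resource, s.name AS scope
FROM queues q
JOIN resources r      ON q.object_id = r.id
JOIN scope_entries se ON se.id = r.scope_entry_id
JOIN scopes s         ON s.id = se.scope_id
JOIN scanners s2      ON s2.id = q.scanner_entry_id
WHERE q.status IN ('running', 'queued')
""").fetchall()

for row in rows:
    print(row)  # only the 'running' task should appear
```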
As far as the logging goes, it would be awesome to have logging with different verbosity levels, where debug would log entries like "pushed X to queue, result OK" or "error pushing X to queue".
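The kind of leveled push logging requested above might look like this. This is a hypothetical sketch in Python; the actual project is PHP/Laravel, where Log::debug()/Log::error() would play the same role, and the function and queue here are illustrative stand-ins, not the real API:

```python
import logging

# Debug-level logging so both success and failure of each push are visible.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("push_queue")

def push_to_queue(task, queue):
    """Hypothetical push helper illustrating per-level log output."""
    try:
        queue.append(task)  # stand-in for the real enqueue call
        log.debug("pushed %s to queue, result OK", task)
        return True
    except Exception as exc:
        log.error("error pushing %s to queue: %s", task, exc)
        return False

jobs = []
ok = push_to_queue("NmapResourceScan:app.example.com", jobs)
```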
I'll update this ticket as I gather more info. Cheers 🍺
Thank you very much for the explanation. A few additional questions:
I will take a look at debug levels and will add more detailed output for push:queue.
It can also fail if the DB connection is unstable, but it should not impact the system overall.
Does it happen often, or only from time to time?
I've had it on a few occasions already; mostly I just ditched the instance and spun up a new one. What I tried in this case was only deleting the app container (incl. volume) and recreating it. After that it was still broken. I ran the init commands again, but I could imagine that would break other stuff with app keys etc. After some fiddling I found it started to process removed scopes (removed in the frontend; in the DB they still existed), but I don't remember what buttons I pressed for that to happen. I'll keep a log.
Is there any impact on the scanning process?
When it happens, it seems nothing gets pushed to the queue anymore, so scanning stops. Adding new scopes etc. doesn't change it.
I think I let it run in that state for over 10 hours, but I'm not certain; I'll monitor.
That sounds really bad; I will take a look ASAP. I have a few instances running with and without Docker and everything is fine.
Hi,
I got another instance running into failures. It might have something to do with the fact that one of my scanned domains hit some WAF and is registering all ports as tcpwrapped, which results in a huge number of services.
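One possible mitigation for the WAF case, sketched here as a hypothetical pre-filter and not something the project currently does: drop scan entries identified only as tcpwrapped before they are turned into services, since a firewall answering every probe produces one such entry per port.

```python
# Hypothetical pre-filter: discard port-scan entries whose service was
# identified only as "tcpwrapped" (typically a firewall/WAF answering
# every port), so they never become queued resources.
def filter_tcpwrapped(scan_results):
    return [r for r in scan_results if r.get("service") != "tcpwrapped"]

results = [
    {"port": 22, "service": "ssh"},
    {"port": 80, "service": "http"},
    {"port": 81, "service": "tcpwrapped"},
    {"port": 82, "service": "tcpwrapped"},
]
filtered = filter_tcpwrapped(results)
print(filtered)
```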
[20-Jan-2023 08:17:26] NOTICE: fpm is running, pid 9
[20-Jan-2023 08:17:26] NOTICE: ready to handle connections
2023-01-20 08:17:26 Running ['artisan' push:queue] .............. 500ms FAIL
⇂ '/usr/local/bin/php' 'artisan' push:queue > '/dev/null' 2>&1
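Note that the scheduler line above discards all output ("> '/dev/null' 2>&1"), which is why the FAIL gives no detail. One way to surface the error is to run the command by hand and keep both streams. A small sketch, demonstrated with a harmless stand-in command since artisan is not available here; the real invocation would be something like php artisan push:queue, assuming php is on the PATH:

```python
import subprocess

def run_and_capture(cmd):
    """Run a command and keep exit code, stdout, and stderr instead of
    discarding them the way the scheduler's /dev/null redirect does."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr

# Stand-in command for demonstration; substitute the real artisan call.
code, out, err = run_and_capture(["echo", "push:queue output"])
print(code, out.strip())
```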
These logs all seem old, but just to be complete:
tail -f *.log
==> discovery.log <==
INFO Processing jobs from the [discovery] queue.
2022-10-18 20:41:46 App\Jobs\Discovery\AmassJob ............ 2,340.93ms DONE
2022-10-18 20:41:48 App\Jobs\Commands\AmassCommand ......... 7,952.19ms DONE
2022-10-18 20:41:56 App\Jobs\Commands\SubfinderCommand ..... 8,774.79ms DONE
2022-10-18 20:42:05 App\Jobs\Commands\AssetCommand ........ 32,362.91ms DONE
2022-10-18 20:42:37 App\Jobs\Commands\AssetCommand ........ 39,792.42ms DONE
2022-10-18 20:43:18 App\Jobs\Commands\GauCommand
==> notify.log <==
2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 110.00ms DONE
2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 146.72ms DONE
2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 101.82ms DONE
2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 379.54ms DONE
2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 121.85ms DONE
2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 128.76ms DONE
2022-10-18 20:42:12 App\Listeners\ResourceNotification ....... 157.16ms DONE
2022-10-18 20:42:12 App\Listeners\ResourceNotification ....... 335.85ms DONE
2022-10-18 20:44:50 App\Listeners\ResourceNotification .... 28,357.67ms DONE
2022-10-18 20:45:19 App\Listeners\ResourceNotification ....... 374.56ms DONE
==> resource.log <==
INFO Processing jobs from the [resource] queue.
2022-10-18 20:44:00 App\Jobs\Resource\NucleiMediumSc . . .. 97,388.59ms FAIL
2022-10-18 20:45:19 App\Jobs\Resource\NmapResourceScan .... 69,798.44ms DONE
2022-10-18 20:43:53 App\Jobs\Resource\NucleiCriticalScan .. 84,891.43ms FAIL
I pulled the latest code yesterday. You can hit me up on Discord at wessec_#6309, or if you have another channel to chat, I'm happy to help debug.
@WesSec I sent you a request via Discord.
Hi, I've been running/testing this for a while now. Again, the tool is great; thanks for the invested time.
Sometimes the stack crashes; in the logs I see the following:
When I run the push:queue manually I get the following result:
It does not return an error, so it is hard to debug. In the queues table I see no tasks on 'running', as expected. What would you suggest to debug this? I rebuilt the container and reconnected it to the DB, but the issue still persists :)
Also, the log files (resource.log etc.) are empty.