zzzteph / sheye

Open-source asset and vulnerability scanning tool
https://shrewdeye.app

[Request] Debug Logging #4

Open WesSec opened 1 year ago

WesSec commented 1 year ago

Hi, I've been running/testing this for a while now. Again, the tool is great, thanks for the time you've invested.

Sometimes the stack crashes; in the logs I see the following:

[22-Dec-2022 09:19:07] NOTICE: ready to handle connections

  2022-12-22 09:19:07 Running ['artisan' push:queue] .............. 536ms **FAIL**
  ⇂ '/usr/local/bin/php' 'artisan' push:queue > '/dev/null' 2>&1

When I run the push:queue manually I get the following result:

docker-compose exec app php artisan push:queue
USER:1:10
Maximum:10

It does not return an error, so it is hard to debug. In the queues table I see no tasks in the 'running' state, as expected. What would you suggest to debug this? I rebuilt the container and reconnected it to the DB, but the issue still persists : )
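
For completeness, this is roughly how I checked the queue state (a quick sketch; the 'running'/'queued' status values are just the ones I expected to find in the table):

-- Quick overview of how many tasks are in each state.
SELECT status, COUNT(*) AS tasks
FROM queues
GROUP BY status;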

Also, the log files are empty (resource.log etc.).

WesSec commented 1 year ago

For now I got it fixed by clearing the queue table and re-queueing some items, but I'd be interested in figuring out why it happened.
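
Roughly, the clean-up came down to something like this (a sketch from memory, so treat the exact statement and status values as approximate), after which I re-queued things:

-- Rough sketch of the workaround: drop the stuck entries, then re-queue the scans.
-- Status values as in the monitoring query further down this thread.
DELETE FROM queues WHERE status IN ('running', 'queued');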

zzzteph commented 1 year ago

Hi!

Nice catch; I will take a look at it. Could you tell me a little about the case: when does it happen, and how can I reproduce it?

WesSec commented 1 year ago

I binned the broken instance, so I cannot investigate it at this time, but I'll try to reproduce it and will update once I know more:

What I had was several scopes; one of them had a domain with a service where each user gets some sort of subdomain, resulting in a huge list of resources (3k+). What I think happened is that some tasks got stuck and new ones weren't added because of that. It happened between the asset/subdomain discovery step and the resource discovery step, as for some resources the ports were already scanned (default scan template).

I had a default Docker install, as described in the README.

What I do have is the app container routed over a WireGuard connection via gluetun. I don't think it has any impact, since the connection is pretty reliable, but I'm mentioning it just to be sure.

I wrote a small SQL query to monitor the queued and running tasks:

-- Queued and running tasks, with their scanner, resource, and scope
SELECT q.updated_at, object_type, q.status, s2.`type`, s2.class AS jobtype, r.name AS resource, s.name AS scope
FROM queues q
JOIN resources r ON q.object_id = r.id
JOIN scope_entries se ON se.id = r.scope_entry_id
JOIN scopes s ON s.id = se.scope_id
JOIN scanners s2 ON s2.id = scanner_entry_id
WHERE q.status IN ('running', 'queued');

As far as the logging goes, it would be awesome to have logging with different verbosity levels, where debug would log entries like "pushed X to queue, result OK" or "error pushing X to queue".

I'll update this ticket as I gather more info, cheers 🍺

zzzteph commented 1 year ago

Thank you very much for the explanation. A few points in addition:

  1. If a task is in status "running," it will be terminated after two hours (a quick check for this is sketched below the list).
  2. If a task is in status "queued," the logic is more complicated: it calculates how many tasks can run before that task and multiplies that number by 1 hour. So a queued task may, for example, only be terminated after 10 hours.
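
As a quick check for point 1, something like this should show tasks that are already past the running limit (a rough sketch: the two-hour threshold is from point 1, the column names are taken from your monitoring query above, and I am assuming updated_at is set when a task moves to "running"):

-- Sketch: tasks that have been in 'running' state longer than the two-hour limit.
SELECT object_type, object_id, status, updated_at
FROM queues
WHERE status = 'running'
  AND updated_at < NOW() - INTERVAL 2 HOUR;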

I will take a look at debug levels and will add more detailed output for push:queue.

It can also fail if the DB connection is unstable, but it should not impact the system overall.

WesSec commented 1 year ago

> Does it happen often or only from time to time?

I've had it on a few occasions already; mostly I just ditched the instance and spun up a new one. What I tried in this case was deleting only the app container (incl. volume) and recreating it. After that it was still broken. I ran the init commands again, but I could imagine that would break other stuff, like app keys. After some fiddling I found it started processing scopes I had removed (removed in the frontend; they still existed in the DB), but I don't remember what buttons I pressed for that to happen. I'll keep a log.

> Is there any impact on the scanning process?

When it happens, it seems nothing gets pushed to the queue anymore, so scanning stops. Adding new scopes etc. doesn't change that.

I think I let it run in that state for over 10 hours, but I'm not certain; I'll monitor.

zzzteph commented 1 year ago

That sounds really bad; I will take a look ASAP. I have a few instances running with and without Docker and everything is fine.

WesSec commented 1 year ago

Hi,

I got another instance running into failures. It might have something to do with the fact that one of my scanned domains hit some WAF and is registering all ports as tcpwrapped, which results in a huge number of services.


[20-Jan-2023 08:17:26] NOTICE: fpm is running, pid 9
[20-Jan-2023 08:17:26] NOTICE: ready to handle connections

  2023-01-20 08:17:26 Running ['artisan' push:queue] .............. 500ms FAIL
  ⇂ '/usr/local/bin/php' 'artisan' push:queue > '/dev/null' 2>&1

These logs all seem old, but just to be complete:

tail -f *.log
==> discovery.log <==

   INFO  Processing jobs from the [discovery] queue.

  2022-10-18 20:41:46 App\Jobs\Discovery\AmassJob ............ 2,340.93ms DONE
  2022-10-18 20:41:48 App\Jobs\Commands\AmassCommand ......... 7,952.19ms DONE
  2022-10-18 20:41:56 App\Jobs\Commands\SubfinderCommand ..... 8,774.79ms DONE
  2022-10-18 20:42:05 App\Jobs\Commands\AssetCommand ........ 32,362.91ms DONE
  2022-10-18 20:42:37 App\Jobs\Commands\AssetCommand ........ 39,792.42ms DONE
  2022-10-18 20:43:18 App\Jobs\Commands\GauCommand
==> notify.log <==
  2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 110.00ms DONE
  2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 146.72ms DONE
  2022-10-18 20:42:10 App\Listeners\ResourceNotification ....... 101.82ms DONE
  2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 379.54ms DONE
  2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 121.85ms DONE
  2022-10-18 20:42:11 App\Listeners\ResourceNotification ....... 128.76ms DONE
  2022-10-18 20:42:12 App\Listeners\ResourceNotification ....... 157.16ms DONE
  2022-10-18 20:42:12 App\Listeners\ResourceNotification ....... 335.85ms DONE
  2022-10-18 20:44:50 App\Listeners\ResourceNotification .... 28,357.67ms DONE
  2022-10-18 20:45:19 App\Listeners\ResourceNotification ....... 374.56ms DONE

==> resource.log <==

   INFO  Processing jobs from the [resource] queue.

  2022-10-18 20:44:00 App\Jobs\Resource\NucleiMediumSc . . .. 97,388.59ms FAIL
  2022-10-18 20:45:19 App\Jobs\Resource\NmapResourceScan .... 69,798.44ms DONE
  2022-10-18 20:43:53 App\Jobs\Resource\NucleiCriticalScan .. 84,891.43ms FAIL

I pulled the latest code yesterday. You can hit me up on Discord at wessec_#6309, or if you have another channel to chat, I'm happy to help debug.

zzzteph commented 1 year ago

@WesSec I sent you a request via Discord.