infobyte / faraday

Open Source Vulnerability Management Platform
https://www.faradaysec.com
GNU General Public License v3.0
4.78k stars · 885 forks

File import is too slow! #418

Open gister9000 opened 3 years ago

gister9000 commented 3 years ago

What's the problem this feature will solve? When I import a 140MB Nessus scan into faraday, the import process lasts more than 5 days (I canceled it after 5 days - the real time is probably much longer). The test was done on a machine with 12GB RAM and an AMD FX(tm)-8350 Eight-Core Processor (eight 4.0 GHz cores!) - not a slow machine, so to speak. The test was performed using the CLI.

Describe the solution you'd like I'd like the import process to be optimized - up to 5 hours would be usable, I guess. As it stands, it is not usable!

Additional context Check the python3 profilers to discover what the bottleneck is - https://docs.python.org/3/library/profile.html
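For reference, the profiling approach suggested above can be sketched with the standard-library `cProfile` and `pstats` modules. The `slow_import` function below is a hypothetical stand-in; to profile the real issue you would wrap Faraday's actual import entry point instead:

```python
import cProfile
import io
import pstats

def slow_import(n):
    # Stand-in for the real report-import call; replace this with
    # Faraday's actual import entry point when profiling for real.
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_import(10_000)
profiler.disable()

# Sort by cumulative time and show the 10 most expensive functions -
# the bottleneck usually stands out immediately in this view.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

Running the import once under the profiler like this would show whether the time goes to XML parsing, per-vuln processing, or database calls.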

P.S. The test file is not attached because it is highly confidential. If you have trouble reproducing this issue, I'm willing to create a non-confidential Nessus scan report for you guys - just ask.

GL HF

aenima-x commented 3 years ago

What do you mean by CLI? We have a cli (https://github.com/infobyte/faraday-cli) and a Client (https://github.com/infobyte/faraday-client).

Here you have a test with a 120MB Nessus report.

With the cli, it took 10.83 seconds

(faraday-cli) ➜  Nessus (master) ✗ faraday-cli workspace
name    active    public    readonly      hosts    services    vulns
------  --------  --------  ----------  -------  ----------  -------
nessus  True      False     False             0           0        0
(faraday-cli) ➜  Nessus (master) ✗ time faraday-cli report ./sample_nessus.nessus
Sending data from [./sample_nessus.nessus] to workspace: nessus
faraday-cli report ./sample_nessus.nessus  10.83s user 1.03s system 33% cpu 35.848 total
(faraday-cli) ➜  Nessus (master) ✗ faraday-cli workspace
name    active    public    readonly      hosts    services    vulns
------  --------  --------  ----------  -------  ----------  -------
nessus  True      False     False             7          86      295
(faraday-cli) ➜  Nessus (master) ✗ ls -lh ./sample_nessus.nessus
-rw-r--r--  1 aenima  admin   120M Jun 16 12:41 ./sample_nessus.nessus
(faraday-cli) ➜  Nessus (master) ✗ faraday-cli workspace
name    active    public    readonly      hosts    services    vulns
------  --------  --------  ----------  -------  ----------  -------
nessus  True      False     False             7          86      295

With the web UI, it took 55 seconds (including the upload of the 120MB file):

2020-10-20T14:00:40-0300 - faraday.server.api.modules.upload_reports - INFO {PoolThread-twisted.internet.reactor-6} [pid:34282] [upload_reports.py:44 - file_upload()]  Importing new plugin report in server...
2020-10-20T14:00:49-0300 - faraday.server.api.modules.upload_reports - INFO {PoolThread-twisted.internet.reactor-6} [pid:34282] [upload_reports.py:80 - file_upload()]  Get plugin for file: /Volumes/HDD/aenima/.faraday/uploaded_reports/L2C9D7V5OH48_sample_nessus.nessus
2020-10-20T14:00:49-0300 - faraday.server.api.modules.upload_reports - INFO {PoolThread-twisted.internet.reactor-6} [pid:34282] [upload_reports.py:86 - file_upload()]  Plugin for file: /Volumes/HDD/aenima/.faraday/uploaded_reports/L2C9D7V5OH48_sample_nessus.nessus Plugin: Nessus
2020-10-20T14:00:49-0300 - faraday.server.threads.reports_processor - INFO {ReportsManager-Thread} [pid:34282] [reports_processor.py:87 - run()]  Processing raw report /Volumes/HDD/aenima/.faraday/uploaded_reports/L2C9D7V5OH48_sample_nessus.nessus
2020-10-20T14:00:49-0300 - faraday.server.threads.reports_processor - INFO {ReportsManager-Thread} [pid:34282] [reports_processor.py:58 - process_report()]  Processing report [/Volumes/HDD/aenima/.faraday/uploaded_reports/L2C9D7V5OH48_sample_nessus.nessus] with plugin [Nessus
2020-10-20T14:01:14-0300 - faraday.server.threads.reports_processor - INFO {ReportsManager-Thread} [pid:34282] [reports_processor.py:38 - send_report_request()]  Send Report data to workspace [web]
2020-10-20T14:01:35-0300 - faraday.server.threads.reports_processor - INFO {ReportsManager-Thread} [pid:34282] [reports_processor.py:71 - process_report()]  Report processing finished

With the Client it took 32 seconds.

2020-10-20T14:02:47-0300 - faraday_client.managers.reports_managers - INFO {MainThread} [reports_managers.py:99 - sendReport()]  The file is /Volumes/HDD/aenima/Documents/Faraday/report-collection/faraday_plugins_tests/Nessus/sample_nessus.nessus, nessus
2020-10-20T14:02:47-0300 - faraday_client.plugins.controller - INFO {MainThread} [controller.py:256 - processReport()]  Processing report with plugin nessus
2020-10-20T14:03:19-0300 - faraday_client.plugins.controller - INFO {MainThread} [controller.py:139 - processOutput()]  Sent command duration 200

I don't know how many vulns, hosts and services are in your 140MB report file.

Maybe you are using an old version of the client, which uses an old API that is slower.

Or an old version of faraday which had a problem like this.

Can you tell us which version you are using?

gister9000 commented 3 years ago

Faraday version 3.12.

How do you have 120MB and only about 300 vulnerabilities (roughly 400KB per vuln)??

I had over 100,000 vulnerabilities in the 140MB scan and about 750 hosts. My colleague did the test - he will probably continue this discussion and tell you exactly how many hosts, vulns and services there are.

gister9000 commented 3 years ago

When the import was canceled, there were 77000 vulnerabilities loaded into faraday. 5 days = 7200 minutes, which works out to a little over 10 vulnerabilities per minute.

We are using a workaround now; maybe someone will find it useful - https://github.com/patriknordlen/nessusedit. nessusedit lets you easily strip all informational vulnerabilities from the scan (more than 99% of our vulnerabilities were informational). Our large scan was boiled down to about 20k vulnerabilities, or 15MB.
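The same stripping can be done with a few lines of standard-library Python. This is a generic sketch, not nessusedit's implementation: it assumes the usual Nessus v2 export layout, where `ReportItem` elements carry a `severity` attribute and severity `0` means informational:

```python
import xml.etree.ElementTree as ET

def strip_informational(in_path, out_path):
    """Remove informational (severity 0) ReportItem entries from a
    Nessus v2 (.nessus) export and write the slimmed file."""
    tree = ET.parse(in_path)
    for host in tree.getroot().iter("ReportHost"):
        # ReportItem elements are direct children of ReportHost in
        # the v2 format; copy the list so removal is safe mid-loop.
        for item in list(host.findall("ReportItem")):
            if item.get("severity") == "0":
                host.remove(item)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

Usage would be `strip_informational("big_scan.nessus", "slim_scan.nessus")` before importing into faraday.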

aenima-x commented 3 years ago

We have never had a client with 100K vulns in one scan. For example, in our commercial version the average corporate license is for 10K vulns total.

I suggest you stick with that workaround.

aenima-x commented 3 years ago

And if it's possible (after changing the sensitive data), could you share the report with us?

gister9000 commented 3 years ago

This scenario is very rare indeed; however, it does happen that a client wants us to scan up to ten /24 subnets. We will keep the workaround then - the Nessus informational findings mainly aid the attacking team and are not very useful in the report.

Unfortunately the report is very sensitive - we won't risk leaking a single detail about our client, which could be a disaster. I'll do my best to send you some smaller, non-sensitive Nessus scans (but still large enough to take some time).

In the end, I believe this issue is not significant enough to drive us away from faraday, but it is a large minus. Maybe multi-threading the import process would make it dramatically faster? At the very least, why not check what the bottleneck is?

gister9000 commented 3 years ago

I guess nothing will be done here. The workaround is fine and the problem is extremely rare - closing.

gister9000 commented 3 years ago

Fact: an alternative solution is about 200 times faster at importing files. I won't name the other product since I don't want to market anyone, but the same file that took faraday 2 days to import took only 3 minutes elsewhere. Tested with two different big files, on three computers and two different operating systems.

Maybe you want to revisit this issue?

aenima-x commented 3 years ago

@gister9000 I will try to generate a file similar to yours to see where the bottleneck is. If you can send us the report you mentioned in October ("I'll do my best to send you some smaller non-sensitive nessus scans (but still large enough to take some time)"), that would be great.

gister9000 commented 3 years ago

I am scanning some bug bounty targets and then I'll send you the report. It will take time because I've throttled the scan speed in order not to cause problems on their side.

gister9000 commented 3 years ago

Here's a 37MB nessus file that takes a while to import. I see that you also did a test with a 120MB nessus file, but with a low number of vulns, and it took 10 seconds to import. We can conclude that the number of vulns creates the problem rather than the file size. I suspect that you recalculate various statistics after every 10 vulns imported (pie charts are generated, etc.) - maybe you should do this after importing 10% of the vulns, or even only after everything is imported.

Good luck and please inform me if you make an improvement :)

Here's a scan with thousands of vulns nessus_example.nessus.tar.gz
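The recalculation hypothesis above would produce exactly this kind of vuln-count-dependent slowdown, because recomputing statistics over everything imported so far, every few items, makes total work grow quadratically. A toy illustration (not Faraday's actual code; `recompute_stats` is a hypothetical stand-in for regenerating dashboard counts):

```python
def recompute_stats(severities):
    # Stand-in for regenerating counts/pie-chart data; its cost is
    # proportional to the number of vulns imported so far.
    counts = {}
    for s in severities:
        counts[s] = counts.get(s, 0) + 1
    return counts

def import_with_stats(severities, stats_every):
    """Import items one by one, recomputing stats every `stats_every`
    items. Returns (recomputations, items_touched), where
    items_touched measures the total work spent on stats."""
    imported = []
    recomputations = 0
    items_touched = 0
    for s in severities:
        imported.append(s)
        if len(imported) % stats_every == 0:
            recompute_stats(imported)
            recomputations += 1
            items_touched += len(imported)
    return recomputations, items_touched

# Over 10,000 vulns: recomputing every 10 items touches ~5 million
# items in total, while recomputing once at the end touches 10,000.
n = 10_000
sevs = [i % 5 for i in range(n)]
print(import_with_stats(sevs, 10)[1])   # 5005000
print(import_with_stats(sevs, n)[1])    # 10000
```

Deferring the recalculation to the end of the import (or to large batches) collapses that quadratic cost back to linear, which fits the observation that small-vuln-count files import quickly regardless of size.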

aenima-x commented 3 years ago

@gister9000 thanks

gister9000 commented 3 years ago

@aenima-x When you restart the faraday service, upload is much faster.

I had the faraday service running for 5 days and tried to upload a nessus report with 10k vulns. It took 20 hours. Then I restarted the faraday service with systemctl and repeated the process - it took less than 1 hour!

aenima-x commented 3 years ago

@gister9000 That's very strange - you imported the same file into a different workspace? Could you send me your log file?

gister9000 commented 3 years ago

@aenima-x Yes, the same file into a different workspace (both were empty initially). The same file failed to import into one workspace a few times on my colleague's laptop (gateway timeout after 10+ hours). Then I created a new workspace and it succeeded in about 3 hours (18533 vulns total, 851 hosts).

The faraday-server.log file contains workspace names which reveal client names, so I can't share it.

I went through it, and all I saw was "EOFError: Ran out of input" in the finalize_request function, which happens so often that I believe it's not a bug at all. Did you want a different log file?

aenima-x commented 3 years ago

@gister9000 That "EOFError: Ran out of input" is a known issue in a library we use. It's not related to this.

Yes, I was talking about faraday-server.log. Can you run a sed to remove the client names or something like that?

aenima-x commented 3 years ago

@gister9000 Are you using the web UI to upload it? Because that can't generate a timeout - the API only uploads the file and the processing is done in the background. It returns a 200 after the upload.

gister9000 commented 3 years ago

Failed attempts were done via cli. Successful attempt was done via GUI.

There are no other errors besides eof.


gister9000 commented 3 years ago

My colleague had the errors on his Mac. Let us reproduce it, and hopefully we'll deliver the logs and the commands we used.


aenima-x commented 3 years ago

> Failed attempts were done via cli. Successful attempt was done via GUI. There are no other errors besides eof.

OK, that makes sense. The cli processes the file and sends the results to the faraday API, and that request can time out. Uploading through the web UI won't.

We will release a new version of the cli with a feature that may help as a workaround, though it's not a final solution. With it you can skip vulns with info severity; nessus generates lots and lots of info vulns that are not very useful, and without them the payloads are a lot smaller.

aenima-x commented 3 years ago

@gister9000 Forget the log - if it was generated with the cli, we won't see anything in it.

gister9000 commented 3 years ago

@aenima-x We removed the informational vulns from the scan before importing it.

I find it weird that no computer resource is used at 100% during the import process. Maybe turning off live result rendering would help, since you are using flask and template engines are anything but fast - if you are rendering a template after each imported vuln, that seems likely to be the issue.

gister9000 commented 3 years ago

@aenima-x What I said about no resource being used was wrong - CPU is the bottleneck during the import process, which fits my theory that you need to stop live rendering when importing files (or render less often). (Screenshot attached: postgres_cpu)

aenima-x commented 3 years ago

I think the live rendering will not be in the new frontend. But that resource usage is the insertion into the database, not the live rendering.
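If database insertion really is the hot spot, batching is the usual remedy: committing once per vuln pays the full per-transaction overhead 100,000 times. A generic illustration (sqlite3 as a stand-in for Postgres; this is not Faraday's actual code) comparing per-row commits with a single batched `executemany`:

```python
import sqlite3
import time

def insert_one_by_one(conn, rows):
    # One INSERT plus one commit per vuln: maximal per-row overhead.
    for row in rows:
        conn.execute("INSERT INTO vulns (name, severity) VALUES (?, ?)", row)
        conn.commit()

def insert_batched(conn, rows):
    # One executemany plus a single commit: the per-transaction
    # overhead is amortized over the whole batch.
    conn.executemany("INSERT INTO vulns (name, severity) VALUES (?, ?)", rows)
    conn.commit()

rows = [(f"vuln-{i}", i % 5) for i in range(5000)]
for fn in (insert_one_by_one, insert_batched):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vulns (name TEXT, severity INTEGER)")
    start = time.perf_counter()
    fn(conn, rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM vulns").fetchone()[0]
    print(f"{fn.__name__}: {count} rows in {elapsed:.3f}s")
    conn.close()
```

On a real Postgres backend the same idea applies (multi-row INSERTs or COPY instead of row-at-a-time statements), and the gap grows with transaction durability costs.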