jeremylong / Open-Vulnerability-Project

Java libraries for working with available vulnerability data sources (GitHub Security Advisories, NVD, EPSS, CISA Known Exploited Vulnerabilities, etc.)
Apache License 2.0
107 stars, 30 forks

Debug output / progress #139

Closed: EugenMayer closed this issue 4 months ago

EugenMayer commented 4 months ago

I'm facing issues with re-creating the index, and the level of information provided by vulnz is fairly limited.

Is there any way to show progress, or any debugging information?

Currently running: java $JAVA_OPT -jar /usr/local/bin/vulnz cve $DELAY_ARG --cache --directory /usr/local/apache2/htdocs

EugenMayer commented 4 months ago

Found https://github.com/jeremylong/Open-Vulnerability-Project/blob/main/vulnz/src/main/java/io/github/jeremylong/vulnz/cli/commands/AbstractHelpfulCommand.java#L29

So, for reference, the flag is --debug.

jeremylong commented 4 months ago

I had been looking into creating a more advanced output to display the progress: https://github.com/jeremylong/Open-Vulnerability-Project/blob/main/vulnz/src/main/java/io/github/jeremylong/vulnz/cli/ui/Screen.java. For now, I hope --debug is sufficient. I suppose we could add some progress indication here: https://github.com/jeremylong/Open-Vulnerability-Project/blob/058429b9f5536f877a5eb81dfa8295a862bf0d54/vulnz/src/main/java/io/github/jeremylong/vulnz/cli/commands/CveCommand.java#L269
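As a rough illustration of the kind of progress indication mentioned above, the sketch below tracks how many records have been fetched against a known total and formats a one-line status. The `ProgressTracker` class, its methods, and the page size of 2000 are hypothetical and not part of vulnz; it assumes the total record count is known once the first API response arrives.

```java
import java.util.Locale;

/**
 * Hypothetical progress tracker (not part of vulnz). Assumes the total number
 * of records to fetch is known up front, e.g. from the first API response.
 */
final class ProgressTracker {
    private final long total;
    private long processed;

    ProgressTracker(long total) {
        if (total < 0) {
            throw new IllegalArgumentException("total must be non-negative");
        }
        this.total = total;
    }

    /** Records that another page of {@code count} records was fetched; never exceeds the total. */
    void advance(long count) {
        processed = Math.min(total, processed + count);
    }

    /** Percentage complete; 100.0 when there is nothing to fetch. */
    double percent() {
        return total == 0 ? 100.0 : processed * 100.0 / total;
    }

    /** Formats a status line such as "Fetched 2000/5000 records (40.0%)". */
    String statusLine() {
        return String.format(Locale.ROOT, "Fetched %d/%d records (%.1f%%)", processed, total, percent());
    }

    public static void main(String[] args) {
        ProgressTracker tracker = new ProgressTracker(5000);
        for (int page = 0; page < 3; page++) {
            tracker.advance(2000); // assumed page size, for illustration only
            System.out.println(tracker.statusLine());
        }
    }
}
```

Printing `statusLine()` after each page, or logging it at info level, would give basic feedback during a long run without needing a full-screen UI.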

However, time has been an issue as I have a lot of non-OSS work going on right now.

EugenMayer commented 4 months ago

> However, time has been an issue as I have a lot of non-OSS work going on right now.

This is the reason why I no longer considered contributing my latest Docker image adaptations. IMHO, it does not make sense to create such a bottleneck.

The current state of the Docker image is rather meh. I fixed that, adding new flags like debug, maxRetry, and records, fixing supervisor, fixing the cron job, and so forth.

Back to the topic: debug mode helped me understand that this was an OOM (out-of-memory) issue.

I really do not understand why we fetch all that data and keep it all in memory during the entire process (about 1 hour) before writing it to disk all together.

Do you really need to post-process all entries once they have all been loaded, or can we just write them in proper chunks? I have not looked into the code yet, but maybe there is room for improvement here.

EugenMayer commented 4 months ago

With #140, the debug option becomes available when running via the Docker image.

jeremylong commented 4 months ago

Thanks for the PR!

jeremylong commented 4 months ago

> I really do not understand why we fetch all that data and keep it all in memory during the entire process (about 1 hour) before writing it to disk altogether.

Honestly, it is because we are pulling data from an API whose results may not be sorted very well. We sort records into yearly buckets when the data is persisted to disk. Even if we are only pulling the last x days' worth of updates, those updates could be spread across all of the files.
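To make the yearly-bucket argument concrete, here is a minimal sketch, assuming records are bucketed by the year embedded in the CVE ID (`CVE-YYYY-NNNN`). The `YearBuckets` class and its `groupByYear` method are hypothetical; vulnz may key its yearly files differently (for example by published date).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not vulnz's actual code: groups CVE IDs into yearly
 * buckets using the year embedded in the ID (CVE-YYYY-NNNN).
 */
final class YearBuckets {
    private static final Pattern CVE_ID = Pattern.compile("^CVE-(\\d{4})-\\d{4,}$");

    static Map<Integer, List<String>> groupByYear(List<String> cveIds) {
        Map<Integer, List<String>> buckets = new TreeMap<>(); // keys sorted by year
        for (String id : cveIds) {
            Matcher m = CVE_ID.matcher(id);
            if (!m.matches()) {
                throw new IllegalArgumentException("Not a CVE ID: " + id);
            }
            int year = Integer.parseInt(m.group(1));
            buckets.computeIfAbsent(year, y -> new ArrayList<>()).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // A hypothetical batch of "recent updates" can still touch many yearly files.
        List<String> recentUpdates = List.of(
                "CVE-2024-3094", "CVE-2021-44228", "CVE-2014-0160", "CVE-2024-21626");
        groupByYear(recentUpdates).forEach((year, ids) ->
                System.out.println(year + " -> " + ids));
    }
}
```

Because even a small batch of updates can land in several yearly buckets, writing chunks as they arrive would mean repeatedly reading and rewriting existing yearly files; holding everything in memory until the end avoids that, at the cost of heap usage.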

jeremylong commented 4 months ago

With the updated Docker image (to be published soon), can this issue be closed?

EugenMayer commented 4 months ago

I guess so, more or less. Since my issue was an OOM, it can be closed. Thank you for updating the Docker image!