trustification / trustify

Apache License 2.0
10 stars 19 forks source link

Improve memory consumption #654

Closed ctron closed 3 weeks ago

ctron commented 2 months ago

I noticed that, at least when importing, the server might consume huge amounts of memory (~3GiB). I know that some documents are just big. But we should keep an eye on this and try to improve the situation where possible.

JimFuller-RedHat commented 2 months ago

observation: avoiding entire documents being loaded into memory for processing, usually imply streaming approach during ingestion though I have no clue how one does this in rust with serde.

ctron commented 2 months ago

I fear a pattern we have for SBOMs is:

Holding on to results of every step, and only dropping all memory at the end.

jcrossley3 commented 2 months ago

observation: avoiding entire documents being loaded into memory for processing, usually imply streaming approach during ingestion though I have no clue how one does this in rust with serde.

In v1, we went to great pains to avoid loading an entire doc into memory, only to later realize that it was the only way we could accurately validate the document. In light of that experience, we avoided all that pain in v2, at the risk of memory exhaustion. We can probably strike a better balance, but we need to ensure that we can sufficiently validate a doc that doesn't fully reside in RAM. We should solve the validation problem first.

jcrossley3 commented 2 months ago

Holding on to results of every step, and only dropping all memory at the end.

Dropping the byte array after deserializing might be the lowest hanging fruit?

helio-frota commented 2 months ago

Dropping the byte array after deserializing might be the lowest hanging fruit?

array -> array ? yes Vec -> yes or I'm not understanding rust yet

bobmcwhirter commented 2 months ago

Yep. Let's add some scope or explicit drops.

JimFuller-RedHat commented 2 months ago

FWIW if validators also do not stream then that might be a problem addressed upstream (if we are validating with 3rd party parser).

jcrossley3 commented 2 months ago

Any thoughts on the best way to measure memory usage in rust programs? What's the simplest way I can determine what effect my changes make?

jcrossley3 commented 2 months ago

Holding on to results of every step, and only dropping all memory at the end.

I need help finding an example of this. When I look at the loaders, they each deserialize a byte buffer, e.g.

        let csaf: Csaf =
            info_span!("parse document").in_scope(|| serde_json::from_reader(document))?;

But because document is moved into from_reader, I believe it's dropped once the Csaf struct is returned, right? Where is an example of us holding onto the bytes after deserialization?

ctron commented 2 months ago

Comparing today with yesterday, I see a signification improvement of memory usage. Maybe removing the format detection did that.

ctron commented 1 month ago

Another component I identified as memory hog during the load testing was the graphql endpoint. That bumped the memory usage from about 2 GiB to 16 GiB, just by using that endpoint with a single user.

I created a tracking issue for that: https://github.com/trustification/trustify/issues/750

JimFuller-RedHat commented 3 weeks ago

@ctron @jcrossley3 can we declare victory on this ?

jcrossley3 commented 3 weeks ago

Strictly speaking, we probably have accomplished the goal as expressed in the title, so I'm ok closing this. We can create more specific issues as future perf challenges are identified.

ctron commented 3 weeks ago

Strictly speaking, we probably have accomplished the goal as expressed in the title, so I'm ok closing this. We can create more specific issues as future perf challenges are identified.

You're right!