johannesduesing opened this issue 2 months ago
Today I ran the analysis on one of our servers (4 cores, 30 GB heap space). Unfortunately, it crashed after ~5000 GAVs; I have just restarted it with different configurations and hope to obtain more results. Nevertheless, I did a preliminary evaluation of the results for those 5000 GAVs. Here's an overview:
Operation | Average [ms] | Median [ms] | 75th Percentile [ms] |
---|---|---|---|
Project Classes Download & Init | 64 | 11 | 30 |
Library Classes Download & Init | 1685 | 594 | 2092 |
Project Instance Init | 446 | 37 | 145 |
↳ O1 | 131 | 11 | 41 |
↳ O2 | ~0 | 0 | 0 |
↳ O3 | 17 | 3 | 13 |
↳ O4 (instance methods) | 301 | 18 | 80 |
↳ O5 (overriding methods) | 84 | 14 | 63 |
↳ O6 | 1 | 0 | 1 |
↳ O7 | 3 | 0 | 4 |
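As you can see, within project instance initialization the most relevant operations seem to be O4 (computing instance methods) and O5 (computing overriding methods); O4 alone takes 301 ms on average, compared with 446 ms for the entire project instance initialization.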
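Note that the sub-operation rows overlap in time: their averages add up to about 537 ms, more than the 446 ms average for Project Instance Init. This is presumably because, as described in the problem statement, O1 runs concurrently with O2 and O3, and O4 with O5. The following minimal sketch models only that ordering in Python (it is not OPAL's Scala code); `run_op` is a hypothetical callback standing in for each operation, and it assumes O2 and O3 run one after the other on the calling thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def create_project(run_op, pool):
    """Model of the operation ordering described for Project initialization.

    run_op(name) is a hypothetical stand-in for performing one operation.
    """
    # O1 starts in the background while O2 and then O3 run on the calling thread.
    o1 = pool.submit(run_op, "O1")
    run_op("O2")
    run_op("O3")
    o1.result()  # O1 is awaited only after O3 has completed.

    # O4 and O5 run concurrently while the calling thread does other work.
    o4 = pool.submit(run_op, "O4")
    o5 = pool.submit(run_op, "O5")
    run_op("array manipulations")
    o4.result()
    o5.result()  # Both are awaited right before the instance is created.

    run_op("O7")  # Creating the actual project instance triggers O7.
    project = object()  # Placeholder for the created Project instance.
    run_op("O6")  # O6 runs after instantiation; then the instance is returned.
    return project


class EventLog:
    """Thread-safe recorder of ("start" | "end", operation name) events."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def run_op(self, name):
        with self._lock:
            self.events.append(("start", name))
        # A real operation would do its work here.
        with self._lock:
            self.events.append(("end", name))
```

For example, `with ThreadPoolExecutor(max_workers=2) as pool: create_project(EventLog().run_op, pool)`. In such a model, the wall-clock time of each concurrent pair is bounded by the slower of the two operations rather than by their sum, which is why the per-operation rows cannot simply be added up.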
Thank you for looking into this. I had a glance at the CSV but haven't gained deeper insights yet. I think the steps that we expected to be the most expensive also ended up dominating the project creation time, with some differences between projects. Are there any insights you gained that would suggest a course of action besides a general "let's not compute everything all the time, but only when needed"? Keep in mind that this would probably increase latency: currently, some of the steps can be started right away and run in parallel, but if they are lazy, neither would be possible.
I do think it's rather tricky to optimize. While instance and overriding methods are the last things computed before the project is created, and could therefore perhaps be made lazy, that would impact project validation, which could then only be performed in a reduced fashion, or not at all. Maybe we want to come back to the `LazyProject`/`UnsafeProject` idea, with a separate class for use cases where you only need, e.g., the class hierarchy. Before we draw any final conclusions, I'd like to a) gather some more data and b) run the same experiments with your additions from #215, just to see the performance impact.
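A minimal sketch of the `LazyProject` idea, written in Python rather than OPAL's Scala and under simplifying assumptions: each class has at most one superclass (interfaces are ignored), methods are identified by name only (no signatures, visibility or static modifiers), the class hierarchy is built eagerly, and instance methods and overriding methods are computed only on first access via `functools.cached_property`. The names `ClassInfo` and `LazyProject` and all their fields are hypothetical and not OPAL's API.

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import Optional


@dataclass(frozen=True)
class ClassInfo:
    """Simplified class record: name, optional superclass name, declared method names."""
    name: str
    superclass: Optional[str] = None
    methods: frozenset = field(default_factory=frozenset)


class LazyProject:
    """Hypothetical project variant: class hierarchy eager, method-level data lazy."""

    def __init__(self, classes):
        # Eager part: an index of all classes and the direct subclass relation.
        self.classes = {c.name: c for c in classes}
        self.direct_subclasses = {name: set() for name in self.classes}
        for c in classes:
            if c.superclass in self.direct_subclasses:
                self.direct_subclasses[c.superclass].add(c.name)

    def _ancestors(self, name):
        """Yield the superclasses of `name` that belong to the project, nearest first."""
        current = self.classes[name].superclass
        while current in self.classes:
            yield self.classes[current]
            current = self.classes[current].superclass

    @cached_property
    def instance_methods(self):
        # Lazy: computed on first access, then cached on the instance.
        # Maps each class to all method names it declares or inherits.
        return {
            name: frozenset(c.methods).union(*(a.methods for a in self._ancestors(name)))
            for name, c in self.classes.items()
        }

    @cached_property
    def overriding_methods(self):
        # Lazy: maps (declaring class, method name) to the subclasses redeclaring it.
        result = {}
        for name, c in self.classes.items():
            for ancestor in self._ancestors(name):
                for method in c.methods & ancestor.methods:
                    result.setdefault((ancestor.name, method), set()).add(name)
        return result
```

A client that only needs `direct_subclasses` never pays for the method-level computations. The trade-off raised above still applies: the first access to a lazy property pays the full cost on the caller's thread instead of overlapping with other initialization work, and any validation depending on these properties would either trigger them or have to be skipped.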
Problem Statement
As discussed in our recent OPAL meeting, we want to understand which operations are performed (eagerly) when initializing a `Project` instance, and their respective impact on overall performance. I had a first look and identified the following relevant operations:

- … `val` definition and happens on `Project` instantiation
- … `lazy val`, so not really relevant in this context. However, it already features the following annotation: …

O1 runs concurrently with O2 and O3 and is awaited after O3 completes. O4 and O5 run concurrently while the main thread performs some array manipulations; both are awaited when the actual project instance is created, which is when O7 is triggered. O6 runs after the instantiation has completed, and then the `Project` instance is returned.

Empirical Evaluation
I implemented a small patch to OPAL that extracts the runtime of the operations mentioned above. Based on that, I wrote an analysis that iterates over Maven Central and does the following for each GAV:
1. Download and parse the project classes into their `ClassFile` representation
2. Download and parse the library classes into their `ClassFile` representation (interfaces only)
3. Create a `Project` instance based on those project and library class files
4. Record the following values: `GAV, #ProjectClasses, #Libraries, #LibraryClasses, StreamTime, LoadAndParseProjectCFsTime, LoadAndParseLibraryCFsTime, TotalProjectInitTime, O4Time, O1Time, O5Time, O7Time, O2Time, O3Time, O6Time`
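A minimal sketch of the per-GAV measurement loop, in Python for illustration only (the actual analysis runs OPAL on the JVM). The callables `load_project`, `load_libraries` and `create_project` are hypothetical stand-ins for the real download, parsing and project-creation steps, and only a subset of the columns is covered: `StreamTime` and the O1 to O7 timings come from the patched OPAL build and are omitted.

```python
import csv
import io
import time

# Subset of the CSV columns listed above.
COLUMNS = ["GAV", "#ProjectClasses", "#Libraries", "#LibraryClasses",
           "LoadAndParseProjectCFsTime", "LoadAndParseLibraryCFsTime",
           "TotalProjectInitTime"]


def timed_ms(fn, *args):
    """Run fn(*args); return (result, elapsed wall-clock time in whole milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, round((time.perf_counter() - start) * 1000)


def measure_gav(gav, load_project, load_libraries, create_project):
    """Measure one GAV using hypothetical stand-in callables:

    load_project(gav) -> list of project class files
    load_libraries(gav) -> list of libraries, each a list of class files
    create_project(project_cfs, library_cfs) -> project instance
    """
    project_cfs, project_ms = timed_ms(load_project, gav)
    libraries, library_ms = timed_ms(load_libraries, gav)
    library_cfs = [cf for lib in libraries for cf in lib]
    _, init_ms = timed_ms(create_project, project_cfs, library_cfs)
    return {
        "GAV": gav,
        "#ProjectClasses": len(project_cfs),
        "#Libraries": len(libraries),
        "#LibraryClasses": len(library_cfs),
        "LoadAndParseProjectCFsTime": project_ms,
        "LoadAndParseLibraryCFsTime": library_ms,
        "TotalProjectInitTime": init_ms,
    }


def write_rows(rows, out):
    """Write measurement rows as CSV (header first) to a text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Usage with dummy loaders: 2 project classes, 2 libraries with 3 classes in total.
    row = measure_gav("org.example:demo:1.0",
                      lambda gav: ["A", "B"],
                      lambda gav: [["L1"], ["L2", "L3"]],
                      lambda project_cfs, library_cfs: object())
    buf = io.StringIO()
    write_rows([row], buf)
    print(buf.getvalue())
```

Timing each phase separately keeps network-bound steps (downloading class files) apart from the CPU-bound project initialization, which matters when comparing runs on different machines.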
A first, very basic run on ~1000 GAVs produced the following results: stats.csv. Note that all times are in milliseconds and that `LoadAndParse[Project|Library]CFsTime` depends on my local internet connection at home.

Let me know if you have any ideas or additional input; then I'll run the analysis on our servers and post the evaluation results under this issue.