Open jwmatthews opened 5 months ago
Another data point from a run on an M1 Max: this script (https://github.com/konveyor-ecosystem/kai/blob/main/samples/fetch_analyze_apps.py) takes ~2 hours 45 minutes to complete ~20 analysis runs covering 10 apps, each scanned on its initial branch and then on its solved branch.
https://gist.github.com/jwmatthews/4027954eef0ca53a218a05a184f06773
For most of the runs, the branch that has been migrated to Quarkus takes much longer to scan.
@jwmatthews Let me know how things have improved for you when you get a chance. We made a handful of performance fixes over the last few weeks, and I am wondering whether there is still room for improvement.
Below is a sample of what I see when I compare Java EE vs Quarkus on a single application analysis. This uses a Kantra build as of March 19, 2024 (macOS, M1 Max, arm64).
The runs took 22 seconds vs 455 seconds, a ~20x difference for the sample application below.
Java EE ('main' branch):
real 0m22.107s
user 0m0.114s
sys 0m0.149s
./example_analyze_single_app_with_custom_rules_tasks_qute.sh 0.12s user 0.16s system 1% cpu 22.152 total
Quarkus ('quarkus' branch):
real 7m35.338s
user 0m0.129s
sys 0m0.151s
./example_analyze_single_app_with_custom_rules_tasks_qute.sh 0.14s user 0.17s system 0% cpu 7:35.36 total
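To make this comparison repeatable across runs, the `real` values can be converted to seconds and divided. This is a minimal sketch: `to_seconds` and `slowdown` are hypothetical helper names (not part of Kantra), and it assumes the `XmY.YYYs` format that bash's `time` prints above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kantra) for comparing two `time` results.

# Convert a bash `time` value such as "7m35.338s" into seconds.
to_seconds() {
  local value="${1%s}"          # drop the trailing "s"
  local minutes="${value%%m*}"  # text before the "m"
  local seconds="${value#*m}"   # text after the "m"
  awk -v m="${minutes}" -v s="${seconds}" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Print how many times slower the second run was than the first.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.1f\n", b / a }'
}

# The two `real` values reported in this comment.
slowdown "0m22.107s" "7m35.338s"
```

Feeding it the `real` lines from both branches of each sample app would give one slowdown factor per app, which is easier to track over time than raw wall-clock values.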
mkdir ./sample_repos
cd ./sample_repos
git clone https://github.com/konveyor-ecosystem/tasks-qute.git
cd ..
$ cat example_analyze_single_app_with_custom_rules_tasks_qute.sh
#!/usr/bin/env bash
SOURCE_DIR="tasks-qute"
# Optional extra arguments; empty means the default (full) analysis mode
SOURCE_ONLY=()
# To run in source-only mode, uncomment the line below
# SOURCE_ONLY=(-m source-only)
# Choose to analyze either the initial or the solved branch,
# then comment/uncomment the matching lines below
#BRANCH="main"
#OUTDIR=${PWD}/tmp/${SOURCE_DIR}/initial
BRANCH="quarkus"
OUTDIR=${PWD}/tmp/${SOURCE_DIR}/solved
# ####
# Ensure we are on the expected branch before analysis.
# We typically work with 2 branches: initial and solved.
# It has been a common problem to forget which one is checked out
# and create invalid analysis runs.
# ####
pushd "${PWD}/sample_repos/${SOURCE_DIR}" || exit
git checkout "${BRANCH}" || exit
popd || exit
mkdir -p "${OUTDIR}"
# SOURCE_ONLY is an array so that an empty value adds no argument at all
time ./bin/kantra analyze -i "${PWD}/sample_repos/${SOURCE_DIR}" "${SOURCE_ONLY[@]}" -t "quarkus" -t "jakarta-ee" -t "jakarta-ee8+" -t "jakarta-ee9+" -t "cloud-readiness" --rules ./custom_rules -o "${OUTDIR}" --overwrite
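Since forgetting which branch is checked out is called out as a common problem, one option is to pass the branch as an argument instead of editing the script. The following is only a sketch under the same assumptions as the script above (the `./sample_repos` and `./tmp` layout, the `main`/`quarkus` branch names, and the same Kantra flags); `outdir_for_branch`, `analyze_branch`, and the file name `analyze_branch.sh` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the branch from the first argument instead of editing the script.
# Assumes the same ./sample_repos and ./tmp layout as the script above.
SOURCE_DIR="tasks-qute"

# Map a branch name to the report directory used in the script above.
outdir_for_branch() {
  case "$1" in
    main)    echo "${PWD}/tmp/${SOURCE_DIR}/initial" ;;
    quarkus) echo "${PWD}/tmp/${SOURCE_DIR}/solved" ;;
    *)       echo "unknown branch: $1" >&2; return 1 ;;
  esac
}

# Check out the requested branch, then time the analysis for it.
analyze_branch() {
  local branch="$1" outdir
  outdir="$(outdir_for_branch "${branch}")" || return 1
  git -C "${PWD}/sample_repos/${SOURCE_DIR}" checkout "${branch}" || return 1
  mkdir -p "${outdir}"
  time ./bin/kantra analyze -i "${PWD}/sample_repos/${SOURCE_DIR}" \
    -t "quarkus" -t "jakarta-ee" -t "jakarta-ee8+" -t "jakarta-ee9+" -t "cloud-readiness" \
    --rules ./custom_rules -o "${outdir}" --overwrite
}

# Run only when a branch name is passed, e.g.: ./analyze_branch.sh quarkus
if [ "$#" -gt 0 ]; then
  analyze_branch "$1"
fi
```

Tying the output directory to the branch name means a report can never land in `solved` while `main` is checked out, which is the mix-up the original comments warn about.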
@jwmatthews I think I can say with confidence that the 7m runtime you see will not grow linearly with the size of the project. For a Quarkus project, I think there is about 3m of constant time spent fetching dependencies, and that is not going away. Since you are running in full mode, it will always be there. I would be more interested in looking at Windup's runtime on the same app. Note that Windup doesn't pull dependencies itself, so we cannot really compare 1:1.
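One rough way to test the constant-overhead hypothesis is to time the same application in full mode and in source-only mode and compare. The sketch below makes several assumptions: that the `-m source-only` flag shown in the script above skips the dependency fetch, that a single target is enough for the comparison, and that the paths match the layout used earlier in this thread. `elapsed_seconds` is a hypothetical helper built on bash's `SECONDS` variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
  local start="${SECONDS}"
  "$@" >/dev/null 2>&1
  echo "$(( SECONDS - start ))"
}

# Assumed paths, matching the layout used earlier in this thread.
INPUT="${PWD}/sample_repos/tasks-qute"
OUT="${PWD}/tmp/mode_comparison"

# Only attempt the comparison when a Kantra binary is present.
if [ -x ./bin/kantra ]; then
  mkdir -p "${OUT}/full" "${OUT}/source_only"
  full="$(elapsed_seconds ./bin/kantra analyze -i "${INPUT}" -t "quarkus" -o "${OUT}/full" --overwrite)"
  source_only="$(elapsed_seconds ./bin/kantra analyze -i "${INPUT}" -m source-only -t "quarkus" -o "${OUT}/source_only" --overwrite)"
  echo "full: ${full}s, source-only: ${source_only}s, difference: $(( full - source_only ))s"
fi
```

If the difference stays roughly the same across applications of different sizes, that would support the idea of a fixed cost rather than one that scales with the project.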
Thanks @pranavgaikwad, here is another data point that agrees with what you shared, i.e. the ~7m is likely due to fetching the Quarkus dependencies.
I ran it on a larger application.
Sample App: https://github.com/jmle/monolith.git
Is there an existing issue for this?
Konveyor version
:latest as of 2/21/2024
Priority
Undefined (Default)
Current Behavior
We are observing performance differences we do not understand as we build up a library of sample apps for Java EE to Quarkus migrations. All testing and comments in this issue relate to what we observe on various macOS Apple silicon laptops, at least M1 Max and M2.
We are seeing that analyzing the migrated (Quarkus) version of a sample app takes anywhere from 3x to 10x longer than analyzing the original Java EE version.
This work is ultimately being done to populate a library of sample apps and analysis reports for work under Kai at https://github.com/konveyor-ecosystem/kai/blob/main/samples/fetch_analyze_apps.py
As we build up this library, we run and store analysis reports for each application in its initial state (Java EE) and again after the application has been migrated to Quarkus. The discrepancy is that we see a huge performance difference between the two and are unsure why the change is so dramatic.
Trying to understand why analysis on the Quarkus branch is sometimes 10X more expensive in the gist: https://gist.github.com/savitharaghunathan/533fa71fb7dba3e76609f98647cd53bc
(Another data point from John running on M1 Max same script: https://gist.github.com/jwmatthews/c0b8d317b4a361c5e874dd24901d5ca7)
See this comment for a more extreme observation where we saw a 10x difference.
For example using this CMT application:
'main' branch
'quarkus' branch
Expected Behavior
We expected to see performance numbers scale in proportion based on number of files/size of source content, so for CMT we observe:
so we would assume the performance would be at a scale of maybe
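One way to check the proportionality expectation is to count the files on each branch and compare the ratio with the ratio of analysis times. This is a sketch only: `count_source_files` is a hypothetical helper, and the `./sample_repos/cmt` path and the `main`/`quarkus` branch names are taken from elsewhere in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files under a checkout, ignoring Git metadata.
count_source_files() {
  find "$1" -type f -not -path '*/.git/*' | wc -l | tr -d ' '
}

# Example, assuming the repository was cloned into ./sample_repos/cmt as in this issue.
if [ -d ./sample_repos/cmt ]; then
  for branch in main quarkus; do
    git -C ./sample_repos/cmt checkout --quiet "${branch}" || break
    echo "${branch}: $(count_source_files ./sample_repos/cmt) files"
  done
fi
```

File count is only a crude proxy for analysis cost, but a large gap between the file ratio and the runtime ratio would point at something other than source size.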
How Reproducible
Always (Default)
Steps To Reproduce
We can break down the behavior to a single application such as CMT to aid as a reproducer. https://github.com/konveyor-ecosystem/cmt.git
On an M1 Max we observed
Below is how we are running this
Run analyze_cmt.sh
Use the below analyze_cmt.sh to run the kantra analysis
$ cat analyze_cmt.sh
SOURCE_DIR=cmt
OUTDIR=$PWD/analysis_reports/${SOURCE_DIR}
mkdir -p $OUTDIR
time ./bin/kantra analyze -i $PWD/sample_repos/$SOURCE_DIR -t "quarkus" -t "jakarta-ee" -t "jakarta-ee8+" -t "jakarta-ee9+" -t "cloud-readiness" -o $OUTDIR --overwrite
Anything else?
No response