timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-16301] Fingerprint performance #2794

Closed timja closed 11 years ago

timja commented 11 years ago

A user working with files in the 100+ Mb range asked whether fingerprinting them was expensive. Probably it is not that expensive but it is likely the implementation could be optimized and at least measured.

As far as the I/O goes, FilePath.digest uses FileInputStream rather than NIO—hmm, maybe OK. It does not use buffering, which may or may not be an issue here. Then it calls Util.getDigestOf.

That uses the JRE’s MD5 implementation which AFAIK takes advantage of any cryptographic hardware acceleration when available. It does read into a 1024-byte buffer however which seems excessively small. (And uses a proxy stream rather than simply calling the update method, which is odd.) http://stackoverflow.com/questions/9321912/very-slow-when-generaing-md5-using-java-with-large-file suggests using a custom library though this may be overkill; others suggest DigestUtils from Commons Codec. A


Originally reported by jglick, imported from: Fingerprint performance
  • assignee: jglick
  • status: Resolved
  • priority: Minor
  • resolution: Fixed
  • resolved: 2013-05-28T20:28:56+00:00
  • imported: 2022/01/10
timja commented 11 years ago

jglick:

Fingerprint storage seems to spend most of its time in XStream overhead.

timja commented 11 years ago

jglick:

Baseline average time (in 1.517-SNAPSHOT) of Util.getDigestOf(new FileInputStream("…/jenkins-war-1.509.1.war")): 220msec.

Using a BufferedInputStream with default buffer size: 193msec.

Using direct buffer update rather than DigestInputStream saves nothing, nor does using MappedByteBuffer on FileChannel, nor does using DigestUtils, nor does using a larger temporary buffer.

Fast MD5 Implementation in Java takes 183msec, which is barely faster, but it requires JNI and is LGPL.

So I think I will switch to DigestUtils just to reduce custom code, and add buffering.

timja commented 11 years ago

jglick:

Since the main problem from reports I have seen is Fingerprint.save calling complex code in XStream, with other threads waiting for the monitor, I am looking into whether a hand-coded “fast path” can perform better.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/Util.java
http://jenkins-ci.org/commit/jenkins/51fbd2d8675fb3703ce38d935e661abf03e1b83b
Log:
JENKINS-16301 Replace impl of getDigestOf with standard (Commons Codec) DigestUtils.md5Hex, for simplicity.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/FilePath.java
http://jenkins-ci.org/commit/jenkins/9d72311ba0222b9d0489605b8ec2cec711690993
Log:
JENKINS-16301 Use a buffered input stream to get somewhat better performance.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/model/Fingerprint.java
core/src/main/java/hudson/util/PersistedList.java
core/src/test/java/hudson/model/FingerprintTest.java
core/src/test/resources/hudson/model/fingerprint.xml
http://jenkins-ci.org/commit/jenkins/8d2e4523368716c82dc7333b4357d21d0dc93526
Log:
JENKINS-16301 Verifying current serial format as a baseline.

timja commented 11 years ago

dogfood:

Integrated in jenkins_main_trunk #2533
JENKINS-16301 Replace impl of getDigestOf with standard (Commons Codec) DigestUtils.md5Hex, for simplicity. (Revision 51fbd2d8675fb3703ce38d935e661abf03e1b83b)
JENKINS-16301 Use a buffered input stream to get somewhat better performance. (Revision 9d72311ba0222b9d0489605b8ec2cec711690993)
JENKINS-16301 Verifying current serial format as a baseline. (Revision 8d2e4523368716c82dc7333b4357d21d0dc93526)

Result = SUCCESS
Jesse Glick : 51fbd2d8675fb3703ce38d935e661abf03e1b83b
Files :

Jesse Glick : 9d72311ba0222b9d0489605b8ec2cec711690993
Files :

Jesse Glick : 8d2e4523368716c82dc7333b4357d21d0dc93526
Files :

timja commented 11 years ago

jglick:

As a baseline,

package hudson.model;
import hudson.Util;
import java.io.File;
public class Fingerprints {
    private static byte[] toByteArray(String md5sum) {
byte[] data = new byte[16];
for( int i=0; ibyte)Integer.parseInt(md5sum.substring(i,i+2),16);
return data;
    }
    private static final byte[] SOME_MD5 = toByteArray(Util.getDigestOf("whatever"));
    public static void main(String[] args) throws Exception {
Fingerprint f = new Fingerprint(new Fingerprint.BuildPtr("foo", 13), "stuff.jar", SOME_MD5);
f.addWithoutSaving("some", 1);
f.addWithoutSaving("some", 2);
f.addWithoutSaving("some", 3);
f.addWithoutSaving("some", 10);
f.addWithoutSaving("other", 6);
for (int i = 0; i < 100; i++) {
    for (int b = 0; b < 100; b += 2) {
f.addWithoutSaving("job" + i, b);
    }
}
File x = new File("/tmp/fp.xml");
int count = 10000;
long start = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    f.save(x);
}
long end = System.currentTimeMillis();
System.out.println((1f * (end - start) / count) + "msec to save");
start = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    Fingerprint.load(x);
}
end = System.currentTimeMillis();
System.out.println((1f * (end - start) / count) + "msec to load");
    }
}

reports:

1.8081msec to save
1.6956msec to load
timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/model/Fingerprint.java
core/src/test/java/hudson/model/FingerprintTest.java
http://jenkins-ci.org/commit/jenkins/41edc5867ba194fdb06b2e2ffa6b6addfaa99986
Log:
JENKINS-16301 Verifying current load/save implementation can round-trip fingerprints.

timja commented 11 years ago

dogfood:

Integrated in jenkins_main_trunk #2538
JENKINS-16301 Verifying current load/save implementation can round-trip fingerprints. (Revision 41edc5867ba194fdb06b2e2ffa6b6addfaa99986)

Result = SUCCESS
Jesse Glick : 41edc5867ba194fdb06b2e2ffa6b6addfaa99986
Files :

timja commented 11 years ago

jglick:

New baseline times are a bit shorter: 0.942msec to save. Using a hand-written serializer, I can get that down to 0.7601msec, a 19% savings. Not exactly dramatic but this is based on a single thread doing serial operations; the hand-written version does not contend locks.

So to check that, changed the test program to run under concurrency stress (each thread saving its own Fingerprint over and over), and also using a tmpfs so that actual I/O time can be ignored:

package hudson.model;
import hudson.Util;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
public class Fingerprints {
    private static byte[] toByteArray(String md5sum) {
byte[] data = new byte[16];
for( int i=0; ibyte)Integer.parseInt(md5sum.substring(i,i+2),16);
return data;
    }
    private static final byte[] SOME_MD5 = toByteArray(Util.getDigestOf("whatever"));
    public static void main(String[] args) throws InterruptedException {
for (int concurrency = 1; concurrency <= 10; concurrency++) {
    List<Thread> threads = new ArrayList<Thread>();
    final AtomicLong totalTime = new AtomicLong();
    final AtomicInteger totalCount = new AtomicInteger();
    for (int i = 0; i < concurrency; i++) {
final File x = new File("/run/user/" + System.getenv("USER") + "/fp" + i + ".xml");
Thread t = new Thread("saver-" + i) {
    @Override public void run() {
Fingerprint f = new Fingerprint(new Fingerprint.BuildPtr("foo", 13), "stuff.jar", SOME_MD5);
f.addWithoutSaving("some", 1);
f.addWithoutSaving("some", 2);
f.addWithoutSaving("some", 3);
f.addWithoutSaving("some", 10);
f.addWithoutSaving("other", 6);
for (int i = 0; i < 100; i++) {
    for (int b = 0; b < 100; b += 2) {
f.addWithoutSaving("job" + i, b);
    }
}
int count = 10000;
long start = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    try {
f.save(x);
    } catch (IOException e) {
assert false : e;
    }
}
long end = System.currentTimeMillis();
totalTime.addAndGet(end - start);
totalCount.addAndGet(count);
    }
};
t.start();
threads.add(t);
    }
    for (Thread t : threads) {
t.join();
    }
    System.out.println(concurrency + "×: " + (totalTime.floatValue() / totalCount.floatValue()) + "msec to save");
}
    }
}

Baseline times (on a quad-core):

1×: 0.7479msec to save
2×: 0.6742msec to save
3×: 0.8395333msec to save
4×: 1.042125msec to save
5×: 1.2944msec to save
6×: 1.5228167msec to save
7×: 1.7666msec to save
8×: 2.0307374msec to save
9×: 2.297811msec to save
10×: 2.55688msec to save

With patch:

1×: 0.2807msec to save
2×: 0.234msec to save
3×: 0.26313335msec to save
4×: 0.329525msec to save
5×: 0.41194msec to save
6×: 0.49443334msec to save
7×: 0.56754285msec to save
8×: 0.650325msec to save
9×: 0.7313111msec to save
10×: 0.81052msec to save

This shows a 2–3× speedup in CPU time, with the biggest gains under concurrent load. I am going to stop there for now; not bothering to optimize load which is I think called rather less often.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
changelog.html
core/src/main/java/hudson/model/Fingerprint.java
http://jenkins-ci.org/commit/jenkins/fff90c100d6b030f86d92cc4f0c23af3c648c75e
Log:
[FIXED JENKINS-16301] Optimized Fingerprint.save.

timja commented 11 years ago

dogfood:

Integrated in jenkins_main_trunk #2555
[FIXED JENKINS-16301] Optimized Fingerprint.save. (Revision fff90c100d6b030f86d92cc4f0c23af3c648c75e)

Result = UNSTABLE
Jesse Glick : fff90c100d6b030f86d92cc4f0c23af3c648c75e
Files :

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/model/Fingerprint.java
core/src/test/java/hudson/model/FingerprintTest.java
http://jenkins-ci.org/commit/jenkins/f1b513cf91663498aa1677580776a7c42c7322f2
Log:
JENKINS-16301 Need to create parent directory of fingerprint file too.

timja commented 11 years ago

dogfood:

Integrated in jenkins_main_trunk #2556
JENKINS-16301 Need to create parent directory of fingerprint file too. (Revision f1b513cf91663498aa1677580776a7c42c7322f2)

Result = SUCCESS
Jesse Glick : f1b513cf91663498aa1677580776a7c42c7322f2
Files :

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
changelog.html
core/src/main/java/hudson/Util.java
test/src/test/java/hudson/maven/MavenBuildTest.java
http://jenkins-ci.org/commit/jenkins/e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93
Log:
[FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds.
Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies;
if you specify a build extension that can override dependencies used by Jenkins core classes!
A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/FilePath.java
core/src/main/java/hudson/Util.java
maven-plugin/src/main/java/hudson/maven/ExecutedMojo.java
maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifact.java
maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java
http://jenkins-ci.org/commit/jenkins/11953750ab422bcec28d12a436867d0888313aaa
Log:
[JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving.
Introducing Util.getDigestOf(File) as a convenience.

Compare: https://github.com/jenkinsci/jenkins/compare/d18a27e5bed4...11953750ab42

timja commented 11 years ago

dogfood:

Integrated in jenkins_main_trunk #2748
[FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. (Revision e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93)
[JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving. (Revision 11953750ab422bcec28d12a436867d0888313aaa)

Result = SUCCESS
Jesse Glick : e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93
Files :

Jesse Glick : 11953750ab422bcec28d12a436867d0888313aaa
Files :

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/hudson/maven/ExecutedMojo.java
src/main/java/hudson/maven/reporters/MavenArtifact.java
src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java
http://jenkins-ci.org/commit/maven-plugin/db2b69174c511a1cba440ce3683de4cd6676734b
Log:
[JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving.
Introducing Util.getDigestOf(File) as a convenience.
Originally-Committed-As: 11953750ab422bcec28d12a436867d0888313aaa

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
test/src/test/java/hudson/maven/MavenBuildTest.java
http://jenkins-ci.org/commit/maven-plugin/1342c5f819d6672ce53238822a93dddc087f0bd8
Log:
Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies; if you specify a build extension that can override dependencies used by Jenkins core classes! A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

timja commented 2 years ago

[Originally related to: JENKINS-13154]

timja commented 2 years ago

[Originally related to: JENKINS-11814]

timja commented 2 years ago

[Originally depends on: JENKINS-18178]

timja commented 2 years ago

[Originally depends on: JENKINS-19515]

timja commented 2 years ago

[Originally depends on: JENKINS-18337]