i made a build of strongtalk using visual C++ 6. ( see build details below )
loading the image takes around 100 msec with the original binary ("Strongtalk-2.0")
running the mandelbrot test with optimized (unboxed) floats takes:
10 sec without recompilation
1 sec with recompilation
loading the image takes around 35 msec with my binary
running the mandelbrot test with optimized (unboxed) floats takes:
10 sec without recompilation
1 sec with recompilation
so i couldn't reproduce the difference you measured with the Mandelbrot, and my
build is actually faster to load the image. however using vs2005, i could reproduce
the slow image loading: it was taking around 10x more time.
so i suppose the issue is clearly related (and probably specific to) vs2005
Build details for visual C++ 6
starting with a fresh source:
made the .dsp project for all files using a script
compiled makeDeps
used makeDeps like this:
copy includeDB + includeDB2 includeDB.all
makeDeps.exe makedeps-platform.txt includeDB.all
(output of makeDeps in /deps/incls)
in the .dsp project:
added all source directories as include directories
added PRODUCT,DELTA_COMPILER,MICROSOFT in defines
visual C++ 6 will complain about new operators for two classes:
ResourceObj (allocation.hpp)
PIC (compiledPIC.hpp)
to avoid warnings, i added dummy operators to make the compiler happy:
void operator delete(void* p, int) {}
to the two classes.
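
For readers unfamiliar with this kind of complaint, here is a minimal sketch of the pattern. It assumes the classes declare a placement operator new that takes an extra int; the class name ArenaObject, the arena buffer and the exact signatures are hypothetical and not taken from allocation.hpp or compiledPIC.hpp. When a class has a placement operator new but no operator delete with the same extra parameters, the compiler has nothing to call if a constructor throws, and a dummy matching delete gives it that function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical arena; the real Strongtalk allocators are not shown in the thread.
alignas(std::max_align_t) unsigned char g_arena[256];
std::size_t g_used = 0;

// Hypothetical stand-in for a class such as ResourceObj or PIC.
class ArenaObject {
 public:
  // Placement form of operator new with an extra int argument.
  void* operator new(std::size_t size, int /*unused*/) {
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t rounded = (size + align - 1) / align * align;
    assert(g_used + rounded <= sizeof g_arena);
    void* p = g_arena + g_used;
    g_used += rounded;
    return p;
  }

  // Matching placement delete. It is only called if a constructor throws
  // after the placement new above has returned. A no-op is acceptable here
  // because arena memory is reclaimed in bulk, not per object.
  void operator delete(void* /*p*/, int /*unused*/) {}

  // Ordinary delete, also a no-op for arena-backed objects.
  void operator delete(void* /*p*/) {}

  int value = 42;
};
```

With this in place, `ArenaObject* obj = new (0) ArenaObject;` has a matching deallocation function for the exception path.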
excluded process_asm.cpp from the build
these changes were enough to compile everything
for linking, all i had to add is the st_asm.lib files
and it works.
strongtalk.exe 948k
i'm impressed by how easy it was to build on visual C++ 6 ... i usually encounter
more resistance. thanks ^^
Original comment by prunedt...@gmail.com
on 24 Mar 2007 at 9:30
Thanks prunedtree, that helps isolate the problem. It is a good thing that it
appears to be due to VC++ rather than some unknown source difference between the
original binary from Sun and the open source.
- It would be very helpful if you could also report your machine CPU specs and
also numbers from a run of the original binary from Sun, which is what I was
comparing with. The original is *not* 2.0, it is the binary from any version
pre-2.0.
- As for the apparent unboxed floating point regression, I took another look at
that. It may be that the issue is that C++ floating point got faster, rather
than Strongtalk got slower. At the moment, I am getting ~726ms on both
Strongtalk versions for unboxed floats, whereas "Call C" takes 801ms under VS 6,
but only 303ms under VS 2005. So originally Strongtalk was actually a bit faster
at (this) floating point than C++, which we figured was due to improperly
aligned doubles in C++; it looks like they have fixed this. (One other
possibility was that they expected the stack to be double-word aligned on entry
to the C++ routine, I am not sure whether Strongtalk ensures double word
alignment for callouts; in which case it might appear intermittent based on the
current stack alignment at the time of the callout).
- So, to summarize, at the moment it appears that there is a real regression in
image loading caused by some VC++ difference, but at the moment it appears that
the floating point regression may not be real.
Original comment by David.Gr...@gmail.com
on 24 Mar 2007 at 10:08
My CPU is an AMD athlon XP 2800+ (2079 mhz)
- With r36 compiled with vc6
image loading : 36 ms
mandelbrot:
unboxed floats, C : 453 ms
unboxed floats, interpreted: 10966 ms
unboxed floats, compiler on: 1109 ms
boxed floats, interpreted: 19694 ms
boxed floats, compiler on: 6071 ms
- With binary from Strongtalk-1.1.2
image loading: 36 ms
mandelbrot:
unboxed floats, C : 752 ms
unboxed floats, interpreted: 10754 ms
unboxed floats, compiler on: 1096 ms
boxed floats, interpreted: 18628 ms
boxed floats, compiler on: 5753 ms
Mainly a difference in the C performance, nothing surprising regarding the
widely different performance of x87 code depending on compilers and compiler
settings.
Regarding alignment, IIRC the x86 C ABI enforces (32 bit) word alignment only.
Original comment by prunedt...@gmail.com
on 31 Mar 2007 at 1:15
Btw, the differences in bootstrapping speed are closely linked to the LIBC in
use. Multithreaded CLIB is roughly 4x slower for instance, from 35 ms to 150 ms...
I think this issue can be considered solved.
Original comment by prunedt...@gmail.com
on 30 Apr 2007 at 7:31
prunedtree:
Why do you think this is considered solved? I had thought about this too when I
noticed this bug, and tried all the other library options. Although there were
speed differences, I didn't find any that got anywhere close to the speed of the
original executable, and I don't see any options in VC++ 2005 for
non-multithreaded libraries.
How do you get it to use non-multithreaded libraries?
Original comment by David.Gr...@gmail.com
on 1 May 2007 at 12:31
The single-threaded versions of the libraries have been discontinued, because
the performance of the multi-threaded versions is "close" to that of the
single-threaded versions. One thing that helps speed up the bootstrap is to use
the non-locking functions to read the stream.
When I replaced references to getc with _getc_nolock, this improved bootstrap
from ~200ms to 75ms on my machine (Intel Quad Core Q6700 @ 2.66GHz). For
reference, the 1.0 release bootstraps in 39ms.
Using the non-debug versions of the CRT reduced bootstrap further to around
65ms. I think that is probably as close as we are going to get right now.
Unfortunately, _getc_nolock is a windows specific function, so we will either
have to put it in os:: or put in conditional compilation in bootstrap.cpp.
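
A rough sketch of the conditional compilation option, for illustration only. The helper name read_byte_nolock is hypothetical and nothing here is taken from bootstrap.cpp; it assumes getc_unlocked is available as the POSIX counterpart and falls back to plain getc elsewhere.

```cpp
#include <cassert>
#include <stdio.h>

// Hypothetical wrapper, not an existing Strongtalk function.
// Reads one byte without taking the per-call stream lock where the
// platform offers such a function. Only safe while no other thread
// touches the same stream.
inline int read_byte_nolock(FILE* stream) {
  assert(stream != nullptr);
#if defined(_MSC_VER)
  return _getc_nolock(stream);    // Microsoft CRT
#elif defined(__unix__) || defined(__APPLE__)
  return getc_unlocked(stream);   // POSIX
#else
  return getc(stream);            // portable fallback, may lock
#endif
}
```

The same body could equally sit behind an os:: function, which keeps the #if out of the bootstrap code.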
Original comment by StephenL...@gmail.com
on 12 Aug 2008 at 9:41
Well, it seems pretty obvious that I/O is the bottleneck, so I think simple
buffering (using fread) will solve the issue, and it's more portable
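
To make the fread idea concrete, a minimal sketch follows. The class name BufferedReader and the 4096-byte buffer size are assumptions for illustration, not part of the Strongtalk source. The point is that the stream lock is taken once per block instead of once per byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical byte reader that refills a private buffer with fread.
class BufferedReader {
 public:
  explicit BufferedReader(std::FILE* stream)
      : stream_(stream), pos_(0), len_(0) {
    assert(stream != nullptr);
  }

  // Returns the next byte as a value in 0..255, or EOF when the
  // stream is exhausted (or a read error occurs).
  int next() {
    if (pos_ == len_) {
      len_ = std::fread(buf_, 1, sizeof buf_, stream_);
      pos_ = 0;
      if (len_ == 0) return EOF;
    }
    return buf_[pos_++];
  }

 private:
  std::FILE* stream_;
  unsigned char buf_[4096];  // assumed size, tune as needed
  std::size_t pos_;
  std::size_t len_;
};
```

Callers would replace each getc(stream) with reader.next() on a BufferedReader constructed once for that stream.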
Original comment by prunedt...@gmail.com
on 13 Aug 2008 at 5:54
It's not so much the IO itself, but the locking in the multi-threaded
libraries. On further reading, another, better alternative would be to compile
with the locks turned off by defining _CRT_DISABLE_PERFCRIT_LOCKS. With this
defined, and having reverted _getc_nolock to the portable getc, the system
bootstraps in 23ms, which is actually faster on my system than the original 1.0
release. Clearly there were other locks in the CRT that were inhibiting
bootstrap performance.
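
As a sketch of what that looks like in source form: the macro has to be visible before the first CRT header is included, so it either goes at the very top of the translation unit or on the compiler command line as /D_CRT_DISABLE_PERFCRIT_LOCKS. As far as I understand the Microsoft documentation, it only takes effect with the statically linked CRT (/MT or /MTd); on other compilers the define is simply ignored. The function count_bytes is a hypothetical example, not Strongtalk code.

```cpp
// Must come before any CRT header. Ignored by non-Microsoft toolchains.
#define _CRT_DISABLE_PERFCRIT_LOCKS

#include <cassert>
#include <cstdio>

// Hypothetical example: counts the bytes left in a stream using the
// portable getc. With the macro above and a static Microsoft CRT,
// these calls are expected to skip the per-call stream lock.
long count_bytes(std::FILE* stream) {
  assert(stream != nullptr);
  long count = 0;
  while (std::getc(stream) != EOF) {
    ++count;
  }
  return count;
}
```

The source code stays portable this way, since only the build settings differ per platform.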
Subject to no other problems emerging from the lack of the use of locks in the
CRT, which should be minimal, since the system is effectively pretty much single
threaded at the moment (with a few notable exceptions), I think this pretty much
resolves this issue.
As far as Mandelbrot goes, my 10-run average figures are as follows. All are
with the compiler turned on and for 500 iterations.
For the 1.0 release
Optimised floats 220.6ms
Boxed floats 898.7ms
C 328.9ms
For the current release with the above locking fix
Optimised floats 231.0ms
Boxed floats 1093.9ms
C 95.9ms
This shows an 18% degradation in the boxed float performance, a 5% degradation
in optimised float performance and a massive 243% improvement in C performance.
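
For anyone checking the arithmetic: these percentages match a comparison of speed (work per unit time) rather than of elapsed time. A small sketch, with speed_change_percent as a hypothetical helper name:

```cpp
#include <cassert>

// Percentage change in speed when the running time goes from old_ms to
// new_ms. Speed is proportional to 1 / time, so the ratio of speeds is
// old_ms / new_ms. Negative results mean the new build is slower.
double speed_change_percent(double old_ms, double new_ms) {
  assert(old_ms > 0.0 && new_ms > 0.0);
  return (old_ms / new_ms - 1.0) * 100.0;
}
```

Using the figures above, speed_change_percent(898.7, 1093.9) is about -17.8 (boxed floats), speed_change_percent(220.6, 231.0) is about -4.5 (optimised floats) and speed_change_percent(328.9, 95.9) is about +243 (C). Measured as extra running time instead, the boxed float case takes about 21.7% longer.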
I also performed the same test with the lock fix from above turned off. This
didn't make much difference. For reference the figures are
Optimised floats 226.6ms
Boxed floats 1088.9ms
C 93.0ms
On this evidence the C performance has improved substantially, while Strongtalk
has been pretty much static, or slightly regressed.
Original comment by StephenL...@gmail.com
on 13 Aug 2008 at 10:45
for mandelbrot, it's related to how much your compiler optimizes x87 code,
that's all. For slower boxed floats I'd bet that it's the way the GC is compiled
that creates the difference. (indeed, the strongtalk compiler doesn't care about
your C++ compiler at all. well, as long as it's bugfree that is...)
Original comment by prunedt...@gmail.com
on 27 Aug 2008 at 8:53
Marking this as fixed. There is no degradation in the Mandelbrot performance -
just improvement in the performance of the equivalent C code when compiled on a
modern compiler.
The image bootstrap issue has been fixed by disabling locking in the C runtime
when compiling the VM.
Original comment by StephenL...@gmail.com
on 19 Dec 2009 at 2:19
Original issue reported on code.google.com by
David.Gr...@gmail.com
on 25 Nov 2006 at 6:33