Closed GoogleCodeExporter closed 8 years ago
A Java program cannot crash a properly working JVM, so as I see it the bug
must be in the JVM. Another possible location is one of the shared libraries
Eclipse uses that contain native code. I'm pointing fingers at the JVM (or
Eclipse).
If it's a JVM bug you should be able to reproduce it without Eclipse.
You could also try reinstalling your JDK / Eclipse to see if the installation
is somehow broken.
JVM bugs are reported to Sun. You could also try another JVM.
Original comment by robin.ro...@gmail.com
on 11 Jun 2009 at 4:07
I've checked the system logs and found no hardware fault records. The core
dump occurs with JDK 1.6.0_12-b04 & 1.6.0_13-b03 and Eclipse 3.4.2
(M20090211-1700) & 3.5 RC3 (I20090528-2000) & 3.5 RC4 (I20090605-1444). I
submitted the JVM dump log on java.sun.com. I submitted a defect on Eclipse
RC3; they resolved it saying 'not eclipse' and recommended sending the problem
to JGit, since the frame at the fault was in
org.spearce.jgit.lib.OffsetCache.getOrLoad.
Running the same update of EGit on WinVista with Eclipse 3.4.2 and JDK
1.6.0_12-b04, I see two native method exceptions in the log. On WinVista,
Eclipse runs without any message in the UI, though.
I've uninstalled EGit, and now the same workspace opens and I can work on
Solaris. I'll happily try to find the cause of this problem. So far the
symptom is triggered only by EGit. At the moment, I can't think of any further
useful areas for investigation, so please let me know if there's something
you'd like to see.
Are the native mode faults I see logged on Windows (shown in the attached log)
of no concern?
Original comment by jwbito
on 11 Jun 2009 at 2:39
org.spearce.jgit_0.4.0.200903200852 is an old version. If it's broken it won't
be fixed. You have to try a newer version. The latest integration build is
from 20090514.
Nevertheless, any brokenness should not result in a JVM crash unless the JVM
itself is broken, either by native code in Eclipse (unlikely but possible) or
by something wrong (or faulty) in the JVM. JGit plays with the garbage
collector here, so it may well find a new bug in the JVM.
Try to build jgit manually using make_jgit.sh and then do a
./jgit clone your_url destdir and see if that breaks too.
Original comment by robin.ro...@gmail.com
on 11 Jun 2009 at 9:14
Thanks for pointing out the discrepancy with the jar file versions. The
install on WinVista (where EGit is working for me) had a couple of different
versions of EGit jars. Now that I've removed the old jars, there's no native
method exception on WinVista.
Now I have the same jars on both Solaris & WinVista:
org.spearce.egit.core_0.4.0.200906011726.jar
org.spearce.egit.ui_0.4.0.200906011726.jar
org.spearce.egit_0.4.0.200906011726.jar
org.spearce.jgit_0.4.0.200906011726.jar
The jgit CLI is able to clone the repository (please see below for a patch to
make_jgit.sh). When I use EGit's 'import repository' on Solaris, it appears to
clone the git repository and core dumps before populating the working
directory. (The progress bar in Eclipse says 'checking out files'.) I also get
a core dump if I try to open the workspace containing a project that was
cloned by EGit before this problem cropped up (when I updated on May 14). I
suppose I could go back to a version before that date to confirm that the
problem doesn't occur. Do you think that would be useful?
=== Suggested patch to start CLASSPATH with '.'===
diff --git a/make_jgit.sh b/make_jgit.sh
index 2969e6e..ba5f6c7 100755
--- a/make_jgit.sh
+++ b/make_jgit.sh
@@ -58,15 +58,10 @@ then
fi
VN=`echo "$VN" | sed -e s/-/./g`
-CLASSPATH=
+CLASSPATH=.
for j in $JARS
do
- if [ -z "$CLASSPATH" ]
- then
- CLASSPATH="$R/$j"
- else
- CLASSPATH="${CLASSPATH}${PSEP}$R/$j"
- fi
+ CLASSPATH="${CLASSPATH}${PSEP}$R/$j"
done
export CLASSPATH
Original comment by jwbito
on 12 Jun 2009 at 8:25
A clarification: the Eclipse Git Plugin 0.4.0 (published on the update site as
Release Build) does not cause this problem.
I'd like to help resolve it, but I could really use some advice as to
practical next steps. Would it make sense to try to identify the change that
causes the problem to occur? So far, I've been able to confirm that
0.4.0.200904240032 and 0.4.0 work without a problem.
Original comment by jwbito
on 15 Jun 2009 at 8:14
git bisect is a very good tool to search for problems.
See man git-bisect for details.
git checkout a bad version
git bisect start
open Eclipse with this EGit checkout
select the UI plugin and then select Run As > Eclipse Application
test it
if it fails => git bisect bad
if it works => git bisect good
exit the test Eclipse (the one you launched)
git checkout a known good version
test it the same way and mark it as good/bad.
From the second time onward, git will automatically check out a new version
for you to test.
There is a chance that this bug will not show up when run this way, but you
can always hope for it.
Normally Eclipse will pick up the changes in the workspace, but to make sure,
please perform a refresh before re-launching a new version of Eclipse to test.
You should have to test fewer than a dozen versions before the version that
triggers the Eclipse/JVM bug pops up.
Original comment by robin.ro...@gmail.com
on 15 Jun 2009 at 9:15
When I import the repository in the Eclipse that's launched from Run As, it
gets a variety of NullPointerExceptions. There was a message complaining that
the EGit projects depend on JRE J2SE-1.5, but that isn't available.
A variety of NullPointerExceptions occur whether the EGit plugins are built at
the head or at v0.4.0 (which is working OK in the Eclipse install) - there is
no core dump.
Would it make sense to see how it behaves if I export the plugins to another
Eclipse install?
Thanks for your suggestions!
Original comment by jwbito
on 16 Jun 2009 at 2:26
I was able to test by building the EGit feature in Eclipse 3.5RC4, exporting
it and installing the feature. The git bisect process yields:
bash-3.00$ git bisect good
2d77d30b5f5eca2b3087f1bab47fa9df2e64cd71 is first bad commit
commit 2d77d30b5f5eca2b3087f1bab47fa9df2e64cd71
Author: Shawn O. Pearce <spearce@spearce.org>
Date: Wed Apr 29 11:54:46 2009 -0700

    Rewrite WindowCache to be easier to follow and maintain

    The integration of WindowCache, ByteWindow, PackFile and WindowCursor
    was a spaghetti of code that was impossible for even the original
    author (me) to follow. Due to the way the responsibility for the
    PackFile's open RandomAccessFile "fd" was distributed between these
    four classes I could no longer prove to myself that the fd wouldn't
    be closed while it was being accessed by another thread.

    This rewrite generalizes most of the cache logic into a new class,
    OffsetCache. The hope is that we can later reuse this code to make
    a rewrite of UnpackedObjectCache, which uses similiar caching rules
    as WindowCache, but applies a different hash function. That rewrite
    is deferred to another change, but is anticipated by this one.

    The new OffsetCache class uses the Java 5 atomic APIs to create a
    much more concurrent hash table than we had before. We can now
    perform no-miss reads without taking any locks. Reads that do
    miss acquire a lock in order to prevent concurrent threads from
    performing duplicate work loading the same window from disk,
    however concurrent reads of different windows is still permitted.

    Due to the more concurrent nature of the OffsetCache, it is now
    possible for the cache to temporarily overshoot its resource limits.
    This is a small temporary overshoot that is roughly bounded by the
    number of concurrent threads operating against the same cache.

    The API of the ByteWindow subclasses is now simplified by removing
    the base class of SoftReference. It was a horrible idea to pass
    the byte[] or MappedByteBuffer down through the call stack when the
    implementation knew what type it should be operating on. We now
    instead use a more traditional OO pattern of allowing the subclass
    to directly specify its referent.

    Responsibility for the RandomAccessFile "fd" within PackFile is now
    strictly within PackFile. Two open reference counts track how the
    callers are using the fd, ensuring that the fd remains open, so long
    as the caller has made the appropriate begin*() invocation prior
    to data access. One counter, beginWindowCache() is exclusively
    for the ByteWindows created by WindowCache. Another counter,
    beginCopyRawData(), is exclusively for PackWriter's need to lock
    the PackFile open while it performs object reuse.

    To keep the code simple a WindowCache.reconfigure() now discards the
    entire current cache, and creates a new one. That invalidates every
    open file, and every open ByteWindow, and forces them to load again.

    Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
    Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>

:040000 040000 9a7427cc5c0d42573f23c660ca08240d5b5a0e71 bfee62c6cd4c9f9b6bb359c7799540f039478732 M org.spearce.jgit.test
:040000 040000 1e23a88b41ce340519ffcc88f319115e592dca0b 388a17d9d4cc18323c8d6c934ab4c04cd5cf8040 M org.spearce.jgit
Original comment by jwbito
on 20 Jun 2009 at 5:53
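For readers following along: the read path described in the commit message
(hits served without taking a lock, misses serialized by a lock so that two
threads do not load the same window twice) might look roughly like the sketch
below. This is an illustration only, not the actual
org.spearce.jgit.lib.OffsetCache. The class name SketchOffsetCache, the
direct-mapped table, the striped lock array and the LongFunction loader are
all assumptions made for the example, and eviction and resource limits are
left out entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.LongFunction;

/** Hypothetical sketch; NOT the real org.spearce.jgit.lib.OffsetCache. */
final class SketchOffsetCache<V> {
    /** Immutable entry, so it can be published safely through the atomic array. */
    private static final class Entry<V> {
        final long position;
        final V value;

        Entry(long position, V value) {
            this.position = position;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Entry<V>> table;
    private final Object[] locks;
    // Assumed stand-in for "load this window from disk".
    private final LongFunction<V> loader;

    SketchOffsetCache(int tableSize, int lockCount, LongFunction<V> loader) {
        this.table = new AtomicReferenceArray<>(tableSize);
        this.locks = new Object[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new Object();
        }
        this.loader = loader;
    }

    private int slot(long position) {
        // Mask the sign bit so the index is never negative.
        return (Long.hashCode(position) & Integer.MAX_VALUE) % table.length();
    }

    V getOrLoad(long position) {
        int slot = slot(position);

        // Fast path: a volatile read of the table, no lock taken on a hit.
        Entry<V> e = table.get(slot);
        if (e != null && e.position == position) {
            return e.value;
        }

        // Slow path: only threads that map to the same lock stripe wait here,
        // so misses on other stripes can still load concurrently.
        synchronized (locks[slot % locks.length]) {
            // Re-check: another thread may have loaded it while we waited.
            e = table.get(slot);
            if (e != null && e.position == position) {
                return e.value;
            }
            V value = loader.apply(position);
            // Simplification: one entry per slot, a collision just replaces it.
            table.set(slot, new Entry<>(position, value));
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        SketchOffsetCache<String> cache = new SketchOffsetCache<String>(64, 8, pos -> {
            loads.incrementAndGet();
            return "window@" + pos;
        });
        System.out.println(cache.getOrLoad(4096L));
        System.out.println(cache.getOrLoad(4096L));
        System.out.println("loads = " + loads.get());
    }
}
```

In the small main method the second getOrLoad call for the same position is
served from the table, so the loader runs only once. Nothing in this sketch
touches native code, which fits the point made elsewhere in this thread that
pure Java of this kind should not be able to crash a correct JVM.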
Cut things down to the minimum necessary. OK, I'm not surprised, but
regardless of what we do here, this changes nothing regarding where the bug
is. Assuming there is a bug of some sort in JGit here, you must still have a
bug in the JVM (or Eclipse) for the JVM to crash here.
You can try to submit this to Sun, as you claim it is repeatable. You need to
package it neatly into something they can just grab and run and see for
themselves within a minute or two, and it should really try to eliminate
Eclipse from the equation to avoid the blame game.
Cutting this down to an effective bug report will be an interesting exercise.
Original comment by robin.ro...@gmail.com
on 22 Jun 2009 at 10:18
The response by Robin is a bit off-putting. If the crash weren't repeatable, I
wouldn't have tracked down the commit using git bisect.
It's not my code that's crashing Eclipse, so I would sincerely hope that the
problem is one that 'we' would be working to isolate together. I've certainly
appreciated the guidance thus far and have done my best to follow it. I'm
definitely interested in contributing to the improvement of EGit. Today, EGit
doesn't work on Solaris 10 with any of 3 different Java 6 runtimes and Eclipse
3.4.2 or 3.5 (RC3 & RC4).
You mention JGit. Would it be useful (and feasible) to run a JGit test inside
Eclipse and outside Eclipse to see if that helps to locate the problem? The
jgit clone test you suggested before did not cause the problem, although
cloning with EGit crashes before it populates the working directory. (I don't
know for sure, but it appears to have fully populated the local (.git) repo.)
Right now, I'm trying to use the latest integration build (200906160801) to
clone http://repo.or.cz/r/egit.git and it seems to be hung on
"Get pack-e45a866..idx: 8% ( 42/514)". Please keep in mind that this is
Solaris 10 on SPARC. On WinVista, I haven't seen a problem since you pointed
out that there was an old EGit jar in my plugins.
Thanks again for your guidance in trying to resolve this problem.
John
Original comment by jwbito
on 22 Jun 2009 at 11:30
[deleted comment]
Note that I cannot fix bugs in the SPARC JVM. Sun can, and there is a link in
the hs_err file to where you submit bug reports directly to Sun. Include all
information about versions and URLs for necessary downloads and see what
response you get.
Running JGit from inside Eclipse / outside probably won't affect the outcome,
but then I have no idea what triggers the bug in the JVM.
Original comment by robin.ro...@gmail.com
on 27 Jun 2009 at 9:54
Thank you for responding, Robin. As I noted in my message to the group
<http://thread.gmane.org/gmane.comp.version-control.git/122265>, I have
reproduced the crash on all modern Sun JVMs and submitted the crash log to Sun
(two on the site referenced in the hs_err file and two on bugs.sun.com).
I tried producing the crash with jgit from the command line, and that was able
to complete. I'll try it with the JVM versions that I have obtained since I
last tested it. The crash also occurs using EGit import on
git://repo.or.cz/egit.git.
If you wish to leave this as a known issue with Egit 0.4.9, I cannot argue
with you. I was hoping that you'd be willing to work with me to find a
workaround and/or a specific test case that would motivate the Java team to
accept it as a bug.
I would speculate that the problem is related to one of the bugs in nio that
cause errors when the code tries to access data on non-aligned boundaries
<http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2172587>. These can't
happen on x86 hardware, since it (generally) allows non-aligned access.
Original comment by jwbito
on 27 Jun 2009 at 5:47
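To make the kind of access in question concrete, the minimal sketch below
writes and reads a 4-byte int at each of the offsets 0 to 3 of a direct
java.nio.ByteBuffer. The ByteBuffer contract allows this at any valid index,
so a correct JVM has to make it work even on hardware such as SPARC that
faults on misaligned loads. Whether this has anything to do with the crash
discussed here is speculation. The class name UnalignedRead and the chosen
offsets are assumptions for the illustration, and none of this is taken from
JGit or from the Sun bug report.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not code from JGit or from the Sun bug report. */
final class UnalignedRead {
    /** Writes value at the given byte offset of a direct buffer and reads it back. */
    static int roundTrip(int offset, int value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        // Absolute put/get: the index is a byte offset and need not be a multiple of 4.
        buf.putInt(offset, value);
        return buf.getInt(offset);
    }

    public static void main(String[] args) {
        int value = 0xCAFEBABE;
        // Offsets 1, 2 and 3 are the ones that are not 4-byte aligned.
        for (int offset = 0; offset < 4; offset++) {
            int read = roundTrip(offset, value);
            System.out.println("offset " + offset + ": " + (read == value ? "ok" : "MISMATCH"));
        }
    }
}
```

A program of roughly this size, run with a plain java command and no Eclipse
involved, is also the shape of test case that would be easy to hand to a JVM
vendor if it did reproduce a fault.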
In general you don't (read: should not) work around bugs. Bugs should be
fixed. Workarounds are a last desperate resort.
The sorry state of much of the software around us comes from working around
bugs instead of fixing them. After a while you get a mess of workarounds for
workarounds.
It gets more complicated when the bug is in software we cannot (?, e.g.
OpenJDK) fix, but the principle should be the same.
If you want to work around it as an intermediate measure you can try reverting
the commit that you identified. That may be non-trivial, but probably not very
hard. That revert will not be part of JGit, however.
We can keep this open until we can attach a reference to a Sun bug report.
Original comment by robin.ro...@gmail.com
on 2 Jul 2009 at 3:22
I have verified that the problem is also caused by the org.eclipse
(0.5.0.200908141101) version of the plugin with JDK 1.6.0_16 and also JDK 1.7
milestone 4 running in Eclipse 3.5.
Original comment by jwbito
on 21 Aug 2009 at 6:06
Any version after the commit identified will probably trigger the JVM bug.
Did you send a bug report to Sun?
Original comment by robin.ro...@gmail.com
on 21 Aug 2009 at 9:01
As I mentioned earlier, I've submitted a number of reports, including for
1.6.0_10, 1.6.0_16 and a couple of versions of 1.7 including Milestone 4
(which is what was available this morning).
It's not surprising that Sun hasn't looked at it; I doubt they consider
Eclipse to be a small test case.
As I've said before, I'd like to help get to the specific code that has the
problem, but I'd need some clear tasks.
Original comment by jwbito
on 21 Aug 2009 at 9:10
Hey - I'm able to use plugin version 0.5.0.200908282229 on SPARC (Solaris 10)
with the EA release of 1.6.0u18!
Original comment by jwbito
on 31 Aug 2009 at 5:18
Good. Please close.
Original comment by robin.ro...@gmail.com
on 31 Aug 2009 at 6:58
I think closing the issue requires a privilege I don't have. I added a comment
in the EGit wiki to let folks know they may encounter a problem on SPARC.
Original comment by jwbito
on 31 Aug 2009 at 9:25
Not our bug.
Original comment by robin.ro...@gmail.com
on 2 Sep 2009 at 7:47
Original issue reported on code.google.com by
jwbito
on 8 Jun 2009 at 6:03