Closed GoogleCodeExporter closed 9 years ago
Is this the issue Mark was having yesterday afternoon with multiple instances?
Do all but one instance terminate?
If so, this is a bug I introduced while checking consistency of feature
numbers, which I subsequently fixed.
Original comment by i...@cathilya.org
on 18 Mar 2011 at 4:22
No, this is not a multiple instance issue.
I created a tarball with all the files needed to reproduce this problem and
put it in /iicbu/Labstuff/issue_24 (212 MB, so it's big!)
I caught this bug while fixing the wndchrm experiment script for John's
pairwise stuff. It has to do with the conversion of .sig files named using the
naked pre-1.30 convention to the current naming convention.
To reproduce:
1. Untar the tarball and run this command in the extracted directory:
wndchrm train -m -S500 -t4 `pwd` whatever.fit
This bug consistently occurs on lgchrm17, whether or not you use the new -O
switch, which skips the check to see if the sigs match. However, on my Mac
(OS X 10.6), the bug only occurred when using -O, and not when wndchrm was
allowed to reprocess the sigs.
2. On lgchrm17, the signatures will match just fine, and wndchrm will start
renaming the sig files accordingly. On my Intel Mac (OS X 10.6) laptop, the
signatures will not match, and wndchrm will start calculating new sigs (this in
itself is not unexpected, but it is still a bug that should be investigated).
Note that there are a ton of files here, and it takes quite a while for wndchrm
to perform the signature matching check (hence the -O switch).
3. Wndchrm will be spitting out lots of output about converting old signature
files, but then it'll just terminate, with the output being truncated. I
copy/pasted the last three lines of output below:
...
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_0.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_0.sig' with 1025 features.
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_1.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_1.sig' with 1025 features.
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_2.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_2.s
Note that the third message is missing the last few characters of the file
name, as well as the "with 1025 features" part of the string. This happens
consistently at the same place on every run.
The .fit file associated with this run will be left there in the directory with
size 0. I ran it through the debugger, and gdb said that the process exited
normally. Weird! Indicates a memory overflow issue, no?
4. So, the workaround here is simply to press the up arrow in the shell and
re-run the same command, wndchrm train -m -S500 -t4 `pwd` whatever.fit. This
time it'll complete with no problem, leaving a lovely new .fit file to use.
This isn't a dealbreaker for John, since he's calculating sigs with wndchrm
1.30 and the sigs are using the new naming convention, but this is a regression
nonetheless.
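For reference, the renaming visible in the output above (J0022G01-C02_3_0.sig
becoming J0022G01-C02-S500-t4_3_0.sig) can be sketched roughly as below. This
is only an illustration inferred from those log lines, not wndchrm's actual
conversion code: the function name add_option_tag and the idea of passing the
option tag in as a string are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from wndchrm): insert an option tag such as
// "-S500-t4" in front of the trailing "_<a>_<b>" suffix of an old-style
// .sig file name. Names that do not fit that pattern are returned unchanged.
std::string add_option_tag(const std::string& old_name, const std::string& tag) {
    assert(!tag.empty());  // the tag is assumed to be supplied by the caller
    const std::string ext = ".sig";
    if (old_name.size() < ext.size() ||
        old_name.compare(old_name.size() - ext.size(), ext.size(), ext) != 0) {
        return old_name;  // not a .sig file
    }
    const std::string stem = old_name.substr(0, old_name.size() - ext.size());
    // Locate the second-to-last underscore, where the "_<a>_<b>" suffix starts.
    const std::size_t last = stem.rfind('_');
    if (last == std::string::npos || last == 0) return old_name;
    const std::size_t prev = stem.rfind('_', last - 1);
    if (prev == std::string::npos) return old_name;
    return stem.substr(0, prev) + tag + stem.substr(prev) + ext;
}
```

With the name from the log, add_option_tag("J0022G01-C02_3_0.sig", "-S500-t4")
gives "J0022G01-C02-S500-t4_3_0.sig".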
Original comment by christop...@gmail.com
on 18 Mar 2011 at 10:35
The issue with the truncated messages is that the error string runs out of
room, and stops being added to. This is no longer the case with what's in SVN
- I took out all the error functions and put them in a separate file that uses
a strstream, instead of a fixed-length statically allocated string. The stuff
that added things to the error string before was surrounded by checks, and was
tested to not have memory overruns, but something may still have escaped.
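To illustrate the difference, here is a minimal sketch of the two approaches.
It is not the wndchrm code: the names fixed_buf, append_fixed, and
append_stream and the 32-byte size are made up for the example, and it uses
std::ostringstream as a stand-in for the strstream mentioned above. The
bounds-checked fixed buffer never overruns, but it silently stops accepting
text once it is full, which would produce output cut off mid-message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical fixed-length, statically allocated message buffer.
// The size is arbitrary and kept small so truncation is easy to see.
const std::size_t kBufSize = 32;
static char fixed_buf[kBufSize] = "";

// Bounds-checked append: never writes past the buffer, but drops
// whatever does not fit, so the stored text can end mid-message.
void append_fixed(const char* msg) {
    const std::size_t used = std::strlen(fixed_buf);
    assert(used < kBufSize);                       // buffer is always terminated
    const std::size_t room = kBufSize - 1 - used;  // keep one byte for '\0'
    std::strncat(fixed_buf, msg, room);
}

// Stream-backed accumulator: grows as needed, so nothing is cut off.
static std::ostringstream stream_buf;

void append_stream(const std::string& msg) {
    stream_buf << msg;
}
```

Appending two 40-character messages leaves fixed_buf holding only the first 31
characters, while stream_buf.str() holds all 80.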
The termination may be a red herring, i.e. it looks like termination, but
really it's just a truncated error string. The warnings pasted above are not
sent to stderr until just before normal exit (i.e. after the sigs are all dealt
with), so it may well have terminated normally - have all the sig files been
renamed or not?
Since the fit file was zero-length, the code monitoring the size of the error
string may not have been perfect. Or alternatively, there may still be a
problem unrelated to the error string. At least in SVN, the error string
should no longer be an issue.
Original comment by i...@cathilya.org
on 20 Mar 2011 at 7:57
Original comment by i...@cathilya.org
on 15 Apr 2011 at 7:01
Original issue reported on code.google.com by
christop...@nih.gov
on 18 Mar 2011 at 3:30