kzwkt / wnd-charm

Automatically exported from code.google.com/p/wnd-charm

wndchrm terminates inexplicably #24

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Wndchrm terminates inexplicably when operating on a file hierarchy of TIFFs and 
sig files that use the old-style naming convention. 

Steps to reproduce, output with error, and files related to issue to follow in 
later post.

Original issue reported on code.google.com by christop...@nih.gov on 18 Mar 2011 at 3:30

GoogleCodeExporter commented 9 years ago
Is this the issue Mark was having yesterday afternoon with multiple instances?
Do all but one instance terminate?
If so, this is a bug I introduced while checking consistency of feature 
numbers, which I subsequently fixed.

Original comment by i...@cathilya.org on 18 Mar 2011 at 4:22

GoogleCodeExporter commented 9 years ago
No, this is not a multiple instance issue.

I created a tarball with all the files needed to reproduce this problem, and 
put it in /iicbu/Labstuff/issue_24 (212 MB, so it's big!)

I caught this bug while fixing the wndchrm experiment script for John's 
pairwise stuff. It has to do with the conversion of .sig files named using the 
naked pre-1.30 convention to the current naming convention.
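
For readers unfamiliar with the two conventions, here is a minimal sketch of the rename, inferred only from the log lines quoted in this thread (J0022G01-C02_3_0.sig becoming J0022G01-C02-S500-t4_3_0.sig). The function name `to_new_style_sig_name` and the rule "insert the sampling options before the two trailing tile indices" are assumptions for illustration, not wndchrm's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the wndchrm sources): builds a new-style
// .sig name by inserting a sampling-options tag such as "-S500-t4" before
// the trailing "_<x>_<y>" tile indices of an old-style name.
std::string to_new_style_sig_name(const std::string& old_name,
                                  const std::string& sampling_tag) {
    const std::string ext = ".sig";
    if (old_name.size() < ext.size() ||
        old_name.compare(old_name.size() - ext.size(), ext.size(), ext) != 0) {
        return old_name;  // not a .sig file: leave it untouched
    }
    const std::string stem = old_name.substr(0, old_name.size() - ext.size());

    // Find the second-to-last underscore, i.e. the start of "_<x>_<y>".
    const std::size_t last = stem.rfind('_');
    const std::size_t prev = (last == std::string::npos || last == 0)
                                 ? std::string::npos
                                 : stem.rfind('_', last - 1);

    // Without two tile indices, fall back to appending the tag to the stem.
    const std::size_t insert_at = (prev == std::string::npos) ? stem.size() : prev;
    assert(insert_at <= stem.size());

    return stem.substr(0, insert_at) + sampling_tag + stem.substr(insert_at) + ext;
}
```

Called with "J0022G01-C02_3_0.sig" and "-S500-t4", this yields the new-style name seen in the log. The real conversion may also need to handle other option tags and directory parts, which this sketch ignores.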

To reproduce:

1. Untar the tarball and run this command in the extracted directory:
wndchrm train -m -S500 -t4 `pwd` whatever.fit

This bug occurs consistently on lgchrm17, whether or not you use the new -O 
switch, which skips the check to see if the sigs match. On my Mac (OS X 10.6), 
however, the bug only occurred when using -O, and not when wndchrm was allowed 
to reprocess the sigs.

2. On lgchrm17, the signatures will match just fine, and wndchrm will start 
renaming the sig files accordingly. On my Intel Mac OS X 10.6 laptop, the 
signatures will not match, and wndchrm will start calculating new sigs (this in 
itself is not unexpected, but it's still a bug that should be investigated). 
Note that there are a ton of files here, and it takes quite a while for wndchrm 
to perform the signature matching check (thus the -O switch).

3. Wndchrm will be spitting out lots of output about converting old signature 
files, but then it'll just terminate, with the output being truncated. I 
copy/pasted the last three lines of output below:
...
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_0.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_0.sig' with 1025 features.
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_1.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_1.sig' with 1025 features.
Old signature file '/home/colettace/projects/delaney_segfault/coletta_test/001vs002_Cdx2-D_VS_Dlx3-D/Cdx2+D/J0022G01-C02_3_2.sig' converted to '/home/delaneyjd/ESC_Analysis/AutomatedAnalysis/ScottyRepeat-1/Cdx2Dox+/J0022G01-C02-S500-t4_3_2.s

Note that the third line is missing the last few characters of the file name, 
as well as the "with 1025 features" part of the string. This happens 
consistently at the same place on each run.

The .fit file associated with this run will be left there in the directory with 
size 0. I ran it through the debugger, and gdb said that the process exited 
normally. Weird! Indicates a memory overflow issue, no?

4. So, the workaround here is simply to press the up arrow in the shell and 
re-run the same command, wndchrm train -m -S500 -t4 `pwd` whatever.fit, and 
this time it'll complete with no problem, leaving a lovely new .fit file to use.

This isn't a dealbreaker for John, since he's calculating sigs with wndchrm 
1.30 and the sigs are using the new naming convention, but this is a regression 
nonetheless.

Original comment by christop...@gmail.com on 18 Mar 2011 at 10:35

GoogleCodeExporter commented 9 years ago
The issue with the truncated messages is that the error string runs out of 
room, and stops being added to.  This is no longer the case with what's in SVN 
- I took out all the error functions and put them in a separate file that uses 
a strstream, instead of a fixed-length statically allocated string.  The stuff 
that added things to the error string before was surrounded by checks, and was 
tested to not have memory overruns, but something may still have escaped.
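
To illustrate the difference described above, here is a minimal sketch, not the actual wndchrm code. `FixedMessageLog` and `StreamMessageLog` are hypothetical names, and the 64-byte capacity is arbitrary. The sketch uses `std::ostringstream` in place of the `strstream` mentioned above, since `std::strstream` is deprecated in standard C++; the growth behaviour being shown is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical fixed-capacity log: once the buffer is full, further text
// is silently cut off, which is how a message can end mid-file-name.
class FixedMessageLog {
public:
    void append(const std::string& msg) {
        const std::size_t used = std::strlen(buf_);
        assert(used < sizeof(buf_));                      // buffer stays NUL-terminated
        const std::size_t room = sizeof(buf_) - 1 - used; // keep space for '\0'
        const std::size_t n = msg.size() < room ? msg.size() : room;
        std::memcpy(buf_ + used, msg.data(), n);          // bounded copy, no overrun
        buf_[used + n] = '\0';
    }
    std::string text() const { return buf_; }

private:
    char buf_[64] = {0};
};

// Hypothetical stream-backed log: the underlying buffer grows as needed,
// so nothing is truncated.
class StreamMessageLog {
public:
    void append(const std::string& msg) { out_ << msg; }
    std::string text() const { return out_.str(); }

private:
    std::ostringstream out_;
};
```

Appending two 50-character messages to each leaves 63 characters in the fixed log and all 100 in the stream-backed one. The bounded copy avoids a memory overrun but still loses the tail of the output, which matches the truncated line in the report.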

The termination may be a red herring - i.e. it looks like termination, but 
really it's just a truncated error string.  The warnings pasted above are not 
sent to stderr until just before normal exit (i.e. after the sigs are all dealt 
with), so it may well have terminated normally - have all the sig files been 
renamed or not?
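
As a rough illustration of that deferred reporting, here is a sketch under assumptions: `DeferredWarnings` is a hypothetical name, not a class from the wndchrm sources. Warnings are collected while the work runs and written out in one go at the end, so the last thing a user sees on stderr is the warning text, even if the program itself finished normally.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical collector: warnings are queued during processing and only
// written to the given stream when flush() is called (e.g. just before exit).
class DeferredWarnings {
public:
    void add(const std::string& warning) { pending_.push_back(warning); }

    // Writes all queued warnings, one per line, and returns how many were written.
    std::size_t flush(std::ostream& out) {
        const std::size_t count = pending_.size();
        for (const std::string& warning : pending_) {
            out << warning << '\n';
        }
        pending_.clear();
        assert(pending_.empty());
        return count;
    }

private:
    std::vector<std::string> pending_;
};
```

In a real program the flush target would be `std::cerr`. With this design, output that stops partway through the warnings says nothing about where processing stopped, so checking whether the sig files were actually renamed is the more reliable test.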

Since the fit file was zero-length, the code monitoring the size of the error 
string may not have been perfect.  Or alternatively, there may still be a 
problem unrelated to the error string.  At least in SVN, the error string 
should no longer be an issue.

Original comment by i...@cathilya.org on 20 Mar 2011 at 7:57

GoogleCodeExporter commented 9 years ago

Original comment by i...@cathilya.org on 15 Apr 2011 at 7:01