Closed by EricR86 8 years ago
Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).
Original comment by Rachel Chan (Bitbucket: rcwchan).
The arguments for loadAccRange are a comma-separated list and are not sorted for minibatch jobs. It looks like segway stops reading the list partway through for some reason, even though the correct arguments are written to run.sh and details.sh. The issue is not argument length, as much longer arguments have been passed in other runs. This looks like the cause of a Range::Parse error I've been trying to solve, and it may be causing this error as well.
Original comment by Rachel Chan (Bitbucket: rcwchan).
Just confirmed that they are the same issue, as I suspected. The parameter passed to SGE is cut off partway through the list of windows. The number of window indices that survive is not constant, but the number of characters (not counting spaces) appears to be constant at 1024. Cutting off in the middle of the window list, say "12,13,14", can cause either the Range::Parse error (e.g., "12,13,") or this missing acc error (e.g., "12,13,1", where 1 is not a valid window).
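To make the two failure modes concrete, here is a minimal Python sketch of what a fixed-length cut does to a comma-separated window list. It is not segway's code: the function names `truncate_window_list` and `classify_truncation` are hypothetical, and the default limit of 1024 characters is simply the value observed above.

```python
def truncate_window_list(windows, limit=1024):
    """Join window indices with commas, then keep only the first `limit` characters.

    Hypothetical helper: mimics the observed cut-off, not segway's actual code path.
    """
    return ",".join(str(w) for w in windows)[:limit]


def classify_truncation(windows, limit=1024):
    """Describe how the truncated argument differs from the intended one."""
    full = ",".join(str(w) for w in windows)
    cut = full[:limit]
    if cut == full:
        return "intact"
    if cut.endswith(","):
        # e.g. "12,13," -> the range parser sees an empty final element
        return "trailing comma"
    if full[limit] == ",":
        # cut landed exactly on a number boundary: windows are silently dropped
        return "dropped windows"
    # e.g. "12,13,1" -> the last index is a prefix of the real one
    return "partial number"
```

With the example list from the comment above, `classify_truncation([12, 13, 14], limit=6)` gives `"trailing comma"` (the Range::Parse case) and `limit=7` gives `"partial number"` (the missing accumulator case). The third outcome, a cut that lands exactly on a boundary, would drop windows without producing a malformed argument.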
Original comment by Rachel Chan (Bitbucket: rcwchan).
This is caused by issue 70 (#72) and resolved in pull request #55.
Original report (BitBucket issue) by Rachel Chan (Bitbucket: rcwchan).
When bundling in minibatch mode, segway tries to reference an accumulator file for a job that was never queued. I have reproduced this twice now. In both instances, segway looked for an accumulator file that did not exist and then errored out with the following:
acc.0.755.bin does not exist in the accumulators folder. According to jobs.tab, the job for window 755 was never run, and according to the train log, it was never queued (which means minibatch did not choose it).
The current theory is that minibatch does not choose this window, but for some reason it gets chosen when the bundling job is run; segway then cannot find its accumulator file and errors out.