ryanbressler / golem

Research Oriented Distributed Computing.

Parsing errors on master log #8

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
See the attached file. These errors are continuously appearing in the logs.

Original issue reported on code.google.com by hrovira.isb on 17 Jun 2011 at 11:15

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by hrovira.isb on 17 Jun 2011 at 11:15

GoogleCodeExporter commented 9 years ago
It looks like this is coming from inside the json.Decoder. Any suggestions on a 
simple job that will reproduce it?

Original comment by ryanbres...@gmail.com on 18 Jun 2011 at 12:23

GoogleCodeExporter commented 9 years ago
I moved creation of the decoder inside the for loop in GetMsgs in Connection.go. This should keep the buffer small and fix this issue.

Original comment by ryanbres...@gmail.com on 18 Jun 2011 at 4:31
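A minimal Go sketch of the change described above, assuming each websocket message arrives as one complete frame of JSON. The `Msg` type, the `getMsgs` function, and the `frames` slice are hypothetical stand-ins, not golem's actual Connection.go code. Each loop iteration builds a fresh `json.Decoder` over just that frame, so no decoder buffer is carried from one message to the next.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Msg is a hypothetical message type; golem's real message struct may differ.
type Msg struct {
	Type int
	Body string
}

// getMsgs decodes each framed message with its own json.Decoder, so no
// decoder state or read buffer outlives a single message.
func getMsgs(frames [][]byte) ([]Msg, error) {
	var out []Msg
	for _, frame := range frames {
		// Fresh decoder per message, scoped to this frame only.
		dec := json.NewDecoder(bytes.NewReader(frame))
		var m Msg
		if err := dec.Decode(&m); err != nil {
			return out, fmt.Errorf("decoding message %d: %w", len(out), err)
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	// Hypothetical frames standing in for messages read from the websocket.
	frames := [][]byte{
		[]byte(`{"Type":1,"Body":"hello"}`),
		[]byte(`{"Type":2,"Body":"world"}`),
	}
	msgs, err := getMsgs(frames)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range msgs {
		fmt.Println(m.Type, m.Body)
	}
}
```

In this sketch the decoder only ever sees one frame, so its internal buffer is bounded by the size of the largest single message rather than by how long the connection has been open.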

GoogleCodeExporter commented 9 years ago
I think we can reproduce it by running a bash script that fails to initialize a variable before it's referenced. I suspect this may have caused the golem to go into a weird state.

Original comment by hrovira.isb on 19 Jun 2011 at 5:53

GoogleCodeExporter commented 9 years ago
Interesting... let's give it a try today. The buffer size error was coming from the bufio calls used by the json decoder I had wrapped the websocket in, so I suspect this error only showed up once the cluster had run enough jobs without a restart to fill up the buffer on that decoder.

I moved the decoder initialization so that it is reinitialized for each message (instead of once per websocket connection), which should fix the issue if that is the case. We should swamp the cluster with motif and rf jobs again overnight.

Original comment by ryanbres...@gmail.com on 20 Jun 2011 at 4:26