repetier / Repetier-Firmware

Firmware for Arduino based RepRap 3D printer.

Printer goes idle mid print #72

Open Highcooley opened 11 years ago

Highcooley commented 11 years ago

Sorry, me again :-)

I had this issue twice and another RAMBO / Rostock Max user reported the same. After a couple of hours of printing (2-5 h), the printer stops mid print and goes idle. All heaters stay on and Repetier host considers the job to be done (no errors & idle).

The way I see it, there could be two explanations. One: Neither the firmware nor the host reports any error if the USB connection gets lost. Both just reconnect and the host quits the print job. Two: Somehow, the firmware starts racing through the G-Code at max speed without executing it, until the host has sent the last line. Then both the firmware and the host consider the job done.

repetier commented 11 years ago

If you have such errors, enable logging so you can check afterwards what happened. If the firmware did a reset you can see a "start" somewhere in the log. But in this case I wouldn't expect the heaters to be still on. After a reset of the board they are off until the host turns them on again. That could be explained if the host received a "start" that didn't come from a reset. Not sure how this could happen.

The log should also show if it executed until the end. At the moment I have no idea what could cause this and leave both connected and still responsive, as I understand it. If you see something in the log let me know. Also it may take some time until I respond (holiday).

Highcooley commented 11 years ago

Ok, one last question... where and how do I enable logging? In the firmware, or just "echo" in the host debug options?

Enjoy your hols!!

repetier commented 11 years ago

I mean logging in the host: Config -> Repetier settings. This stores a log with all communication in the workdir. Every restart of the host deletes the old log, so save it if needed.

Highcooley commented 11 years ago

Ok, last time when printing from the PC, I activated the log. Unfortunately, this only worked after restart of the host. So, I wasn't able to log the fail. However, I checked the last lines of the communication protocol and there is nothing unusual. The host sends the last GCODE lines and waits for the firmware to request the next lines, which never happens. So, the host stays in print mode and keeps counting the time till print end. On the other side, the firmware stops moving the motors, shows idle, keeps the temperatures set and responds to any command through the LCD controller input.

I've got four aborted prints now. Three of them are from the same KISSLICER GCode. All three prints aborted during layer 90. The last print was from the SD card, so the USB definitely didn't cause this. The PC was off anyway, so there is no possibility, that there was a faulty code sent over USB. Looking at the GCODE, I cannot find any strange commands either. There are only FAN on/off, G1 positioning, extruding, retracting and motor speed setting commands, next to ";" comment lines.

The only explanation which remains is, that it could be some stack overflow problem. The 90th layer starts on line 897'944 in the code. This doesn't ring a bell concerning stack overflow due to max variable ranges. However, this is GCODE including comment lines. I don't know how many lines there would be without empty and comment lines.
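The open question above (how many lines remain once empty and comment lines are dropped) could be answered with a short script along these lines. This is only a sketch: the function name `count_executable_lines` is made up for illustration, and it assumes `;` starts a comment, as in the KISSlicer output described above.

```python
def count_executable_lines(lines):
    """Count G-code lines that still hold a command after stripping ';' comments."""
    count = 0
    for line in lines:
        command = line.split(";", 1)[0].strip()
        if command:
            count += 1
    return count


sample = [
    "; layer 90",
    "",
    "G1 X10.0 Y5.0 E1.25 ; perimeter",
    "M106 S255",
    "   ",
]
print(count_executable_lines(sample))  # 2

# Reference points for the "max variable range" idea, using the raw
# line number quoted in the post:
print(897_944 > 0xFFFF)        # True: beyond an unsigned 16-bit counter
print(897_944 < 0x7FFFFFFF)    # True: fits in a signed 32-bit counter
```

For a real file, pass the open file object instead of `sample`, for example `count_executable_lines(open("print.gcode"))`, where the file name is a placeholder.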

The GCODE can be found in this forum post for inspection: http://forum.seemecnc.com/viewtopic.php?f=54&t=1046&start=60#p5501

luke321 commented 11 years ago

I had the same thing happen yesterday with my Rostock printer with a Megatronics 1.0 board. Maybe the delta settings are the cause? I just switched to Repetier, so I'll need some time to confirm this.

luke321 commented 11 years ago

I successfully printed some parts now without the printer going idle, but sometimes the print head stops for some milliseconds on edges or when changing the perimeter line. Is this buffer related?

repetier commented 11 years ago

I do not think it is an empty buffer. You cannot see that, as the next segment is normally sent too quickly to register. From your description I guess it is the retraction of the extruder that interrupts the move. That is correct behavior!

luke321 commented 11 years ago

It happens when the printer should jump from the first to the second and second to third perimeter, OPS is disabled and minimal retraction travel in slic3r is 2mm, jerk is set to 20mm/s and on Marlin firmware it prints just fine.

repetier commented 11 years ago

@luke321 You are talking about the pauses, right? Between layer switches the slicer inserts retractions, which take some time. Watch your extruder for retractions at that moment and you will see what I mean. Only after the retraction is finished will the head go up. Marlin does the same, because the G-code contains the commands to do this.

luke321 commented 11 years ago

No I don't mean retraction pauses, retraction is working fine for me. It happens when changing the perimeter line sometimes already in the skirt: stop

for example:

G1 X-13.150 Y-11.800 E6.88994
G1 X-12.830 Y-12.150 E6.90891
G1 X-12.382 Y-12.589 E6.93402
G1 X-12.330 Y-12.640 E6.93691

G1 X-12.010 Y-12.290 F18000.000 ; before doing this short travel move the printer waits and oozes a lot

G1 X-11.290 Y-12.950 F720.000 E6.97598
G1 X-10.580 Y-13.540 E7.01290
G1 X-10.070 Y-13.920 E7.03834
G1 X-9.260 Y-14.470 E7.07751
G1 X-8.770 Y-14.770 E7.10049

luke321 commented 11 years ago

I tracked it down: it only happens if I adjust the flow rate with Repetier-Host.

polygonhell commented 11 years ago

I just verified the same issue as the OP, running his GCode as a dry run. On my printer it stops around line 802985, which is layer 84. The firmware doesn't reset, it's still running, but it's unresponsive to comms; it won't even echo commands back. The LCD is still responsive, so I assume there is an issue where gcode_next_command is returning NULL or similar. Since it uses much of the same path, I tried writing the GCode to an SD card. That seems to lose the connection much faster, as early as line 2700. It looks like there is a comms error and it never correctly recovers; several seconds later in this case the host stops trying to resend.

polygonhell commented 11 years ago

The following is repeated many times at the end of the log, looks like the command has the wrong checksum and it never recovers.

< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: N2723 G1 X21.11 Y5.32 E135.3127 *113
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
< 2:32:59 PM: Error:Binary cmd wrong checksum.
< 2:32:59 PM: Resend:2699
2:32:59 PM: Resend: N2699 G1 E133.1223 *68
2:32:59 PM: Resend: N2700 G1 F806.00 *75
2:32:59 PM: Resend: N2701 G1 X24.37 Y10.34 E133.2856 *74
< 2:32:59 PM: ok
2:32:59 PM: Resend: N2702 G1 X23.88 Y10.17 E133.3076 *64
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2703 G1 X26.00 Y7.00 E133.4699 *116
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2704 G1 X25.51 Y6.83 E133.4919 *121
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2705 G1 X23.39 Y10.00 E133.6542 *76
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:G0
< 2:32:59 PM: Echo:G0
2:32:59 PM: Resend: N2706 G1 X22.90 Y9.83 E133.6762 *126
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2707 G1 X25.02 Y6.67 E133.8384 *116
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:G0 R0.00
< 2:32:59 PM: Echo:G0 R0.00
2:32:59 PM: Resend: N2708 G1 X24.53 Y6.50 E133.8604 *119
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2709 G1 X22.41 Y9.67 E134.0228 *125
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:G0
< 2:32:59 PM: Echo:G0
2:32:59 PM: Resend: N2710 G1 X21.92 Y9.50 E134.0447 *115
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2711 G1 X24.05 Y6.33 E134.2071 *112
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo: E0.0000
< 2:32:59 PM: Echo: E0.0000
2:32:59 PM: Resend: N2712 G1 X23.56 Y6.16 E134.2291 *121
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2713 G1 X21.43 Y9.33 E134.3913 *118
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
2:32:59 PM: Resend: N2714 G1 X20.95 Y9.16 E134.4133 *113
< 2:32:59 PM: ok 2698
< 2:32:59 PM: Echo:
< 2:32:59 PM: Echo:
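The trailing number on each sent line in the log above is the line checksum. For the ASCII form shown in the log it appears to follow the common RepRap convention: XOR every byte in front of the `*`, including the space just before it. The sketch below checks that against one of the resent lines. The helper names are made up for illustration, and the binary protocol has its own framing and checksum, which this does not model.

```python
def gcode_checksum(payload: str) -> int:
    """XOR of every byte in front of the '*' (trailing space included)."""
    cs = 0
    for byte in payload.encode("ascii"):
        cs ^= byte
    return cs


def verify(line: str) -> bool:
    """Return True if 'N<n> <command> *<cs>' carries a matching checksum."""
    payload, sep, stated = line.rpartition("*")
    if not sep or not stated.strip().isdigit():
        return False  # no checksum present at all
    return gcode_checksum(payload) == int(stated)


print(gcode_checksum("N2699 G1 E133.1223 "))  # 68, as in the resend above
print(verify("N2699 G1 E133.1223 *68"))       # True
print(verify("N2699 G1 E133.1228 *68"))       # False: one corrupted digit
```

A single XOR byte catches any single corrupted character, but it cannot tell the receiver which protocol the sender intended, which is where the discussion goes next.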

repetier commented 11 years ago

To me it looks like the error throws the firmware out of sync. I guess it detected an ASCII command while getting binary data. As you see, the line number does not increase, indicating a missing checksum, and there are no resend requests, which is only possible in ASCII mode. You also see that the echoed commands don't reflect the commands sent. The big question is how resent data can cause it to get out of sync. I think this requires at least one additional transfer error in the first byte, which determines if it is ASCII or binary (>128 binary). Can you mail me the gcode part in question? I would like to see if this is possible with the binary data and hopefully find a way to detect that problem. I guess I could enforce checksums, making any non-checksummed command invalid and causing a resend.
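To make the single-byte weakness concrete, here is a minimal sketch of a receiver that decides between binary and ASCII from the first byte alone. It assumes the top bit (value 128) is the marker, which is how the comment above reads; the byte `0x9F` and the helper names are hypothetical and not taken from the firmware source.

```python
def looks_binary(first_byte: int) -> bool:
    """Classify a command by the top bit of its first byte."""
    return bool(first_byte & 0x80)


def flip_bit(value: int, bit: int) -> int:
    """Simulate a one-bit transfer error."""
    return value ^ (1 << bit)


binary_start = 0x9F        # hypothetical first byte of a binary command
ascii_start = ord("N")     # ASCII commands in the log start with 'N'

print(looks_binary(binary_start))               # True
print(looks_binary(flip_bit(binary_start, 7)))  # False: now parsed as ASCII
print(looks_binary(flip_bit(binary_start, 3)))  # True: other bits do not matter
print(looks_binary(ascii_start))                # False
```

Only an error in that one bit changes the classification, but when it happens the rest of the binary packet is read as text, and without a checksum requirement nothing forces a resend.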

Highcooley commented 11 years ago

This is the GCODE in question: http://forum.seemecnc.com/download/file.php?id=760 So far, three people tried to run it on different Rostock Max printers with RAMBO boards. They all have the same problem around layer 80-90. However, shortening the GCODE and starting e.g. from layer 79 lets the printer run through these layers without a flaw.

polygonhell commented 11 years ago

That makes some sense if it were interpreting a binary command as ASCII. Using a single bit to determine ASCII/Binary seems a little dangerous on an unreliable protocol. One option could be to have some bit pattern that explicitly swaps from Binary to ASCII and back, but then you have state in the protocol and the chance to get out of sync. It's also a little odd that 3 people get the same issue 800K lines into the same print within about 50K lines of each other, though that could be something odd in the USB Serial stack I guess.

polygonhell commented 11 years ago

I tried forcing ASCII commands and uploaded the file to the SD card, and it succeeds, albeit in a little over 4 and a half hours, so I think it's very likely related to the binary protocol. It's worth noting that error rates were a lot higher than I expected, judging by the red text flying by in the terminal window, and seemed to be fairly clumpy. So perhaps there is something in the data pattern that makes errors more common at certain points in the GCode, which is causing the failure in close proximity for the 3 of us who've tried it.

repetier commented 11 years ago

More errors in the ASCII protocol make sense, as it needs double the size, making errors more likely. From your description I take it that you get transfer errors on a quite regular basis. I had a board where I also got many errors, but only with 115200 baud. With 250000 the errors were gone. Another possible reason is the DTR signal being high. You could test an emergency stop after open, which will toggle DTR to low. Perhaps this reduces the error rate. There should be no connection between the two, but I have a user having this problem since 0.80, and that is where I added the DTR toggle.
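One possible contributor to the 115200 versus 250000 difference, which the thread itself does not establish, is baud-rate quantization: an AVR USART derives its rate from the CPU clock by integer division, so some rates cannot be hit exactly. The sketch below uses the standard divider formula from the AVR datasheets and assumes a 16 MHz clock, as mentioned later in the thread. The function name is made up for illustration.

```python
def avr_baud_error(f_cpu: int, baud: int, double_speed: bool = False) -> float:
    """Percent error between requested and achievable baud rate on an AVR USART."""
    divisor = 8 if double_speed else 16
    ubrr = int(f_cpu / (divisor * baud) + 0.5) - 1   # nearest register setting
    actual = f_cpu / (divisor * (ubrr + 1))
    return (actual / baud - 1.0) * 100.0


for baud in (115200, 250000):
    for u2x in (False, True):
        err = avr_baud_error(16_000_000, baud, u2x)
        print(f"{baud:>6} baud, U2X={int(u2x)}: {err:+.1f}%")
```

At 16 MHz this gives about -3.5% (normal mode) or +2.1% (double-speed mode) at 115200 baud, and exactly 0% at 250000 baud. Whether that mismatch matters on a given board depends on what sits at the other end of the serial line, so treat it as a hypothesis to test rather than an explanation.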

polygonhell commented 11 years ago

I just tried this and yes hitting estop after open appears to reduce the error rate. Bizarre. It does seem though that since the ASCII upload succeeds, the original issue is probably the Binary protocol not being robust enough to handle a significant error rate

repetier commented 11 years ago

Well, the DTR thing is really bizarre, but ok. Knowing that, I will set it back to low on open in the next version. The binary protocol is quite stable but has one weak point: detection of a binary command is based on one bit in the first byte. So if exactly this bit is affected, you get problems. I'm still thinking about how to improve detection. One solution is to require a checksum, except for M117, when previous commands had a checksum. That way a resend would be triggered even if a command is wrongly taken as an ASCII command. The host always inserts checksums, and if someone starts with a simple terminal he will most likely not send checksums.
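The rule proposed above can be sketched as a small gate in front of the command parser. This is an illustration of the idea only, not the firmware's code: the class name `ChecksumGate` is made up, the checksum is assumed to be the ASCII XOR over everything before `*`, and a return value of `False` stands for "ask the host to resend".

```python
class ChecksumGate:
    """Once a checksummed line was accepted, reject later lines without one."""

    def __init__(self):
        self.checksum_seen = False

    def accept(self, line: str) -> bool:
        payload, sep, stated = line.rpartition("*")
        if sep and stated.strip().isdigit():
            cs = 0
            for byte in payload.encode("ascii", errors="replace"):
                cs ^= byte
            if cs != int(stated):
                return False          # wrong checksum: request a resend
            self.checksum_seen = True
            return True
        # No checksum on this line.
        if self.checksum_seen and "M117" not in line.split():
            return False              # forced-checksum rule: request a resend
        return True                   # plain terminal use, or the M117 exemption


gate = ChecksumGate()
print(gate.accept("G28"))          # True: nothing checksummed seen yet
print(gate.accept("N5 M105 *2"))   # True: valid checksum, rule is now active
print(gate.accept("G1 X10"))       # False: unchecksummed data forces a resend
print(gate.accept("M117 Hello"))   # True: exempt
```

The design keeps simple terminals working, since they never send a checksum and so never arm the rule, while a host that always checksums gets a resend whenever corrupted binary data is misread as a bare ASCII command.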

polygonhell commented 11 years ago

That seems like a reasonable solution, the other option would be to add an MCODE to forcibly set the binary protocol on or off. You'd have to have some way of still reading the ASCII version of that MCODE, so a user could reset it.

repetier commented 11 years ago

Ok, as a first solution I have added the forced checksum part to the current version. Simply add

/* If a checksum is sent, all future commands must also contain a checksum. Increases reliability especially for binary protocol. */

#define FEATURE_CHECKSUM_FORCED true

in the config. With true it is enabled. Hope that it helps, but my printers rarely have errors, so only time can tell.

polygonhell commented 11 years ago

I'll need to merge this into my branch. I'll try to do it later today and let you know. FWIW, with the DTR-held-low fix I transferred >2 million lines of GCode to the SD card without a single error, so I can see how this could be hard to repro.

repetier commented 11 years ago

So the DTR signal seems to have a real effect. For the next release I already changed DTR to low after connect hoping there are no boards getting more errors on a low signal.

TinHead commented 11 years ago

Hi,

I'm the one who had problems on all versions since 0.74, I tried the DTR switch and by magic the bad checksums and resends are gone. :)

Can't reply on the reprap forums as it's down currently.

repetier commented 11 years ago

I have already changed the signal to off for the next release. That way no one needs to kill on connect :-) I would still like to know why this makes a difference, but in the end the result counts.

artbotterell commented 11 years ago

Hi...

OK, I just developed what seems like this same problem. I'm using Firmware 0.82, freshly downloaded, which appears to have the DTR-ignore function pre-set... and Host 0.85b on Win 7. Our Leapfrog Creatr goes through the usual start-of-job moves, proceeds to the location where it would start extruding the skirt, and then comes quietly to a halt.

EXCEPT... it doesn't do that if I'm in dry run mode. Almost as though the first command to the extruder is triggering the problem. However, I can manually extrude without causing any problem.

Once the printer freezes I can hit the Kill Job button, which seems to change color as expected, but the printer doesn't respond by parking as it should, and the temperature chart is also frozen. And I can't get back manual control without disconnecting and reconnecting. When I reconnect, the temp charts show a linear drop to zero across the time since the freeze, but then rebound to more believable numbers on the next update.

Log file likewise halts abruptly:

21:45:29.372 : FIRMWARE_NAME:Repetier_0.82 FIRMWARE_URL:https://github.com/repetier/Repetier-Firmware/ PROTOCOL_VERSION:1.0 MACHINE_TYPE:Mendel EXTRUDER_COUNT:2 REPETIER_PROTOCOL:2
< 21:45:29.372 : N5 M105 *2
< 21:45:29.373 : N6 M220 S100 *71
21:45:29.381 : Printed filament:14.22 m
21:45:29.381 : Printing time:0 days 4 hours 44 min
21:45:29.381 : ok 2
21:45:29.381 : ok 3
21:45:29.381 : T:0.00 B:0.00 @:0 T0:0.00 @0:0 T1:0.00 @1:0
< 21:45:29.381 : N7 M221 S100 *71
< 21:45:29.382 : N8 M111 S6 *79
21:45

Same behavior on three different objects, none of which are particularly complex.

Any guidance would be greatly appreciated! Thanks!

repetier commented 11 years ago

Looks like a very repeatable problem for you. Can you mail me the zipped firmware with your configuration? Perhaps I can reproduce it too, which would help in finding the problem.

artbotterell commented 11 years ago

My firmware source and settings (I've done some in the EEPROM from Host) are attached. Hope that helps, and thanks for the great packages!

09:34:47.599 : FIRMWARE_NAME:Repetier_0.82 FIRMWARE_URL:https://github.com/repetier/Repetier-Firmware/ PROTOCOL_VERSION:1.0 MACHINE_TYPE:Mendel EXTRUDER_COUNT:2 REPETIER_PROTOCOL:2
09:34:47.599 : Printed filament:14.22 m
09:34:47.599 : Printing time:0 days 4 hours 44 min

artbotterell EEPROM settings, 9 Apr 2013

09:35:20.703 : EPR:2 75 115200 Baudrate
09:35:20.703 : EPR:3 129 14.22 Filament printed [m]
09:35:20.718 : EPR:2 125 17062 Printer active [s]
09:35:20.718 : EPR:2 79 0 Max. inactive time [ms,0=off]
09:35:20.734 : EPR:2 83 120000 Stop stepper after inactivity [ms,0=off]
09:35:20.749 : EPR:3 3 33.60 X-axis steps per mm
09:35:20.749 : EPR:3 7 33.60 Y-axis steps per mm
09:35:20.765 : EPR:3 11 1070.50 Z-axis steps per mm
09:35:20.781 : EPR:3 15 200.00 X-axis max. feedrate [mm/s]
09:35:20.796 : EPR:3 19 200.00 Y-axis max. feedrate [mm/s]
09:35:20.812 : EPR:3 23 5.00 Z-axis max. feedrate [mm/s]
09:35:20.827 : EPR:3 27 80.00 X-axis homing feedrate [mm/s]
09:35:20.843 : EPR:3 31 80.00 Y-axis homing feedrate [mm/s]
09:35:20.859 : EPR:3 35 3.00 Z-axis homing feedrate [mm/s]
09:35:20.890 : EPR:3 39 20.00 Max. jerk [mm/s]
09:35:20.905 : EPR:3 47 0.50 Max. Z-jerk [mm/s]
09:35:20.937 : EPR:3 133 0.00 X home pos [mm]
09:35:20.968 : EPR:3 137 0.00 Y home pos [mm]
09:35:20.983 : EPR:3 141 0.00 Z home pos [mm]
09:35:21.015 : EPR:3 145 230.00 X max length [mm]
09:35:21.046 : EPR:3 149 270.00 Y max length [mm]
09:35:21.077 : EPR:3 153 180.00 Z max length [mm]
09:35:21.108 : EPR:3 157 0.00 X backlash [mm]
09:35:21.139 : EPR:3 161 0.00 Y backlash [mm]
09:35:21.139 : EPR:3 165 0.00 Z backlash [mm]
09:35:21.139 : EPR:3 51 500.00 X-axis acceleration [mm/s^2]
09:35:21.139 : EPR:3 55 500.00 Y-axis acceleration [mm/s^2]
09:35:21.155 : EPR:3 59 100.00 Z-axis acceleration [mm/s^2]
09:35:21.155 : EPR:3 63 500.00 X-axis travel acceleration [mm/s^2]
09:35:21.155 : EPR:3 67 500.00 Y-axis travel acceleration [mm/s^2]
09:35:21.155 : EPR:3 71 100.00 Z-axis travel acceleration [mm/s^2]
09:35:21.155 : EPR:0 103 1 OPS operation mode [0=Off,1=Classic,2=Fast]
09:35:21.171 : EPR:3 99 50.00 OPS move after x% retract [%]
09:35:21.171 : EPR:3 43 0.80 OPS min. distance for fil. retraction [mm]
09:35:21.171 : EPR:3 87 1.50 OPS retraction length [mm]
09:35:21.171 : EPR:3 91 0.00 OPS retraction backlash [mm]
09:35:21.171 : EPR:0 106 0 Bed Heat Manager [0-2]
09:35:21.171 : EPR:0 107 255 Bed PID drive max
09:35:21.171 : EPR:0 124 255 Bed PID drive min
09:35:21.171 : EPR:3 108 196.00 Bed PID P-gain
09:35:21.171 : EPR:3 112 33.02 Bed PID I-gain
09:35:21.171 : EPR:3 116 290.00 Bed PID D-gain
09:35:21.171 : EPR:0 120 255 Bed PID max value [0-255]
09:35:21.171 : EPR:3 200 50.00 Extr.1 steps per mm
09:35:21.171 : EPR:3 204 15.00 Extr.1 max. feedrate [mm/s]
09:35:21.171 : EPR:3 208 10.00 Extr.1 start feedrate [mm/s]
09:35:21.171 : EPR:3 212 4000.00 Extr.1 acceleration [mm/s^2]
09:35:21.171 : EPR:0 216 1 Extr.1 heat manager [0-1]
09:35:21.171 : EPR:0 217 140 Extr.1 PID drive max
09:35:21.171 : EPR:0 245 60 Extr.1 PID drive min
09:35:21.171 : EPR:3 218 6.16 Extr.1 PID P-gain
09:35:21.171 : EPR:3 222 0.37 Extr.1 PID I-gain
09:35:21.171 : EPR:3 226 25.58 Extr.1 PID D-gain
09:35:21.171 : EPR:0 230 255 Extr.1 PID max value [0-255]
09:35:21.171 : EPR:2 231 0 Extr.1 X-offset [steps]
09:35:21.171 : EPR:2 235 0 Extr.1 Y-offset [steps]
09:35:21.186 : EPR:1 239 1 Extr.1 temp. stabilize time [s]
09:35:21.186 : EPR:1 250 150 Extr.1 temp. for retraction when heating [C]
09:35:21.186 : EPR:1 252 0 Extr.1 distance to retract when heating [mm]
09:35:21.186 : EPR:0 254 255 Extr.1 extruder cooler speed [0-255]
09:35:21.186 : EPR:3 246 0.00 Extr.1 advance L [0=off]
09:35:21.186 : EPR:3 300 50.00 Extr.2 steps per mm
09:35:21.186 : EPR:3 304 15.00 Extr.2 max. feedrate [mm/s]
09:35:21.186 : EPR:3 308 10.00 Extr.2 start feedrate [mm/s]
09:35:21.186 : EPR:3 312 4000.00 Extr.2 acceleration [mm/s^2]
09:35:21.186 : EPR:0 316 1 Extr.2 heat manager [0-1]
09:35:21.186 : EPR:0 317 130 Extr.2 PID drive max
09:35:21.186 : EPR:0 345 60 Extr.2 PID drive min
09:35:21.186 : EPR:3 318 6.16 Extr.2 PID P-gain
09:35:21.186 : EPR:3 322 0.37 Extr.2 PID I-gain
09:35:21.186 : EPR:3 326 25.58 Extr.2 PID D-gain
09:35:21.186 : EPR:0 330 255 Extr.2 PID max value [0-255]
09:35:21.186 : EPR:2 331 10 Extr.2 X-offset [steps]
09:35:21.186 : EPR:2 335 0 Extr.2 Y-offset [steps]
09:35:21.186 : EPR:1 339 1 Extr.2 temp. stabilize time [s]
09:35:21.186 : EPR:1 350 150 Extr.2 temp. for retraction when heating [C]
09:35:21.186 : EPR:1 352 40 Extr.2 distance to retract when heating [mm]
09:35:21.186 : EPR:0 354 255 Extr.2 extruder cooler speed [0-255]
09:35:21.186 : EPR:3 346 0.00 Extr.2 advance L [0=off]

repetier commented 11 years ago

Looks like the source is still missing. You can send it via email to me. Not sure if you can attach files in github issues.

artbotterell commented 11 years ago

What's an alternative email for you?

repetier commented 11 years ago

repetierdev@gmail.com

limited660 commented 11 years ago

I am having this same issue, but when printing from SD using an LCD and no computer. Sometimes it stops right after starting the print (seconds) and goes idle. I restarted and it made it 96% into the print and went idle. It stops wherever it was sitting and all heaters stay on. This has happened about 4 times since I turned on SD card support last night.

repetier commented 11 years ago

@limited660 This talk was about printing from host. Printing from sd card is a completely different issue and normally has a hardware reason. You could try different sd cards as a start.

limited660 commented 11 years ago

I will give another SD card a try, if problem persists I will open a different issue for that.

kyrreaa commented 11 years ago

I get random lockups too. I thought it was related to my usb->serial dongle, however pressing reset on my avr controller makes it respond again. It's happened quite far into a print which is really annoying.

Wish repetier had a "start from layer n" option where it would read the gcode up to first layer and then skip down to the n'th layer. That would allow restarting jobs after such crash and saving hours of printing.

luke321 commented 11 years ago

@kyrreaa I had such lockups twice in a 12 hour print, although it was with an older firmware version... I went through the log, searched for the last line of gcode and the last Z height, then simply removed everything before that and resumed printing. This could be problematic with non-Rostock printers though, because of the home positions.
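The manual recovery described above (find the last line sent and the last Z height, drop everything before it, resume) could be scripted roughly as follows. This is a sketch under several assumptions: absolute coordinates and absolute extrusion, a known 0-based index of the line to resume from, and a made-up function name `resume_gcode`. It deliberately emits no homing or heating commands, so those still have to be handled by hand, and homing is exactly where non-delta printers can hit the part.

```python
import re

MOVE = re.compile(r"^\s*G[01]\b", re.IGNORECASE)
Z_WORD = re.compile(r"Z(-?\d+(?:\.\d+)?)", re.IGNORECASE)
E_WORD = re.compile(r"E(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def resume_gcode(lines, resume_index):
    """Return G-code that restarts at lines[resume_index].

    The skipped part is scanned for the last Z height and E position so
    the tail can be prefixed with commands that restore them.
    """
    last_z = last_e = None
    for line in lines[:resume_index]:
        code = line.split(";", 1)[0]      # ignore comments
        if not MOVE.match(code):
            continue
        z = Z_WORD.search(code)
        e = E_WORD.search(code)
        if z:
            last_z = z.group(1)
        if e:
            last_e = e.group(1)
    header = []
    if last_z is not None:
        header.append(f"G1 Z{last_z}")    # return to the interrupted layer height
    if last_e is not None:
        header.append(f"G92 E{last_e}")   # restore the extruder position counter
    return header + list(lines[resume_index:])


original = [
    "G28",
    "G1 Z0.300 F3000",
    "G1 X1.0 Y1.0 E0.50000",
    "G1 Z0.600",
    "G1 X2.0 Y2.0 E1.00000",   # assume the log shows the print stopped here
    "G1 X3.0 Y3.0 E1.50000",
]
for line in resume_gcode(original, 4):
    print(line)
```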

ellindsey commented 11 years ago

I am having this problem repeatably.

I am running Repetier 0.83 on a RAMPS 1.4 board, and Repetier-server on a Raspberry Pi as my print server. The printer is a delta, a custom-made Rostock-derived design.

I have a file which consistently locks up at 73% completion. I have run the same print three times now and it fails at about the same point, although not at the exact same line on the G-code. Checking the log (shown on the repetier-server browser window) the last line sent isn't exactly the same each time it fails.

When it fails, the printer LCD shows the printer to be in idle mode. The heaters are on and maintaining temperature, but the temperature is not being reported to the server. I can cancel the print on the server, but the RAMPS board doesn't respond to any commands through the USB port. I need to do a hardware reset to get it talking again.

Communication is through binary mode. I have tried to force it to ascii mode, but changing the mode in the repetier-server config file has no effect, it always goes to binary mode no matter what I set.

I haven't yet tried printing this file from my laptop instead of the Raspberry Pi. I don't have Repetier-host installed yet, but I can try that tomorrow.

I don't have a SDcard reader attached to the RAMPS board, so I can't try printing from the card.

I can upload the gcode file if it will help.

This problem first appeared after I tore down and rebuilt the hot end to use a 0.35mm nozzle. That shouldn't cause any problems itself, but I did significantly change my slicer settings at about the same time, reducing layer height and enabling microlayering. I can try slicing the same part without microlayering and see if it works then.

repetier commented 11 years ago

That error is always hard to detect. It does not look like a firmware issue if the firmware is still running; it is just not getting any more data. This is often enough a problem with the USB connection. I have several USB cables by now, and I know which one to select if I need communication errors and which to use for error-free printing. So a different USB cable might help. You want at least shielded USB cables, and even among those there are good and bad ones. You can test the communication itself if you activate debug communication: M111 S20. If it is a server issue this should already fail, as it has to send the same data. If it is caused by a USB connection crashing due to noise it will most probably work, as the printer has no extra load causing trouble. Then only the reception of commands with correct checksums is tested, with no motors/heaters producing noise. The next level would be to increase noise by running in dry run mode: motor noise, but no large voltage drops from heater/heated bed.

Another test would be to run it from a laptop/PC. Different USB connectors can also make a difference.
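For readers wondering what the S value in `M111 S20` selects: the debug level is a bitmask. The flag values in the sketch below are an assumption based on my reading of the firmware's M111 documentation, not something stated in this thread, so confirm them against the version you run. The helper name is made up.

```python
# ASSUMED bit assignments for the M111 debug level; verify against your firmware.
DEBUG_FLAGS = {
    1: "echo",
    2: "info",
    4: "errors",
    8: "dry run",
    16: "communication",
    32: "no moves",
}


def decode_debug_level(level: int):
    """List the debug flags set in an M111 S<level> value."""
    return [name for bit, name in DEBUG_FLAGS.items() if level & bit]


print(decode_debug_level(20))  # ['errors', 'communication']
print(decode_debug_level(6))   # ['info', 'errors']
```

Under that assumption S20 is error reporting plus the communication-only test described above, and the `M111 S6` the host sends in the earlier log is info plus errors.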

luke321 commented 11 years ago

I noticed that it was a problem with my gap fill speed (40mm/s). I reduced it to 20 mm/s and my printer finished the job.

repetier commented 11 years ago

The big question is why you get no connection problem with 20mm/s infill. Is it because the steppers are slower and the noise level is low enough, or is there a different reason? I do my infills with 60mm/s and have no problems (with the right USB cable).

kyrreaa commented 11 years ago

This is the same issue I was verifying, and it is not due to the cable at all. I verified that the rs232 data actually came through but the repetier host is not sending any more as it is throttled by flow control. The firmware thus throttles and then nothing more happens. It also appears to be idling as there are no steps to be taken, but the queue is full or at least it’s busy/blocked.

The fact that the same file can halt in different places tells me this is a timing issue which suggests there may be a problem with the protocol or the state-machine like buffering system. It may also be a memory overwrite problem.

Some strategic checks and halts could be placed in the code to test and find this. Since I have a proper debugger, I can see what’s going on. If you have any ideas where I should look, or anything more you need to know, just ask and I’ll set up a test and find out.

ellindsey commented 11 years ago

I really don't think it's a noise issue since the behavior is so repeatable. It always happens on certain prints, at about the same point in the print. It's not random, although the exact line the print fails on is random it's always at 73% into the print. That seems to me like a software problem.

I'm going to try printing from the laptop today. I'll also try printing the same part with different speed settings. This print was done with 50mm/s perimeter, 75mm/s infill, and 20mm/s gap fill.

kyrreaa commented 11 years ago

I should also mention that I have reproduced the issue in dry run mode with unpowered steppers.

repetier commented 11 years ago

@kyrreaa A timing problem may also be the reason.

> This is the same issue I was verifying, and it is not due to the cable at all. I verified that the rs232 data actually came through but the repetier host is not sending any more as it is throttled by flow control. The firmware thus throttles and then nothing more happens. It also appears to be idling as there are no steps to be taken, but the queue is full or at least it’s busy/blocked.

I'm a bit lost with that problem. I just finished a 4 hour print on the delta from my host and had no hangs. Same with the sample that always causes the hang for you. So I guess either my computer or the firmware has the wrong timing. If you can reproduce it with unpowered steppers, then I should be able to run it on a bare board with your settings. Do you have a simple example that normally fails, along with your settings? Which firmware/host versions do you use?

kyrreaa commented 11 years ago

I was using the current version when I created the issue. I think 0.82 and host with slicer lacking rafts etc. Try playing with the feed-rate adjustment and use dryrun. Set serial to 115200 and use a 16MHz xtal.

ellindsey commented 11 years ago

Here is the file which always fails for me at about 73%:

https://www.dropbox.com/s/deqdm4fyxlotmo5/shoulderblockleft_rostock.gcode

repetier commented 11 years ago

Ok, I downloaded the file and removed the first 30000 lines to let it run in dry run mode. Could you verify that this still causes the hang, as I cannot print externally created gcode (wrong extrusions for my printer)?

@kyrreaa 115200 baud at 16MHz is exactly what I'm running at. But stepper precision etc. could also influence the hang. My guess is that it starts with an error and then we perhaps get a loop between resend request and resent data. Did you also use the binary protocol?

kyrreaa commented 11 years ago

The extrusions should be irrelevant in dryrun. Use 300% speed if your printer can keep up. If not, disable stepper power too.

repetier commented 11 years ago

Ok, during the second run I got the hang even with the latest development version. Very strange. If I move the position manually on the printer interface, the printer moves and the new coordinates get sent to the host and updated. So the connection is indeed running as it should.

I can fake an ok, so the host sends new commands, but the firmware ignores them silently. So it must be an error somewhere between the input queue and the execution of commands. With the interface still intact I assume the main loop is still running. I will try to add a new menu command that sends me some debugging information, to see what variables are set during the hang. I think the problem is on the firmware side, since I can see the host putting commands into the serial connection. Now I have to figure out why the firmware ignores new input.