prusa3d / Prusa-Firmware-Buddy

Firmware for the Original Prusa MINI, Original Prusa MK4 and the Original Prusa XL 3D printers by Prusa Research.

[BUG] Prusa Connect / Link Gcode Corruption on OTA gcode upload #3156

Open jrgiacone opened 1 year ago

jrgiacone commented 1 year ago

Printer type - [MK4]

Printer firmware version - [5.0 Alpha 4]

Original or Custom firmware - [Original]

USB drive or USB/Octoprint - USB drive formatted as FAT32

Describe the bug: When G-code is uploaded over the air via PrusaLink or Prusa Connect, the file gets corrupted, causing the toolhead to move outside the intended path. Sometimes the print continues and fails; sometimes the printer gets stuck and is unable to move.

For example, the print head moves to the right or north of the object and pauses or gets stuck, and the print can then be neither paused nor stopped.

How to reproduce: With Alpha 4, upload a G-code file to the printer, open the copy that was written to the printer's USB drive in the G-code viewer, and compare it with an export made directly from PrusaSlicer.

Note that if you download the file directly from the Connect cloud it is correct; the corruption occurs when the file is written to the USB drive over WiFi.

Expected behavior: G-code uploaded via WiFi should be identical to a file copied directly to the USB drive via the USB port.

G-code: Working and Problomatic Gcode.zip

Note that the files contain the same G-code: one was exported directly from PrusaSlicer to USB, the other was uploaded via PrusaLink. Please compare the tool paths in the G-code viewer to see this issue.
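The comparison described above can also be done without the G-code viewer. Below is a minimal Python sketch, assuming both copies of the file are available on a computer (for example after moving the USB drive over). The function names `sha1_of` and `first_difference` are made up for illustration and are not part of any Prusa tool.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One file is a prefix of the other: they differ where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Calling `sha1_of("from_slicer.gcode") == sha1_of("from_usb.gcode")` (hypothetical file names) tells you whether the upload was bit-identical, and `first_difference(...)` gives the offset at which to start looking if it was not.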

sjors-lemniscap commented 1 year ago

Experiencing the same issue here. After hours of troubleshooting why one layer of a particular print kept failing I figured that my gcode, uploaded through WiFi, was corrupt.

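My setup:

* Prusa MK4

* Firmware 5.0.0-RC

* Samsung Fit 256GB MUF-256AB/APC

* USB formatted as FAT32 on MacOS Ventura 13.4.1 (c) with Master Boot Record partition scheme

* Printer connected over WiFi, Access Point is on the other side of the wall (facing backwards) of where the printer is positioned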

Thanks to jstm88's Python script I continued uploading over WiFi and found that every now and then a file was corrupted; after deleting the file from the printer and uploading it again, it usually validated successfully.

The interesting part is that one particular file (attached the gcode and stl for reference) sheet-holder-hex.zip kept getting corrupted, no matter what I tried (renaming the file, generating the gcode again etc.). I decided to reformat my Samsung USB drive and after the first try it validated and processed the file successfully.

Considering that the Samsung Fit 256GB MUF-256AB/APC is a relatively fast USB drive, and that others using a relatively new Samsung USB drive seem to experience similar behaviour, perhaps the issue is related to the transfer speed of the USB drive. Just wanted to share my thoughts and experience here. I will purchase and try another (slower) USB drive to see if that resolves the issue for now.

rchiechi commented 1 year ago

Experiencing the same issue here. After hours of troubleshooting why one layer of a particular print kept failing I figured that my gcode, uploaded through WiFi, was corrupt.

My setup:

* Prusa MK4

* Firmware 5.0.0-RC

* Samsung Fit 256GB MUF-256AB/APC

* USB formatted as FAT32 on MacOS Ventura 13.4.1 (c) with Master Boot Record partition scheme

* Printer connected over WiFi, Access Point is on the other side of the wall (facing backwards) of where the printer is positioned

The interesting part is that one particular file (attached the gcode and stl for reference) sheet-holder-hex.zip kept getting corrupted, no matter what I tried (renaming the file, generating the gcode again etc.). I decided to reformat my Samsung USB drive and after the first try it validated and processed the file successfully.

I have had exactly this experience and am using identical hardware/firmware, but with printer on Ethernet. I even had a gcode that reliably triggered the bug until reformatting the USB drive!

sjors-lemniscap commented 1 year ago

Experiencing the same issue here. After hours of troubleshooting why one layer of a particular print kept failing I figured that my gcode, uploaded through WiFi, was corrupt. My setup:

* Prusa MK4

* Firmware 5.0.0-RC

* Samsung Fit 256GB MUF-256AB/APC

* USB formatted as FAT32 on MacOS Ventura 13.4.1 (c) with Master Boot Record partition scheme

* Printer connected over WiFi, Access Point is on the other side of the wall (facing backwards) of where the printer is positioned

The interesting part is that one particular file (attached the gcode and stl for reference) sheet-holder-hex.zip kept getting corrupted, no matter what I tried (renaming the file, generating the gcode again etc.). I decided to reformat my Samsung USB drive and after the first try it validated and processed the file successfully.

I have had exactly this experience and am using identical hardware/firmware, but with printer on Ethernet. I even had a gcode that reliably triggered the bug until reformatting the USB drive!

Interesting.. long story short, I feel like (as mentioned before by @jstm88) this is rather an issue of how the Prusa Buddy board of the MK4 reads from the buffer and writes to the USB drive, where it looks like a fast USB drive is causing a race condition... Even though that still wouldn't explain why the issue didn't occur (so far) after formatting the exact same USB drive. I'm curious whether the issue will slowly start to re-occur the more I write to the USB drive.

We have at this point pretty much ruled out that the issue is specific to the WiFi network stack, as it seems to occur on the LAN network stack as well.

Last but not least: a friend of mine also owns a Prusa MK4 and uses a slower Kingston USB drive (will edit later with the exact model number). He compared about 40 of his gcode files uploaded through LAN and WiFi to this Kingston USB drive with the source files on his computer and has 0 corrupted files.

rchiechi commented 1 year ago

Last but not least; a friend of mine owns also an Prusa MK4 and uses a slower, Kingston, USB Drive (will edit later with the exact model number). He compared about 40 of his gcode files uploaded through LAN and WiFi to this Kingston USB Drive with the source files on his computer and has 0 corrupted files.

I did something similar by automating the upload of a few files dozens of times for several hours. Over dozens of uploads of the same selection of gcode files I got zero errors. But with specific gcode files I got the error on every upload... until I reformatted the USB drive.

In my hands, two things seem to be true: 1) the bug isn't triggered until the USB drive has been in use for a while (i.e., had files written to it / printed) and 2) once a file triggers the bug, it can (but does not always) do so reliably. But it is really hard to pin down the variables that affect the bug without having a dedicated printer with several different USB drives, etc. to test.

vorner commented 1 year ago

Reading through the bunch of new posts and especially the interesting findings about reformatting the flash drives. USB drives use relocation tables (tables mapping virtual block numbers to physical ones, to balance the load of the memory and to stop using broken blocks). Formatting uses a TRIM command which effectively throws the current table away and starts a fresh (empty) one. Which leads to several things:

It also seems that the bug was „always present“ (cases of corruption on 4.7.1 reported above), but maybe because of some timing issues it is now much more likely to happen.

Yes, we use third-party code for FAT & USB, but maybe it's time to start investigating these too.

Side note on the issue of the cold nozzle and stuck printer: it very much sounds like the printer is desperately retrying to read data from the USB drive and not getting it.

Cbutters2000 commented 1 year ago

BADGCODE2.zip

I'm also experiencing this again, with a SAMSUNG 64GB USB 3.0 drive. Using PrusaSlicer, I exported the gcode to the PC, then copied it over PrusaLink to the printer. Had a bad print. I compared the SHA1 of the file on the PC with that of the one on the USB drive, and they are different.

michalmiddleton commented 1 year ago

I also have this issue - I use Samsung USB drive (SAMSUNG MUF-64AB/AM FIT Plus 64GB). The bug exists for me on 4.7.1 and 4.7.2. (I haven't tried any other firmware releases). Formatted to FAT32 with MBR on Mac. Printing via PrusaLink & wifi.

DRracer commented 1 year ago

Dear reporters, our dev team spent a considerable amount of time trying to reproduce similar issues like you are reporting. So far we haven't been able to reproduce blocks of G-code randomly copied to various places, but in one case we have seen a block missing.

Currently, we have run out of ideas. We suspect some data race or timing issue somewhere around the filesystem/USB level, probably related to the MK4 + USB drive combination (we don't know that yet).

Therefore, we'd like to ask for your help.

  1. Is there anyone who is currently repeatedly and reproducibly experiencing the issue (every second or third corrupted G-code is reproducible enough) and willing to share their machine + the unformatted USB flash drive? Please report either directly to this thread or to my github email. We offer a replacement MK4 + another USB drive + something more in reward.

  2. Also, if there is anyone willing to share a dd image of their USB flash partition, that may help us as well. We don't know if it's just the filesystem level or if some USB communication timing is required to trigger the issue.

Thank you, good luck :crossed_fingers:

antimix commented 1 year ago

Hi Developers, I got another idea, but you have to check how it applies to the reality since I don't know too many details of the HW.

FACTS:

SUGGESTIONS: The file system logic is handled by the main Buddy board, which communicates with the LCD board (that contains the USB dongle), so it sends the data back and forth to the LCD board through the ribbon cable.

Regards

hlipka commented 1 year ago

@DRracer It might be helpful to post that on the Prusa forums, and maybe even on Reddit - not everyone is following along here. From what I see so far the only common thing is the usage of USB drives with 64GB or more (AFAICS everyone affected reported that they are using such a large drive). So it might be that writing to higher block numbers causes that problem (when the drive is getting fuller).

Cbutters2000 commented 1 year ago

Dear reporters, our dev team spent a considerable amount of time trying to reproduce similar issues like you are reporting. So far we haven't been able to reproduce blocks of G-code randomly copied to various places, but in one case we have seen a block missing.

Currently, we have run out of ideas. We suspect some data race or timing issue somewhere around the filesystem/USB level, probably related to the MK4 + USB drive combination (we don't know that yet).

Therefore, we'd like to ask for your help.

  1. Is there anyone who is currently repeatedly and reproducibly experiencing the issue (every second or third corrupted G-code is reproducible enough) and willing to share their machine + the unformatted USB flash drive? Please report either directly to this thread or to my github email. We offer a replacement MK4 + another USB drive + something more in reward.
  2. Also, if there is anyone willing to share a dd image of their USB flash partition, that may help us as well. We don't know if it's just the filesystem level or if some USB communication timing is required to trigger the issue.

Thank you, good luck 🤞

My machine is doing this regularly. I would be willing to volunteer my machine+USB Stick for your testing. Please message me and I'll also email you at your email.

sjors-lemniscap commented 1 year ago

Dear reporters, our dev team spent a considerable amount of time trying to reproduce similar issues like you are reporting. So far we haven't been able to reproduce blocks of G-code randomly copied to various places, but in one case we have seen a block missing.

Not the hero we deserve, but the hero we need. Thank you!!

antimix commented 1 year ago

As an example, I am using a FlashAir (32GB) SD card in a USB card reader dongle, and I have no issues. So the capacity is 32GB and the card reader may "slow down" the data rate.

However, I am torn between:

1) 40% - FAT32 library bug for big disk sizes
2) 40% - Data signal error on the ribbon cable between the Buddy and the LCD board
3) 20% - Protocol error for FAST data rate USB dongles.

The fact that many users with constant errors on a file solved the issue by reformatting the USB drive or changing to a smaller one makes me lean towards option 1. And if it is option 1, it would be enough to send the USB drive and not the whole printer.

DRracer commented 1 year ago

@antimix your thinking is heading the same way as ours; what you wrote are valid comments and recommendations :wink: . We are working on it really hard, trust me, it's a serious issue which can cause all kinds of unexplainable behavior. We also suspect some kind of HW+SW related corruption.

However, since we haven't been able to reproduce the issue in any similar way, the next step is to collect a few customers' printers and investigate them.

DRracer commented 1 year ago

@Cbutters2000 Thank you for volunteering your machine for the test, and for your email. Our tech support team should reach out to you shortly to organize the exchange.

abjugard commented 1 year ago

@DRracer It might be helpful to post that on the Prusa forums, and maybe even on Reddit - not everyone is following along here. From what I see so far the only common thing is the usage of USB drives with 64GB or more (AFAICS everyone affected reported that they are using such a large drive). So it might be that writing to higher block numbers causes that problem (when the drive is getting fuller).

Honestly it isn't all that surprising that people are replacing the USB drive. The drive that ships with the kit is super slow, so people would reasonably assume that slow transfers are caused by the slow drive. That's at least the sole reason I've installed a faster and larger drive (128GB version of this one) in my printer.

If the problem stems from filesystems formatted in a way the printer doesn't like, why not add a feature to allow the printer to do the formatting?

I'd much rather the printer got better at working with filesystems created elsewhere (including exFAT please, or just anything that supports modern metadata), but this would be a good stopgap solution while we find the real problem.

@DRracer also please remove the "unable to reproduce" tag, it's a little insulting considering how many users have been able to reproduce it.

wizbongre commented 1 year ago

Folks, I appear to have resolved this issue with my machine. I performed a hard factory reset and shifted back from firmware Alpha to the latest RC. In doing so, my machine performed an update to the WiFi module as it reset. Since doing this, I'm not experiencing any more issues with prints via Connect or Link. (Really hoping I'm not speaking too soon, but I've managed about 10 different prints with zero issues - full story here.)

DRracer commented 1 year ago

@abjugard actually the slow USB read/write is caused by something other than the flash drive itself; we are working on a solution to boost the transfer speeds.

Formatting a flash drive in the printer doesn't make too much sense, it uses standard FAT32 - it either works or doesn't. What we are trying to find out in this issue is how it is possible that the data get corrupted.

And yes, the "unable to reproduce" tag is still perfectly valid on this ticket, because we really haven't been able to reproduce it, even though there are people experiencing the issue (and we are listening to them, which is why machines are being collected to reproduce the issue).

@wizbongre That sounds suspicious (the 5.0.0-RC had some issues with upload speed), but keep your :eye: on it - if you cannot reproduce the issue with this specific version, please report back. It would be an important piece of the puzzle.

wizbongre commented 1 year ago

@DRracer to be 100% clear - I originally experienced the issues on 5.0 RC, so rolled back to 5.0 Alpha. I still had issues. Now that I've updated to 5.0 RC with a forced reset, the problem appears to have disappeared. I can't find a way to check the version of the WiFi module - is that possible, so maybe others on here could compare and see if there is any link?

vorner commented 1 year ago

I'd guess the wifi isn't the cause of the problem. For one, the ESP firmware is tied to specific version by the printer (will upgrade or downgrade as necessary, and it is checked on every boot).

For another, there are people reporting the problem on ethernet, with no wifi involved.

Furthermore, looking at the corruptions posted here, the „copy-pasted“ chunk of gcode is sometimes as far as some ~36kB (and always multiple of 4096 bytes) away and we are pretty confident we don't have a buffer this huge anywhere near the network stack. Both the 4096 and 64B mentioned above (which are offsets of corruptions from beginning, and lengths of the corruptions) suggest it is somewhere near the USB rather than network.
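For anyone who wants to run the same kind of analysis on their own file pairs, here is a rough Python sketch. It assumes the good and the corrupted file have been read into memory as `bytes`; the helper names are hypothetical. It lists the byte ranges that differ and checks whether a corrupted range is a copy of data from elsewhere in the good file. Because G-code is repetitive, some bytes inside a corrupted block can match by chance, so one damaged block may show up as several shorter runs.

```python
def corrupted_regions(good: bytes, bad: bytes):
    """Return a list of (offset, length) runs where the two byte strings differ.

    Only the common prefix length is compared; a size difference has to be
    checked separately.
    """
    regions = []
    start = None
    common = min(len(good), len(bad))
    for i in range(common):
        if good[i] != bad[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            regions.append((start, i - start))
            start = None
    if start is not None:
        regions.append((start, common - start))
    return regions


def source_of_copy(good: bytes, bad: bytes, offset: int, length: int):
    """Return the first offset in `good` where the corrupted bytes occur, or None."""
    chunk = bad[offset:offset + length]
    found = good.find(chunk)
    return found if found != -1 else None
```

For each `(offset, length)` pair, `offset % 4096` and the distance between `offset` and the result of `source_of_copy` are the numbers to compare against the 4096-byte pattern described above.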

The bug seems to be very fragile in the sense some (unknown) conditions must be met for it to exhibit ‒ that is, wifi might be triggering it for some (maybe due to different timing). Unfortunately, that means that the bug seemingly „disappearing“ is more likely just disturbing the trigger than actually „fixing“ it :-(.

michalmiddleton commented 1 year ago

I'm one of the affected MK4 owners as well. I either don't print enough or I don't have a way to reliably reproduce the issue, however the issue seems to have disappeared after I switched away from my own USB flash drive (that I bought specifically for this printer) and went back to the ADATA drive that came with the printer.

One thing I noticed - the upload to the printer is now very slow. 5MB upload takes a while. The drive itself is super slow in my computer as well (Super slow for 2023 and compared to new modern drives). IMO this all seems to point toward the theory around write buffers.

I can offer my USB drive (SAMSUNG MUF-64AB/AM FIT Plus 64GB) to Prusa.

jstm88 commented 1 year ago

I also use the Samsung MUF-64AB/AM FIT Plus 64GB.

It makes sense that this specific drive seems to be very popular. The included drive sticks out quite far, which I don't like because it could easily get bumped and damage the USB port. This specific Samsung drive is one of the shortest models available from a name brand, so it's far less likely to be damaged. And the 64GB model specifically is the smallest (thus cheapest) variant available on Amazon ($10.99 with Prime one-day).

I'd argue Prusa should consider including something like this drive (a low-profile drive) in the box since it doesn't stick out as far. Regardless, these drives are very easy to source and it's certainly one that Prusa should be testing with.

antimix commented 1 year ago

I was looking around on the code (I hope I was looking at the correct part of the repository....) and I noticed that the allocation unit is not totally implemented.

The code handles only a data storage size of 4K; otherwise it assumes 64K.

For USB dongles, the allocation unit usually depends on their capacity:

* 512MB-1023MB - 4KB

* 1024MB-2047MB - 4KB

* 2GB-8GB - 4KB

* 8GB-16GB - 8KB

* 16GB-32GB - 16KB

* For SD/USB above 32GB, such as 64GB and above, exFAT is applied - 64KB

So the code is missing correct handling for 8/16GB dongles that use 8K blocks, and 16/32GB dongles that use 16K blocks.

This would explain why specific dongles formatted with one of those odd allocation units may cause a mess in the allocation of sectors.
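Rather than inferring the allocation unit from the capacity, it can be read from the drive itself. The following Python sketch assumes you have the first 512 bytes of the FAT partition (not of the whole disk), for example from a dd image; the file name in the usage note is hypothetical. It relies only on the standard FAT boot sector layout: bytes per sector at offset 11 and sectors per cluster at offset 13.

```python
import struct


def fat_cluster_size(boot_sector: bytes) -> int:
    """Return the cluster size in bytes from a FAT boot sector.

    `boot_sector` must be the first 512 bytes of the partition.
    """
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector (missing 0x55AA signature)")
    # Little-endian: 16-bit bytes-per-sector at offset 11,
    # then 8-bit sectors-per-cluster at offset 13.
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot_sector, 11)
    return bytes_per_sector * sectors_per_cluster
```

Usage would look like `fat_cluster_size(open("usb_partition.img", "rb").read(512))`. A drive with 512-byte sectors and 64 sectors per cluster reports 32768, i.e. 32K clusters.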

Here is a fragment of code:

   DATA STORAGE AREA (first 4K or 64k) 

  #ifdef DATA_STORAGE_SIZE_64K
    constexpr uint32_t data_storage_area_size = 64 * 1024; // Large erase unit
  #else
    constexpr uint32_t data_storage_area_size =  4 * 1024; // Small erase unit
  #endif

  /* In order to provide some degree of wear leveling, each data write to the
    SPI Flash chip is appended to data that was already written before, until
    the data storage area is completely filled. New data is written preceeded
    with a 32-bit delimiter 'LULZ', so that we can distinguish written and
    unwritten data:

           ''LULZ''         <--- 1st record delimiter
           <data_byte>
           <data_byte>
           <data_byte> 

At this point the code decides wrongly: if it is not 4K, then it must be 64K... Unfortunately that is not always the case ;)

I may be completely wrong, since I did not have time to look further into the code, but whoever wrote it may be in a better position to judge.

sjors-lemniscap commented 1 year ago

Not sure if I understand correctly.. from your code snippet it actually accounts already for larger and smaller USB drives and the corresponding data storage size respectively:

  #ifdef DATA_STORAGE_SIZE_64K
    constexpr uint32_t data_storage_area_size = 64 * 1024; // Large erase unit
  #else
    constexpr uint32_t data_storage_area_size =  4 * 1024; // Small erase unit
  #endif

sjors-lemniscap commented 1 year ago

The Kingston USB drive is a Kingston DTMC3G2/256GB. It does about 200MB/s read and 60MB/s write according to the tech specs (take this with a grain of salt tho). I decided to purchase this USB drive and have been using it now for over a week. With about 30 gcode files transferred and verified (through the Python script from @jstm88), I can confirm that I haven't had a single corrupted file yet, whereas with the Samsung drive corruption occurred pretty quickly (after a couple of uploaded files). I will continue testing with the Kingston drive as I'm printing frequently.

@Prusa-Support it might be worth purchasing the exact same Samsung USB drive that many of us seem to be using (Samsung Fit 256GB MUF-256AB/APC), as it looks to me that something might be wrong with this drive specifically. My assumption is that it is either its transfer speed (it is truly a fast USB drive) or its storage area size allocation.

antimix commented 1 year ago

Not sure if I understand correctly.. from your code snippet it actually accounts already for larger and smaller USB drives and the corresponding data storage size respectively:

  #ifdef DATA_STORAGE_SIZE_64K
    constexpr uint32_t data_storage_area_size = 64 * 1024; // Large erase unit
  #else
    constexpr uint32_t data_storage_area_size =  4 * 1024; // Small erase unit
  #endif

Yes, it already accounts for larger and smaller USB FAT formats, but only for clusters of 4K or 64K. It ignores that other cluster sizes also exist, 8K and 16K, and if you plug in a USB drive formatted with 16K or 8K clusters, it handles it as if it were 64K, completely messing up the file system.

sjors-lemniscap commented 1 year ago

This still wouldn't explain why my Samsung Fit 256GB MUF-256AB/APC is having issues and the Kingston DTMC3G2/256GB is not (both are 256GB, which according to your explanation means 64KB). I formatted them both the same way on MacOS. Below is a dump of diskutil info for the Samsung Fit 256GB MUF-256AB/APC:

   Device Identifier:         disk4s1
   Device Node:               /dev/disk4s1
   Whole:                     No
   Part of Whole:             disk4

   Volume Name:               PRUSAMK4
   Mounted:                   Yes
   Mount Point:               /Volumes/PRUSAMK4

   Partition Type:            DOS_FAT_32
   File System Personality:   MS-DOS FAT32
   Type (Bundle):             msdos
   Name (User Visible):       MS-DOS (FAT32)

   OS Can Be Installed:       No
   Media Type:                Generic
   Protocol:                  USB
   SMART Status:              Not Supported
   Volume UUID:               082F637C-4725-3F71-8404-A6D759A65319
   Partition Offset:          1048576 Bytes (2048 512-Byte-Device-Blocks)

   Disk Size:                 256.6 GB (256640024576 Bytes) (exactly 501250048 512-Byte-Units)
   Device Block Size:         512 Bytes

   Volume Total Space:        256.6 GB (256577339392 Bytes) (exactly 501127616 512-Byte-Units)
   Volume Used Space:         26.6 MB (26574848 Bytes) (exactly 51904 512-Byte-Units) (0.0%)
   Volume Free Space:         256.6 GB (256550764544 Bytes) (exactly 501075712 512-Byte-Units) (100.0%)
   Allocation Block Size:     32768 Bytes

   Media OS Use Only:         No
   Media Read-Only:           No
   Volume Read-Only:          No

   Device Location:           External
   Removable Media:           Removable
   Media Removal:             Software-Activated

   Solid State:               Info not available

I'm currently printing with the Kingston DTMC3G2/256GB, so will post a diskutil info dump of the fat32 partition on that drive tomorrow.

antimix commented 1 year ago

You gave the answer ....

Volume Free Space: 256.6 GB (256550764544 Bytes) (exactly 501075712 512-Byte-Units) (100.0%)
Allocation Block Size: 32768 Bytes (32K)

You have a 32K allocation size, and the FW handles it as if it were 64K, messing everything up. ;)

sjors-lemniscap commented 1 year ago

It still doesn't explain why my Kingston DTMC3G2/256GB, with the exact same allocation block size, doesn't have any issues so far. Below is the diskutil info of that drive:

   Device Identifier:         disk4s1
   Device Node:               /dev/disk4s1
   Whole:                     No
   Part of Whole:             disk4

   Volume Name:               PRUSAMK4
   Mounted:                   Yes
   Mount Point:               /Volumes/PRUSAMK4

   Partition Type:            DOS_FAT_32
   File System Personality:   MS-DOS FAT32
   Type (Bundle):             msdos
   Name (User Visible):       MS-DOS (FAT32)

   OS Can Be Installed:       No
   Media Type:                Generic
   Protocol:                  USB
   SMART Status:              Not Supported
   Volume UUID:               EB70A6BA-BA35-3EDC-BBCC-C73BF5135942
   Partition Offset:          1048576 Bytes (2048 512-Byte-Device-Blocks)

   Disk Size:                 248.0 GB (248033312768 Bytes) (exactly 484440064 512-Byte-Units)
   Device Block Size:         512 Bytes

   Volume Total Space:        248.0 GB (247972724736 Bytes) (exactly 484321728 512-Byte-Units)
   Volume Used Space:         60.9 MB (60882944 Bytes) (exactly 118912 512-Byte-Units) (0.0%)
   Volume Free Space:         247.9 GB (247911841792 Bytes) (exactly 484202816 512-Byte-Units) (100.0%)
   Allocation Block Size:     32768 Bytes

   Media OS Use Only:         No
   Media Read-Only:           No
   Volume Read-Only:          No

   Device Location:           External
   Removable Media:           Removable
   Media Removal:             Software-Activated

   Solid State:               Info not available

danopernis commented 1 year ago

I would like to point out that the code snippet @antimix is posting is located in file /lib/Marlin/Marlin/src/lcd/extensible_ui/lib/lulzbot/archim2-flash/flash_storage.cpp which is not part of the MK4 firmware and does not deal with USB flash drive. That being said, it is still interesting that there is a particular USB drive model which does not work with the printer.

rchiechi commented 11 months ago

Since updating to the 5.0.0 release, most if not all uploads are corrupted. Upload speeds are also faster. I'm curious if anyone else is experiencing the same thing.

codingcatgirl commented 10 months ago

This is also happening to me. I would appreciate it if the file could be verified using a checksum, so that the upload fails when this occurs. That would save us failed prints and probably also make it easier to reproduce the issue, or to check whether it has been reproduced.
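To illustrate the idea (not how the firmware works), here is a small host-side Python sketch of a write that is verified against a checksum supplied by the sender. The function name and the choice of SHA-256 are my own assumptions. Note that on a desktop OS the read-back may be served from cache rather than from the medium, so this shows the control flow only.

```python
import hashlib
import os


def write_and_verify(chunks, dest_path, expected_sha256):
    """Write chunks to dest_path, re-read the file, and check its SHA-256.

    Returns True if the file on disk matches the expected digest;
    otherwise deletes the file and returns False.
    """
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to the device

    digest = hashlib.sha256()
    with open(dest_path, "rb") as fh:
        for block in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(block)

    if digest.hexdigest() != expected_sha256:
        os.remove(dest_path)  # fail the upload instead of keeping a bad file
        return False
    return True
```

With something like this, a corrupted transfer would be rejected at upload time instead of surfacing as a failed print hours later.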

Oh, also it failed on the same project twice, but in different ways (same layer though, i think). Maybe this helps with reproducing it. I don't want to post the project files publicly online, is there some other way to supply them?

jrgiacone commented 10 months ago

New test using newest Prusa Connect, 5.1 stable and newest prusa slicer. My Gcode corruption happened again when uploading file to Prusa Connect and adding the file to the queue, the uploaded file is different.

Upon further investigation I actually found that PrusaSlicer was the offender. Not sure why, but all 3 files matched (the bad one): the original PrusaSlicer export, the USB file, and the Connect file. After reslicing, the file was correct and is now working. I am currently using the AppImage on Fedora Linux.

bcode compare.zip

I have attached the comparison of the files, even the file sizes are different. So either this is a new bug or the same, but the issue now seems to be present in prusa slicer creating different file outputs with the same settings... Hope this helps

stuz32 commented 8 months ago

Hey team,

Confirming I have encountered this issue by transferring gcode via Prusa Connect.

Slicer version: Build: PrusaSlicer-2.7.1+win64-202312121425
Prusa Printer: MK4
Prusa Printer Firmware: 5.0.1+12089 (Note: I'm running an older version intentionally to avoid another bug I've raised)

Comparing the gcode transferred via Prusa Connect with the one copied physically via USB, I can confirm the files are different and that this difference is the cause of the issue.
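For anyone who wants to repeat this kind of comparison, here is a small sketch that reports where two files first diverge. It assumes both files are on a local disk; `first_difference` is an illustrative name, not an existing tool.

```python
def first_difference(path_a, path_b, chunk_size=64 * 1024):
    """Return the offset of the first differing byte, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the shared part: one file is a prefix of the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                # Both files ended at the same point without any difference.
                return None
            offset += len(chunk_a)
```

An offset close to the start of the file points at a header or metadata difference, while an offset deep in the file is more likely to be in the toolpath data itself.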

I'll attach a zip file with the two gcode files so you can compare the pair. I'll also include the Prusa slicer project file for review to see if there are any clues.

Regards, Stu

3156-gcode-transfer.zip

legoman666 commented 8 months ago

I am also having the same issue.

PrusaSlicer:
Version: 2.7.1+MacOS-arm64
Build: PrusaSlicer-2.7.1+MacOS-arm64-202312121425

Laptop:
Operating System: Macintosh
System Architecture: 64 bit
System Version: macOS Version 14.2.1 (Build 23C71)
Total RAM size [MB]: 34,360MB
OpenGL installation GL version: 4.1.0
Profile: Core
Vendor: Apple
Renderer: Apple M1 Max
GLSL version: 4.10.0
Connected via WiFi

Printer: Mk3 -> Mk3s -> Mk3s+ -> Mk4
Firmware: 5.1.2
Connected via ethernet, not wifi
Using the flash drive that came with the printer

The resulting layer shift is 100% repeatable every time I try to print the same bgcode. Tried uploading via prusalink and also tried dragging the bgcode file directly to the printer. Haven't tried copying the file via usb yet. Some files upload ok, others layer shift in the exact same spot every time.

codingcatgirl commented 8 months ago

@legoman666 I would recommend checking whether file corruption is actually happening in that version; the reproducible layer shift could also be caused by some other bug.

antimix commented 8 months ago

@legoman666 please do the following tests:

1) Export the gcode on the PC (in binary format .bgcode) and physically copy on the USB key, and check if it prints well.
2) Export the gcode on the PC (in uncompressed format .gcode) and physically copy on the USB key, and check if it prints well.
3) Send the G-code to the printer in uncompressed format, and check if it prints well.

HOW TO EXPORT IN PLAIN UNCOMPRESSED GCODE: Go to Printer settings, UNCHECK the Supports Binary G-code checkbox, and then click the export button.

This way we can tell whether the issue is in the file transmission or in the file structure itself.

legoman666 commented 8 months ago

I think I found the cause of my issues, weird that it was so repeatable. Anyway, not related to the bgcode getting corrupted. Delete my posts if it makes the thread cleaner.

Prusa-Support commented 8 months ago

@legoman666 That's OK 🙂 pretty much all comments can be equally important and potentially useful for other users coming across it with similar problems, and for us to collect feedback and ideas. Please just consider editing the message and adding ~~ at the beginning and at the end of the text to strike it through. Maybe add an [edit] section to clarify what the problem/solution was for you.

We will keep on monitoring the conversation and collect every bit of feedback which may help us deliver a better user experience.

Michele Moramarco Prusa Research

michalmiddleton commented 8 months ago

Hello, I'm one of the people who have/had this issue. I switched back to the USB drive that came with my MK4 (A-DATA) and have not had a single problem since then. It's hard for me to guess the time but it was right around when firmware 5.0 (stable) was released. Today, I figured I'd give it another shot with my Samsung 64GB drive but a crash took place.

This is how it went down:

  1. Printer was off. I removed the ADATA drive, put 5.1.3 firmware file on it and powered on. FW update went well.
  2. I removed the ADATA drive and inserted a freshly formatted (on Mac, FAT32) drive. Printer continued to show No USB.
  3. I removed the drive and plugged it back in. Printer started beeping and generated a BSOD crash.
  4. I left the drive in and powered printer off and then back on. Printer booted normally and saved the crash dump.

@Prusa-Support please let me know if you are interested in the crash dump file, I can provide it. I just didn't want to post it here since it contains my WIFI password in plaintext. IMG_4420

Prusa-Support commented 6 months ago

Please only share dump files via email at reports@prusa3d.com. Where it makes sense, you would be able to include useful additional information and files like:

Michele Moramarco Prusa Research

github-actions[bot] commented 4 months ago

This issue has been flagged as stale because it has been open for 60 days with no activity. The issue will be closed in 7 days unless someone removes the "stale" label or adds a comment.

jstm88 commented 4 months ago

Noticed this hasn't gotten any attention, even though the issue still exists in 6.0.x firmware and has not been addressed.

The problem of Gcode (and bgcode) files corrupting during upload with no way for the printer to detect the issue is a pretty major one, since it has the potential to actually cause damage to the printer.

There should be a multi-layered approach here:

  1. Fix the firmware issue causing this.
  2. Implement a function to upload and verify (my gist from earlier does this manually, although it's a bit slow). This functionality should ideally be in PrusaSlicer so it can upload and verify.
  3. Implement a way to embed a checksum/hash in bgcode files so that the printer can verify the data is not corrupt while it is being read. I have not looked at the bgcode format but I hope this could be done. Creating a new format without a checksum is a massive mistake and I hope the format allows for it.

The third option is actually important not just for upload corruption but to deal with possible flaky drives. The main challenge here is that there are two places corruption could occur. Detecting corruption by reading/hashing the entire file would be simple. But the entire file is not loaded into memory at once. Ideally we should be able to checksum each block of the file as it is being read. With limited resources (I don't know how many spare cycles we have to work with here) there are ways to do it, but it would require some thought. Doing this properly might require some additional metadata to be added to the gcode file (a very acceptable and reasonable thing).

One simple "resource-friendly" way would be to have a rolling XOR, where every so often (maybe based on number of lines) the slicer injects a specific Gcode with the current XOR of all the preceding bytes. As the printer reads the data, it checks against that value and stops the print if it finds a mismatch. The calculations are simple, it can be done in realtime as the file is read, and unlike a "normal" full file checksum, it will catch errors in the actual data being fed to the printer. A full file checksum might be valid and then the drive could have a read error during the print, but a "continuous" check like this would detect those kinds of errors. A simple XOR might not be the best option here, I'm sure there are other stream-based hashes that are extremely fast. The concept would be the same. The XOR method would have a 1/256 chance of not catching any given corruption, which is not bad but I'm sure we could do better.
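To make the rolling XOR idea concrete, here is a rough sketch of both sides. Everything in it is an assumption for illustration: the `;XORCHK` marker is a made-up comment line, not a real G-code or Prusa command, and the checkpoint interval is arbitrary.

```python
CHECK_PREFIX = ";XORCHK "  # hypothetical marker, not an existing G-code or firmware feature


def xor_bytes(data, running=0):
    """Fold every byte of data into an 8-bit running XOR."""
    for byte in data:
        running ^= byte
    return running


def add_checkpoints(lines, every=100):
    """Slicer side: return the lines with a checkpoint inserted after every `every` lines."""
    running = 0
    out = []
    for count, line in enumerate(lines, start=1):
        out.append(line)
        running = xor_bytes(line.encode("utf-8"), running)
        if count % every == 0:
            # The checkpoint carries the XOR of all preceding G-code bytes.
            out.append(f"{CHECK_PREFIX}{running:02X}")
    return out


def verify_stream(lines):
    """Printer side: yield G-code lines, raising ValueError on a checkpoint mismatch."""
    running = 0
    for number, line in enumerate(lines, start=1):
        if line.startswith(CHECK_PREFIX):
            expected = int(line[len(CHECK_PREFIX):], 16)
            if expected != running:
                raise ValueError(f"checksum mismatch at line {number}")
            continue  # checkpoints are consumed, never executed
        running = xor_bytes(line.encode("utf-8"), running)
        yield line
```

One consequence of this design is that lines between two checkpoints have already been handed on before a mismatch is noticed, so the interval bounds how much bad data can reach the printer. Buffering each segment until its checkpoint passes would close that gap at the cost of some memory.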

danopernis commented 4 months ago

This indeed should not be marked as stale. I agree with the proposition that the primary fix should be in firmware, it is just not easy to catch it due to its random nature. Adding to the difficulty, attempts to increase observability of the relevant code mess with the timing, so the bug does not manifest.

There is already a CRC check in the new bgcode file format, but it is not yet being checked on the firmware side. This is currently being worked on, but I can't give you an estimate as to when it will be enabled.
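As a generic illustration of per-block CRC verification, here is a sketch using CRC32. The framing used here (a 4-byte length, the payload, then a 4-byte CRC32) is invented for the example and is not the actual bgcode block layout; refer to the bgcode specification for the real structure.

```python
import struct
import zlib


def frame_block(payload):
    """Pack a payload as: 4-byte little-endian length, payload, 4-byte CRC32."""
    header = struct.pack("<I", len(payload))
    # The CRC covers the length field as well as the payload.
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("<I", crc)


def read_blocks(data):
    """Yield verified payloads; raise ValueError on a truncated or corrupt block."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated header at offset {offset}")
        (length,) = struct.unpack_from("<I", data, offset)
        end = offset + 4 + length
        if end + 4 > len(data):
            raise ValueError(f"truncated block at offset {offset}")
        (stored,) = struct.unpack_from("<I", data, end)
        if zlib.crc32(data[offset:end]) != stored:
            raise ValueError(f"CRC mismatch in block at offset {offset}")
        yield data[offset + 4:end]
        offset = end + 4
```

Because each block is checked as it is read, this catches both upload corruption and read errors during the print, without needing the whole file in memory.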

trainmeditations commented 3 months ago

I think I've recently run into this problem myself after trying out Prusa Connect for the first time. Prints I sent through Connect were randomly skipping sections of the print. I did a binary compare of two bgcode files, one I generated locally and one I copied off the USB drive after it was delivered by Prusa Connect. There were differences, but they were early in the file and might have just been metadata differences; I didn't look at the actual content. I will try it again and compare both the gcode files and the bgcode files to see if I can catch the differences. Weirdly, when I loaded the broken bgcode file that was skipping sections into the gcode viewer, it didn't seem to show those sections being skipped, so I'm not sure what was happening there.

antimix commented 2 months ago

You can find a lot of samples from users with a similar issue, which should share the same root cause as this bug (#3156), in [BUG] Missing outer perimeter. #3650. Ignore the title, which is misleading; that issue should be caused by corrupted .bgcode. You will also find other useful info in the PRUSA forum: Prusa MK4 missing lines randomly

antimix commented 2 months ago

MODEL: MK3.5
FW: 6.0.4
Physical printer: Defined as PrusaConnect connection.

Since the last two firmware releases there is a new behaviour:

Note that PrusaSlicer does not detect the error and displays OK. Today I had to send the gcode 4 times before the transfer succeeded. PrusaConnect is also completely unaware that this error happened, and there is no mention of it in any panel. So far I have not seen this error on the MK4 (same FW 6.0.4).

antimix commented 1 month ago

I got this error on the MK4 as well, so the issue is present in both flavours of the firmware (MK4 / MK3.5). However, I noticed that if I go to PRINT from the LCD, the transmitted file is there and I can print the object without errors, so the file should have arrived intact from PrusaConnect. Maybe the issue is a sync/lock problem between the LCD image display routine and the PrusaConnect link. Maybe this is another issue not related to this #3156. However, I have ordered the MK4S kit, and I know the wireless board is new. Let's hope all these issues will be gone!

Prusa-Support commented 1 week ago

The new firmware 6.2.0-alpha1 includes an overhauled USB media prefetch feature. This should reduce the chances of connectivity issues during printing and possibly improve this issue. With a reminder that this is potentially a not fully stable release, we would appreciate help with testing.

Michele Moramarco Prusa Research

Koder22 commented 1 week ago

Benchy

I still get this issue with 6.1.3 firmware on my MK4, almost every time I use Prusa Connect. If I use the same USB stick in my computer and manually transfer the gcode, the print is always perfect. I had the same problem with a Prusa Mini a year ago. Sometimes a perimeter is missing, and sometimes there is what looks like one perimeter too many. Tried reaching out to Prusa Support, but the problem wasn't known to them.

Did a comparison but couldn't find any differences between the GCODE that was uploaded and the GCODE generated in PrusaSlicer.

This is not the benchy hull line; the "hull line" is clearly visible below the missing/duplicate line. Also, this is not wet filament or a broken USB memory stick.