Closed sp193 closed 6 years ago
Yesterday I started a topic discussing this issue with USB Flash Drives tending to get corrupted rather frequently after I upgraded from FMCB 1.94 to 1.96+
So I look forward to testing these prototypes.
Question? Should we start to advise users to not upgrade to FMCB v1.96* or the current JUNE updates of wLE?
Is this issue with FMCB? Or wLE or both combined? I guess that's my question???
I'm also thinking that the reason FMCB 1.94 doesn't have these issues is that none of the current PS2SDK updates affect that version, since it was built a while back with an older PS2SDK build.
Did FMCB v1.95* have these issues also? This way if we advise users to roll back, which of the two is the lesser of two evils. :-)
I have updated the first post to contain a link to my branch and to a prototype. This is not an actual fix for scache's shortcomings, but prevents the problem by reading the block before it is written (instead of just writing).
This problem is not with any specific piece of software, but USBHDFSD. FMCB bundles the USBD and USBHDFSD modules, but they are not actually embedded within FMCB. These modules are stored as:
mc:/SYS-CONF/USBD.IRX
mc:/SYS-CONF/USBHDFSD.IRX
hdd0:/sysconf/FMCB/USBD.IRX
hdd0:/sysconf/FMCB/USBHDFSD.IRX
They can be freely replaced. Only LaunchELF has a copy of these modules embedded, but you can get it to use external versions.
If this bug does actually exist, then there is no bug-free version of USBHDFSD because this bug was already there since day 1. It will affect any software that uses USBHDFSD.
It is possible that the glitch is getting more obvious because the cache is invalidated more often (due to the need to writeback & invalidate the cache before long writes). I realize that the scache_allocSector() function will return a buffer if the block was accessed before. That would probably mitigate the bug because the rest of the block would not be invalidated. But if the block is not loaded in the cache, then scache actually has no idea whether the other sectors in the block are in use or not...
Are we sure this issue is not related to the recent large-block read/write support? Or a combination of reading/writing large blocks and the scache?
I've been trying all sorts of things to get my usb stick corrupted but everything seems fine. I think in order to fix a problem we first must be able to reproduce it.
@Jay-Jay-OPL have you been able to reproduce this issue?
There are two other FAT32 drivers in ps2sdk now that could possibly have the same issue: IEEE1394_disk and bdmfs_vfat. I don't think IEEE1394_disk is used anywhere, so should it be removed? For bdmfs_vfat I'm working on removing the scache, and will then introduce a new block device cache in bdm.
I believe it was always like that. Just that there are now more chances for the glitch to occur. It was an uncommon occurrence for me, but sometimes my disk would get corrupted. This has been happening for as far back as I can remember. By the time the damage was done, the disk had been processed by so many computers and pieces of software that it became difficult to lay blame on anything specific.
Despite the fixes made to it, I had no actual explanation for the occasional damage to the filesystem. I had blamed the lack of Unicode support, but the problems also do occur when there are only files with ASCII filenames.
I also had this really bad feeling about the rename function. For some reason, it seemed to occasionally not work properly, but it appears to be logically sound and does pass basic tests. If renaming can cause corruption, then it would have nothing to do with the recent changes.
One user in the FMCB thread on psx-scene tried a combination of adding new files, renaming and deleting, and the disk was corrupted. But because of that, it is still difficult to tell which exact action caused the problem.
I realized that if I once made the mistake of assuming the scache blocks can be allocated and it did not work, then on what basis can the same thing be done here?
As you know, it is really difficult to replicate this glitch. I created this thread, since I got a new lead, but I still have no way to replicate the problem. So if it at least seems to be better, then I guess it is a step in the right direction.
We can remove the unneeded module anytime, if desired.
I have made this a part of the SDK, as well as part of FMCB v1.963 and LaunchELF. So that it becomes easier to know if it did help in any way. And if it does, then it would solve at least part of the problem. Even if it does not actually work in the way I think it does, the change should be harmless.
I hope to keep this thread open, so that we can continue discussing the existence of the glitch here. Or in the best case, we reach a conclusion soon.
I think it's still failing. See my most recent report here: Reply #9
FYI: the OPL ELF that I am referring to there is the one you provided for us to test for the ARP Table: https://github.com/ifcaro/Open-PS2-Loader/pull/101#issuecomment-402277656
In case it had something to do with the OHCI controller bug, I shortened the transfer length for writing to 4096 and noticed that the problem was getting worse. So I inspected the scache_flushSectors() function and found a missing line, which means that any cache writeback is not complete. So hopefully, this other issue will be fixed as of commit 9772287.
Thank you for bringing this up. I shall be pushing updates for FMCB, LaunchELF and PS2Ident soon.
I did a bunch of copying and renaming last night and did not encounter any errors. But good to see another bug squashed in the cache writeback fix.
Today I got some corruption, so I think it's not totally solved.
Using files from FMCB Update 2018/07/04: v0.983
@sp193, this update works good with FTP'ing from PC to Mass, I can't seem to make it fail anymore. :)
But I give a full report here in this thread about another situation that happened before updating that may be another issue that we can look into: Reply #11
It is great, that it seems to have died down. Even with 1-2 cases, it may not be the same problem anymore.
@gingerbeardman: What exactly did you do? There are currently many possible reasons for things to go wrong.
What I fixed was a bug that caused corruption nearly every time data was written, regardless of whether it was a new file or not. The earlier patch (mentioned at the start of this thread) was for a potential case of corruption, which could happen while writing new files or renaming.
What tool did you use to check the disk? I noted that using fsck from Linux may give a warning about the free space summary being wrong, and that is likely true (USBHDFSD does not update such a thing). But I don't think it is an actual cause for concern because it is just a summary and the Microsoft chkdsk tool does not deem that as a problem (no mention of such a summary either).
@Jay-Jay-OPL: LaunchELF will not write anything, unless you tell it to. This is a very simple piece of software, in terms of functionality for the user anyway. You wrote before that you have two flash disks. Which one did this happen to? If you have been using only them, can you check if it ever happens to the other as well? Perhaps you can just start using it on a regular basis, just to see if it ever gives a different result.
If it really happened when you did not write anything, I think we cannot (yet) eliminate the possibility that there is a hardware incompatibility or some hardware problem (if the disk is old).
I noticed that some corruptions are more likely to re-appear once the FS/Stick got corrupted (even if it got repaired via e.g. Chkdsk).
I think it is worth trying it with a 'fresh' and newly formatted drive!
I will check with a new drive, and record exact steps I am doing.
But generally they are:
I use
The problem here is that you used OPL. Unless you somehow got a new build that has the fixed driver, then it might corrupt the disk when you create the VMC files.
EDIT: Played PS games with OPL? You mean POPStarter, right? Anyway, it means the same thing: unless you use the new driver, the filesystem might have been corrupted there. Really old drivers also had trouble creating a new directory without errors.
These builds with the buggy driver even corrupt the disc, if USB is just started and some of the OPL-Specific folders are missing (thus it tries to create those folders and during that process corrupts some files)! ;)
I had that happen to me recently, but fortunately it doesn't happen in the newest builds.
By 'playing PS1 game with OPL' he probably means, that he has POPStarter set up the 'ps2home.com'-way... So he essentially starts a PS1-Game from the OPL-Daily Builds GUI, which starts POPStarter, which then starts the chosen game. ;)
> The problem here is that you used OPL. Unless you somehow got a new build that has the fixed driver, then it might corrupt the disk when you create the VMC files. ...
@sp193 ah! Yes. POPStarter through OPL.
@TnA-Plastic exactly, thanks
@Jay-Jay-OPL are you aware of this?
@sp193 I retired the old flash drive that I've used for years (PNY 32GB) -- meaning I don't use it anymore for PS2 Homebrew -- but that old sucker is still working fine loading other things on my other audio/video devices.
I then purchased two SanDisks (64GB / 32GB), I've been using the 64GB version for all the recent tests.
I plan to format it (SanDisk 64GB), to see if I can replicate that odd issue where just browsing folders within uLE could cause those folders/files to get corrupted. If not, I guess we can say we are out of the woods for now.
Still testing the FTP issue, and no problems, it is working fine. So at least that seems to be fixed (for now)...
I don't want to spam this thread but I have to say it... 'for now' indeed is the appropriate wording/description!
@Jay-Jay-OPL what I meant is are you aware that the USB drivers inside OPL need updating? Would you do that or would you wait for it to be done upstream?
@sp193 will OPL get the fixed drivers soon?
You should ask the people who still maintain the OPL project. They just need to compile with the latest PS2SDK revision.
Once this thread hits 2 weeks with no new findings, I shall close this thread.
In that case I will do some more testing.
@sp193, I haven't had much time testing more if USB HDD corruption has ceased to exist, since it's Summer and I usually have house guests -- one of the few disadvantages of living near paradise (i.e. a hot vacation spot).
But I also wanted to test a bug that was reported a long time ago. You see, for the longest time I stayed back with FMCB 1.94 -- but I recall that users who started to use FMCB 1.95 would report a bizarre issue: if they tried to copy a file that had no extension and was 0 KB in size back and forth, it would cause either USB HDD corruption or uLE would refuse to transfer the file.
Do you recall such a bug being mentioned in the past? That is something I would also like to test, but like I said, it is kind of difficult during summertime for me. Hopefully you or someone else can test that? If not, I will have to get back to you when I give that a good solid test.
Perhaps I was too eager to see this solved. We can keep this thread open longer, no problem!
From what I can see, the fs_read() and fs_write() functions prevent reading or writing of a 0-length file. But for now, please try not to do anything weird like that. Even if you do successfully trigger the glitch, you will contaminate the results for the tests that we have been talking about within this thread. None of the fixes that have been brought up would have addressed such an issue.
Would that prevention be a possible cause, if the GUI/backend of wLE doesn't realize that it is not allowed?
I think it's just another issue, perhaps somewhere else. Unrelated to this thread. It may not even be within the SDK (i.e. if LaunchELF fails due to the 0-byte parameter). For starters, we have neither established what the symptoms are like nor what the conditions for triggering the bug are. If it can be replicated, please open a new issue ticket, under the relevant project.
So I have recently been chasing an issue which turned out to be a corrupt save game due to POPS.
And I realised it's POPS that creates the save folders and VMC files, not OPL.
So I think it's enough to put the IRX next to POPS as detailed in its wiki? Or maybe copying USBD.IRX and USBHDFSD.IRX to mc:/SYS-CONF?
Or does this mean that (discontinued) POPS needs to be modified to include the fixed USB drivers?
@sp193, okay, I finally had time to test this some more... (no more house guests, yay!)
Please go here and read my latest update in Reply #13
@gingerbeardman Yes, its USB modules have to be updated as well. Thankfully, it has a function to use externally-supplied USB modules. You can refer to this post: http://www.psx-place.com/threads/popstarter-external-hdd-error.19326/#post-131008
@Jay-Jay-OPL Thanks for taking the time to check it out. I don't really know why you can destroy folders just by accessing them. It's just impossible, through normal means. What do you usually do before accessing the folder? Is there any chance that you last experienced an incomplete write, before resetting the PS2 and accessing this affected folder?
But what do you mean by "the file that was transferred was not 100% complete"? Was the copied file incomplete/different in any way (and in what way?), or did you mean that LaunchELF gets stuck while copying the file? If the file was incomplete/different, I would appreciate it if you could describe the corruption. Like at what offset it occurs at and how long the bad regions are, and whether the pattern repeats across the file.
There was one more mistake that I found, related to the workaround for unaligned writes. However, that is likely not applicable to LaunchELF. The buffers used by LaunchELF are not aligned well enough to avoid software correction altogether, but they are at least aligned to 16-byte addresses.
As for what we can try now:
Today's build (control): https://www.sendspace.com/file/05j4ef
With write transfers limited to 4KB (the original limitation of older drivers): https://www.sendspace.com/file/wbcp8g
If the second file (with the 4KB limit) somehow magically solves the problem with files occasionally being incompletely written, then perhaps the hardware is just incompatible with at least some devices.
Interestingly, I've moved from a SanDisk Cruzer Fit USB 2.0 to SanDisk Ultra Flair USB 3.0 (mainly for faster writes from my computer) and have not seen any corruptions (though I also pre-generated all my game folders and blank VMC using a batch file)
I will also try the POPS USB drivers.
I've prepared some control files, filled with 0xFF, in KB sizes 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. Feel free to use them to check file copy/writing functions.
FF.zip (now with MD5 checksums)
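For anyone who would rather regenerate such control files locally than download them, here is a minimal sketch. It only assumes what the post states (0xFF fill, the listed KB sizes, MD5 checksums); the file naming scheme `FF_<size>KB.bin` and the function name are made up for illustration and are not necessarily what FF.zip contains.

```python
import hashlib
from pathlib import Path

# Sizes in KB, as listed in the post above.
SIZES_KB = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def make_control_files(out_dir):
    """Write one 0xFF-filled file per size and return {filename: MD5 hex digest}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for kb in SIZES_KB:
        data = b"\xff" * (kb * 1024)
        name = f"FF_{kb}KB.bin"  # hypothetical naming scheme
        (out / name).write_bytes(data)
        checksums[name] = hashlib.md5(data).hexdigest()
    return checksums
```

Since every byte is 0xFF, any byte in a copied file that is not 0xFF is corruption, which makes holes easy to spot in a hex editor.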
When I tried to copy (cut?) these control files on a USB flash drive, to another folder from their own folder, their own folder became corrupted after only the first two files had been copied. And certain other files on the disk were overwritten with FFs. So could this problem be a fragmentation issue?
I could recover the contents using chkdsk /F.
Are you using the latest version, especially after the bug in cache-writeback was found (4th July)? It has to be at least that version.
The logic for reading and writing seems quite similar. How strange. The holes seem to be of irregular lengths too (not a multiple of 512), although they begin at some aligned address.
I would blame the cache, but the holes seem to be very strange in length.
Can you try the 2 files I provided above? In case either the new glitch that was fixed (with regards to unaligned accesses) was relevant or if there is some relationship with how we increased the maximum transfer length.
I'll try the other builds.
I'm using 4th July version.
I deleted my previous message about the holes in a moment of sleep-deprived madness. If you have the ability or the contents, please reinstate it. Apologies!
Just a thought, there is already the MD5 hash check code in ps2sdk: https://github.com/ps2dev/ps2sdk/blob/master/common/tcpip/lwip-2.0.0/src/netif/ppp/polarssl/md5.c
So is it possible to have a build that checks MD5 after copy? Or ability to calc MD5 on demand through a wLE context menu item? Would be very useful because I would not have to roundtrip to my computer to check validity of files. Thanks
> But what do you mean by "the file that was transferred was not 100% complete"? Was the copied file incomplete/different in any way (and in what way?), or did you mean that LaunchELF gets stuck while copying the file?

What I mean is that when I transfer the file from FTP to MASS or copy the file from MC to MASS, the ELF will not work, and when I then check the MD5 checksums of the original ELF and the copied ELF, their hashes don't match.
I get no warnings from my FTP client (Filezilla or FlashFXP) that the transfer was unsuccessful. And to the naked eye they both seem to have the same file size.

> If the file was incomplete/different, I would appreciate it if you could describe the corruption. Like at what offset it occurs at and how long the bad regions are, and whether the pattern repeats across the file.
Here is a ZIP of the original elf and the copy elf. https://www.sendspace.com/file/cum5pc
You will see that the two hashes don't match:
Original ELF: C04B6661CF0B4408DB8AA45FD60656EA OPNPS2LD.ELF
COPIED ELF: 1C106EEBCA55A94B017AABB3DF0D1064 OPNPS2LD.ELF
I haven't yet tested the two builds you provided above, since I was busy reporting another issue with OPL.
So I wanted to share that during that issue, I decided to only work with uLE v4.42d (2013-03-24) when I was transferring the ELF from FTP to MASS.
Since the current wLE v4.43a (2018-07-04 / commit: 59a4962) was a pain in the ass -- because it would constantly fail to transfer the ELF file 100%. Like I said before, it's a constant hit and miss.
And I just want to let you know that with uLE v4.42, I didn't have a single issue with transferring the ELF file via FTP a few dozen times. It was 100% good!
NOTE: I am still using FMCB v1.963. -- I have not tried testing FMCB v1.964 -- since I think those changes done to that version were for mostly old PHAT consoles, right?
Hopefully this helps. I will try to make wLE v4.43a try to fail some more, to give you more examples for you to study. Perhaps there is a pattern?
Could this whole issue be down to the recent changes and fixes you've made to wLE, since I am using one of the latest FMCB versions with it? I figure the issue could be just wLE?
You must understand that the part that deals with the USB device is the USB Mass Storage device driver, USBHDFSD. Not the app you run. But since each PS2 app has control over the PS2, each one will likely have their own modules. And so the only way to change USBHDFSD is to change the app.
Hence, the version of FMCB you use does not actually matter, other than the fact that other software may use its modules (but there is no formal arrangement for this) and that FMCB bundles some version of LaunchELF (which you can also replace) that carries its own USB modules. This is why I have been handing out versions of LaunchELF for tests, but not FMCB.
The old modules that come with old versions of LaunchELF are not faultless either, but you likely just never encountered their problems. For example, the mistake I mentioned at the beginning of this thread was there since day 1 and I have probably seen it corrupt my disks before (although it was quite uncommon). They're also incapable of creating directories properly and have less error-handling for USB device stalls (hence why our PS2s used to hang more often at boot). These are some reasons why we must move on.
Now we're trying to lift the transfer length limitation of USBHDFSD, which was there since the dawn of time. This is the part that seems problematic. It seems to read properly (you cannot boot ELFs or read JPEGs properly without being able to read properly) and the code for writing is similar, so the logic for writing should be fine. You also seem to always get corruption (even when it's supposed to be impossible), so I cannot rule out a hardware compatibility problem.
For v1.964, I also updated the USB modules and LaunchELF. Since a mistake was found, I decided to ship out the patch as soon as possible. There were no real changes between v1.963 and v1.964, other than external files (USB modules, HDDLOAD and LaunchELF) and the version number (hence the minor increment).
From the file you provided, it looks like a single 4KB region of corruption (equivalent to 1 cache block) that is on a boundary that is nicely aligned for the cache system (+0x108000). Within the region is an ELF header. It might be related to the cache, but I haven't seen anything unusual though.
On the other hand, it's also perfectly 4KB of wrong data, which is strange.
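The offsets and lengths of bad regions asked about earlier can be collected with a small script rather than by eye. This is a rough sketch and not part of any PS2 tool: the function names are invented here, and the 4096-byte cache block and 512-byte sector are just the values mentioned in this thread.

```python
def diff_regions(original: bytes, copy: bytes):
    """Return a list of (offset, length) runs where the two buffers differ."""
    regions = []
    start = None
    common = min(len(original), len(copy))
    for i in range(common):
        if original[i] != copy[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            regions.append((start, i - start))
            start = None
    if start is not None:
        regions.append((start, common - start))
    # A size mismatch is reported as one extra trailing region.
    if len(original) != len(copy):
        regions.append((common, abs(len(original) - len(copy))))
    return regions


def describe(regions, block_size=4096, sector_size=512):
    """Format each region with its alignment relative to the cache block and sector size."""
    lines = []
    for offset, length in regions:
        lines.append(
            f"+0x{offset:X}: {length} bytes"
            f" (start block-aligned: {offset % block_size == 0},"
            f" whole sectors: {length % sector_size == 0})"
        )
    return lines
```

To use it, read both files with `open(path, "rb").read()` and pass them to `diff_regions`. Note that a corrupted region which happens to contain a few bytes identical to the original will be reported as several shorter runs, so the lengths should be read with that in mind.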
Anyway, thanks guys, for sitting through this with me. It's because of things like this, that I don't want to continue working on PS2 stuff.
EDIT: By the way, what is your cluster size? It's not 4KB, is it?
When I formatted the flash drive, I allowed the FAT32 app to auto select the cluster size.
So it was: 32768 bytes (32K)
I guess it selected that cluster size, since the flash drive is under 64GB in size.
mine is also 32kb.
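Why the cluster size matters can be shown with a little arithmetic. With 32KB clusters and the 4096-byte cache blocks of USBHDFSD, a cluster boundary falls inside a cache block only if the first cluster does not start on a 4KB boundary of the disk. The sketch below is just that arithmetic; the function name and the `data_start` values are hypothetical and were not read from anyone's disk.

```python
BLOCK = 4096  # cache block size in bytes, aligned with the start of the disk


def shared_blocks(data_start, cluster_size, n_clusters):
    """Return indices of cache blocks that hold bytes of two adjacent clusters.

    data_start is the byte offset on the disk where the first cluster begins
    (a made-up input here; on a real disk it follows from the partition and
    FAT layout).
    """
    shared = []
    for c in range(1, n_clusters):
        boundary = data_start + c * cluster_size
        if boundary % BLOCK != 0:
            shared.append(boundary // BLOCK)
    return shared
```

If the first cluster starts only 1KB-aligned (for instance `data_start=1024`), every cluster boundary lands inside a cache block. If it starts 4KB-aligned, none does with 32KB clusters.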
I know it is not fun to fix bugs created by other programmers, so thanks for all your work looking into this.
I am back to testing this.
My method is to use HashCheck to generate a checksum file containing MD5 of all items on the USB drive. Then do some file manipulation and then open the checksum file in HashCheck which will then show me any modified files. I can then see any files modified outside of my control, and then I will inspect the disk to see if they were adjacent in storage.
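For those without HashCheck, the same before/after comparison can be approximated with a short script. This is only a sketch of the method described above, not a replacement for HashCheck, and it does not read or write HashCheck's checksum file format; the function names are made up.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map every file under root (by relative path) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(before, after):
    """Return (modified, missing, added) lists of relative paths."""
    modified = sorted(k for k in before if k in after and before[k] != after[k])
    missing = sorted(k for k in before if k not in after)
    added = sorted(k for k in after if k not in before)
    return modified, missing, added
```

Take one `snapshot` of the drive on the PC before plugging it into the PS2 and another afterwards. Any file in `modified` or `missing` that was not deliberately touched is a candidate for collateral damage.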
OK, I have reformatted my USB drive and loaded on a fresh set of files, calculated checksums, and I am now ready to cause some destruction! Hahaha.
If anybody uses this special version of LaunchELF with PLANETTY, it should become possible to tell what is being written and how the device responds. However, this doesn't mean we'll surely find something, unfortunately.
I have not tested this. But I know the PC clients work.
To anyone who participates in this experiment, thank you.
@gingerbeardman Thank you for trying to help though. I didn't have an actual way to gather some information, but now there may be some way.
Instructions:
planetty 192.168.0.182 > result.log
LaunchELF will use your IP configuration in mc:/SYS-CONF/IPCONFIG.DAT. If you have issues with getting planetty to capture log messages from the PlayStation 2, you may need to disable your firewall (e.g. Windows Firewall).
| File | Description |
|---|---|
| BOOT-USBTEST.ELF | Test program for the PlayStation 2 console |
| planetty | Terminal client for Linux |
| planetty.exe | Terminal client for Windows |
| src (in other archive) | Source code for the tools |
Binaries: https://www.sendspace.com/file/jgxnhv
Source code: https://www.sendspace.com/file/1uqywo
This program, planetty.exe doesn't work with my pc (Windows 10 - 64-bit)
Can you elaborate on the problem? This does work on Windows 7 x64. If you are missing a DLL, you probably have to install the Microsoft Visual C++ 2010 Redistributable package (x86).
I'll try this soon
This is what happens: https://www.youtube.com/watch?v=7aZkl12Bvsk I tried the compatibility options too, but the result is always the same.
You need to run it from the command prompt. If you need some help with using the command prompt:
Replace the 192.168.0.182 with your PlayStation 2's IP address. You can also do it elsewhere, other than C:\test. But I chose C:\test for simplicity.
It keeps giving me problems… The program must start even when I'm not connected to the Ps2 right??
P.S. I first tried as you suggested, writing "planetty", then I tried "planetty.exe", but that isn't the problem.
P.P.S. While testing I noticed that the results.log file was created in the test folder. I moved it and redid all the tests to see which one made the log file, but I haven't been successful since... And it gave me the error message every time anyway. Can you post a screenshot of your correct command prompt?
Did you extract planetty.exe to C:\test? I have no sample output for you because I have no PS2s. You can see what was output for another project (when output is not redirected) from this post: http://www.psx-place.com/threads/open-ps2-loader-v0-9-3.13415/page-7#post-141096
@sp193 you have no PS2!? Amazing.
I'll test this today.
USBHDFSD has a sector cache, which covers the whole disk in units of 4096-byte blocks. This is aligned with the start of the disk and has no relationship with any filesystem on the disk.
As such, if the code allocates a block before writing without reading the old content, it is perhaps possible for the remainder of the block to become lost.
For example:
| CLUSTER 1 | CLUSTER 2|...
..| BLK1 | BLK2 | BLK3 |....
Where CLUSTER 1 and CLUSTER 2 are logical clusters in a filesystem, while BLK1, BLK2 and BLK3 are the blocks within the cache. The clusters are larger (for example, 32KB each) than the blocks (4KB each). The blocks are not aligned with the clusters (perhaps because there is 1KB alignment).
The software decides to allocate cluster 2, whose corresponding cache block contains some sectors that belong to cluster 1. As the block is allocated and the old data is totally discarded, some of the sectors that belong to cluster 1 are lost.
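The scenario can be illustrated with a toy model. This is not the scache code from USBHDFSD: it is a deliberately simplified one-block write-back cache written for illustration. It assumes that a freshly allocated block is zero-filled and that the whole 4096-byte block is written back on flush, and every name in it is invented.

```python
SECTOR = 512
BLOCK = 4096
SECTORS_PER_BLOCK = BLOCK // SECTOR


def write_sector(disk, sector, data, read_first):
    """Write one sector through a one-block write-back cache, then flush it.

    read_first=False models allocating a cache block without loading its old
    contents; read_first=True models reading the block before modifying it.
    """
    assert len(data) == SECTOR
    block_start = (sector // SECTORS_PER_BLOCK) * BLOCK
    if read_first:
        buf = bytearray(disk[block_start:block_start + BLOCK])
    else:
        buf = bytearray(BLOCK)  # old contents unknown; modelled as zeros
    offset = (sector % SECTORS_PER_BLOCK) * SECTOR
    buf[offset:offset + SECTOR] = data
    disk[block_start:block_start + BLOCK] = buf  # flush the whole block


# One cache block: sectors 0-1 are the tail of cluster 1, sectors 2-7 are the
# start of the newly allocated cluster 2.
disk = bytearray(b"A" * (2 * SECTOR) + bytes(6 * SECTOR))
write_sector(disk, 2, b"B" * SECTOR, read_first=False)
tail_lost = disk[:2 * SECTOR] != b"A" * (2 * SECTOR)
```

In this model `tail_lost` ends up true: the two sectors that belonged to cluster 1 are overwritten with zeros. Repeating the write with `read_first=True` on a fresh disk keeps them intact, which is the idea behind reading the block before it is written.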
The affected lines within fat_write.c concern direntry manipulation, so the user may observe corruption of FAT when:
Line 1274: https://github.com/ps2dev/ps2sdk/blob/master/iop/usb/usbhdfsd/src/fat_write.c#L1274
Line 1308: https://github.com/ps2dev/ps2sdk/blob/master/iop/usb/usbhdfsd/src/fat_write.c#L1308
Personally, I have made a similar mistake before in 2014, which occasionally resulted in corruption when data is written: https://github.com/ps2dev/ps2sdk/commit/a9494c2dc8efa9430b17f05f9891b56194229c53#diff-ba7f9373100b558565bd2d05edac0ee8 So I do think it is a plausible design flaw within the FAT driver.
I shall make a branch and offer a prototype, for more individuals to test with.
EDIT: Custom USBHDFSD module that does not use scache_allocSector in fat_write.c: https://www.sendspace.com/file/blz5os Branch: https://github.com/sp193/ps2sdk/tree/usbhdfsd-scache-noalloc
One can test whether it is an improvement by copying usbhdfsd-noscachealloc.irx as USBHDFSD.IRX and getting LaunchELF to use it.