raehik / procfw

Automatically exported from code.google.com/p/procfw

Support LZ4 compression in Virtual ISO mounting #225

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
LZO[1] is a fast compression library that requires no additional memory for 
decompression, and its decompression speed does not change when higher 
compression settings are used.

I think using LZO should allow fast caching, as sectors could probably be 
loaded into memory and remain compressed until needed. It would probably also 
reduce the power usage that CSO requires.

[1]http://www.oberhumer.com/opensource/lzo/

Original issue reported on code.google.com by hastur...@gmail.com on 22 Aug 2011 at 10:45

GoogleCodeExporter commented 9 years ago
Sounds interesting. Would be cool if someone could look into it.
For me, I only use compression (usually lvl 1 thres 01) because I want to get 
rid of the (nulled) update files and padding (around 100MB on new games). A 
whole lot of new games don't even work with any kind of compression, they are 
horribly lagging then.

Original comment by catnip...@gmail.com on 22 Aug 2011 at 5:54

GoogleCodeExporter commented 9 years ago
I already implemented this on my fork, but I don't see significant changes in 
the reading speed. Perhaps you could provide a benchmark to see how much it 
changes when compared to cso, so we could get this merged into the official 
source.

The source code is in my clone repository [1], and a simple tool to convert iso 
to zso (compressed iso using lzo) is in the contrib directory. You can filter 
the commits by author to find my specific patches to isoread and vsh.

[1] http://code.google.com/r/codestation404-procfw/source/list

Original comment by codestat...@gmail.com on 25 Aug 2011 at 8:26

GoogleCodeExporter commented 9 years ago
Here is a patch that applies cleanly over the current procfw repo HEAD and 
enables lzo support. It is easier than reading through my clone repo for the 
changes.

Original comment by codestat...@gmail.com on 25 Aug 2011 at 8:57

Attachments:

GoogleCodeExporter commented 9 years ago
As far as I can tell, QuickLZ isn't the same as LZO, seems to be beaten by 
about a second at decompression speed on this test (as represented by LZOP): 
http://www.maximumcompression.com/data/summary_mf4.php

Original comment by hastur...@gmail.com on 27 Aug 2011 at 9:09

GoogleCodeExporter commented 9 years ago
It's the same (see QuickLZ with mode 3, it has 4.7 secs too). I also ran the 
tests with minilzo and it was slower in decompression than QuickLZ by 1-2 secs. 
Anyway, looking at my patches, it is trivial to replace the quicklz calls with 
the minilzo ones. I will do it if you are interested.

IMHO, the other big problem with iso compression is the compressed block size 
of 2KiB. Does anybody know the block size of the NPUMDIMG files? Or are they 
using a dynamic block size?

Original comment by codestat...@gmail.com on 27 Aug 2011 at 2:39

GoogleCodeExporter commented 9 years ago
Honestly, this will be a great addition to the project. I know this is _super_ 
old, but I've actually _stopped_ using compressed ISOs because it _increases_ 
the load times for games! By at least 1-3 seconds for me. So having a faster 
compressor would be amazing. Since it's still open, I thought I'd add that the 
current compressor (gzip or is it zlib... either way) is by all means "good", 
but as far as playing games goes, they load _faster_ with no compression at all 
than even with the lowest level of compression.

Anyway, this would be nice to see. Also, there's a new compression algorithm 
out there called lz4 which is _even_ faster than lzo, and that might be an 
interesting thing to include as a possibility.

Original comment by 133794...@gmail.com on 21 Mar 2013 at 6:08

GoogleCodeExporter commented 9 years ago
I can't edit my previous comment, but lzo is much better than it used to be. 
It's ~2-3x faster than it was back when this bug was originally opened, so it 
should be a much better compression library to implement. I'd be grateful to 
anyone who did it, since I'd love to have compressed isos that don't take 
longer to load.

Original comment by 133794...@gmail.com on 21 Mar 2013 at 6:22

GoogleCodeExporter commented 9 years ago
If I recall correctly, CSO uses some hardware acceleration, and that is why it 
is the only format implemented as of now.

Because of the great progress LZO has made, I think it's a good idea to 
implement it.

Original comment by devnonam...@gmail.com on 25 Aug 2013 at 11:37

GoogleCodeExporter commented 9 years ago
Ah OK, well that makes sense then. I don't believe that the lz4 guy is doing 
any sort of assembly for mips specifically, and I don't imagine that changing, 
as it's just straight up C. He's also made the "HC" encoder better, and using 
that for the CSO would be nice in ciso.py, but then it won't be able to be just 
a python file anymore. He provides a CLI for people to use, and I imagine that 
could be put up there for people to use, or someone could just compile it for 
those systems.

Btw, LZ4 HC uses the _same_ decoder as stock LZ4, so it's gotten way better. 
Also, if I could have, I would have edited this issue to say "lz4", as I found 
out about it way too late.

LZ4 has _already_ been used in a psp game with great effect/performance 
(according to a minis developer), so that's the one I'd shoot for, as LZO 
hasn't been updated in a long while. I do believe I meant to say lz4, but I had 
said lzo at the top there.

But since someone's already gotten the support in there, LZO would be much 
better than what we currently have, which is crusty old deflate.

lz4 is also clean C, so I'd like to see that even more, if you'd be willing to 
put it in there.

Btw this is the _most_ current LZO vs LZ4 (semi-most current; a couple of 
months old, but not likely much difference). Compress and Decompress are both 
in MB/s.

                Ratio   Compress  Decompress
LZ4 (r101)      2.084   422       1820
LZO 2.06        2.106   414       600

https://code.google.com/p/lz4/

that's a link to his google code page.

and here is LZ4 HC vs the default compression level for zlib (speeds in MB/s).

                Ratio   Compress  Decompress
LZ4 HC (r101)   2.720   25        2080
zlib 1.2.8 -6   3.099   21        300

Way way way faster at decompression which is where this counts as the CISO 
driver _won't_ be compressing data, it'll be _decompressing it_
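
To put the decompression figures quoted in this comment in perspective, here is a quick sketch of the ratios. The numbers are copied from the tables; the dictionary and function names are only illustrative.

```python
# Decompression throughput in MB/s, as quoted in the tables in this comment.
decompress_mb_s = {
    "LZ4 (r101)": 1820,
    "LZO 2.06": 600,
    "LZ4 HC (r101)": 2080,
    "zlib 1.2.8 -6": 300,
}


def speedup(fast, slow):
    """Return how many times faster `fast` decompresses than `slow`."""
    return decompress_mb_s[fast] / decompress_mb_s[slow]


lz4_vs_lzo = speedup("LZ4 (r101)", "LZO 2.06")
lz4hc_vs_zlib = speedup("LZ4 HC (r101)", "zlib 1.2.8 -6")

print(f"LZ4 vs LZO:     {lz4_vs_lzo:.1f}x")
print(f"LZ4 HC vs zlib: {lz4hc_vs_zlib:.1f}x")
```

On these desktop figures LZ4 decompresses about 3x faster than LZO, and LZ4 HC about 7x faster than zlib. Whether the PSP's CPU sees the same ratios is not something these numbers can show.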

Original comment by 133794...@gmail.com on 25 Aug 2013 at 11:47

GoogleCodeExporter commented 9 years ago
I'm still wondering whether LZ4 or LZ4 HC should be implemented...
LZ4 HC has a much better ratio and slightly better decompression speed, while 
it is 15x slower to compress.
I think LZ4 HC is still better overall.

Original comment by devnonam...@gmail.com on 26 Aug 2013 at 12:07

GoogleCodeExporter commented 9 years ago
I think that the overhead with the cso format is in the block size. 2K is too 
small for some games, and the psp loses too much time reading the compressed 
blocks and adding them to the internal list. I think a good approach is to 
modify procfw so it doesn't ignore the cso block size, so bigger ones can be 
used.

Also, I think it is a good idea to add an option to generate a log file with 
all the reads and their sizes (profiling the reads of the game). This file 
could optionally be fed to the compressor so it can choose the optimal block 
size for different parts of the game while compressing the iso. This could 
translate into fewer reads and bigger decompressed blocks in the iso cache.

I cannot find any documentation for lz4. Does it support overlapped 
decompression? Lzo supports it, so one doesn't need to allocate an extra buffer 
to hold the decompressed output (very important in kernel mode since memory is 
so limited).

Btw, compression time should be a non-issue since it is only done once and 
outside of the psp.

Original comment by codestat...@gmail.com on 26 Aug 2013 at 12:38

GoogleCodeExporter commented 9 years ago
lz4 _is_ memoryless in decompression. As in, it needs _no_ extra memory besides 
the block it is currently decompressing. I'm 99.9% sure that that's how it's 
set up; you can ask the main developer himself. I was talking with him about 
using it in another project on a mips cpu with 32mb of ram _total_, and I was 
worried about the memory required for decompression. He told me that 
decompression requires _no_ memory at all.

Also lz4 and lz4 HC use the _same_ decompressor. That means you have _one_ 
decompressor they use the _same_ algorithm. It's more like zlib level 1 vs zlib 
level 6 it's the same exact algorithm but with a different amount of CPU time 
spent on it and also the dictionary(I'm 99.9% sure).

So thus it just decompresses on the fly, it says "here's the data" and that's 
it. it decompresses as it goes in the default one, you can definitely ask him 
if you want to be sure I'm not 100% on this as I've not dug into it too much 
yet.

I was also talking to him about the memory used b/c the mips cpu doesn't have 
an l2 or an l3 and thus memory has to be stolen from the main system, and his 
block sizes should allow you to hold them _entirely_ within the l1 dcache. 

Now then, let me just say that I _do not_ know the exact amount of 
icache/dcache on l1 for the specific mips cpu in the psp. But I do know that he 
got into an argument with someone who was on that project about the memory/cpu 
time used. If you'd like, I can link to that discussion, as it's also about 
something where every single byte counts.

I was talking with that developer about using it for compressed ROM files, 
since with zlib you'd have to decompress the whole thing into memory, and 
that's it. And paging out the memory is hard and weird. But he made it clear 
that it's perfectly fine.

Finally, the developer I was talking about was 1000 tiny claws. The algorithm 
is under the BSD license, and the developer specifically said how glad they 
were that the thing was memory-less. I'm 99.9% sure that they were making 
_sure_ that it ran on psp-1k models, which means that 24MB is it.

Also lz4hc is _way_ better at this kind of thing as it's what he designed it 
for. Well to be honest, he designed LZ4HC for packet compression and is still 
working on inter-block compression to make it compress better as it holds no 
"state" at all. Each block is its own thing. It throws away the dictionary etc 
after compressing each block. He is working on making lz4hc during 
_compression_ allow that inter-blockness and keep the dictionary there for 
streaming, and it'll be "not on by default" for the foreseeable future.

To summarize it all up, the decompressor holds no state at all, you get 
whatever data comes out of it. If you specify "I want block 1 through 3". It 
gives you block 1-3. It just hands you back them to however you were wanting to 
use them as each block throws away everything after the compression. Also I 
don't know what kind of block size you'd want to do, it does up to 20MB but I'm 
sure you guys would go with a smaller blocksize. They seem to allow 64k-4MB. I 
guess that 20 was made up in my head. either way I'm sure this'd _greatly_ help 
the performance of that system completely.

Once again, I'd definitely talk with the author about any of the inner workings 
of the algorithm, b/c he knows _way_ more about it than me. I'll drop his 
twitter handle below so you can message him there, or you can open an issue on 
his google code page, as he's very responsive about issues.

https://twitter.com/Cyan4973

That's his twitter handle. The developer that I was talking about was porting 
GPsp to another mips architecture and had improved it, and had to worry about 
the 64MB total (probably lower than that due to the kernel memory used). Either 
way, he said after digging into the code itself that he had no qualms about his 
early claims about suboptimal performance on mips due to the lack of l2/l3.

Original comment by 133794...@gmail.com on 26 Aug 2013 at 12:56

GoogleCodeExporter commented 9 years ago
About implementing it in ciso.py, this is possible as there is a bridge for it 
here : https://pypi.python.org/pypi/lz4
However, it still requires us to compile the C files

Original comment by devnonam...@gmail.com on 26 Aug 2013 at 11:28

GoogleCodeExporter commented 9 years ago
Ah OK that looks to be really good then, I thought there was a python binding 
but I wasn't completely sure. Also I don't know how complex homebrew is on the 
PSP but since I don't know how many people use ciso.py(I don't know how many 
have python installed to use the cli), you guys could look at making a homebrew 
program on the psp so that people can compress/decompress isos/ciso(some other 
name for the lz4 based ones) so that it can be more widely used.

Also, since you're going to be changing the block size/sector size used by 
ciso, I'd suggest maybe storing values at the top of the file showing where the 
original blocks/sectors are stored.

It'd be something like the following. I don't know how tight memory is in 
kernel space, so whilst I think 4-byte values would be way better, given that 
the original ISO sector size is _only_ 2k, I can understand if you can't use 
that size of value.

Instead of having to store 2 values eg 
compressed_block_number,original_blocks_held, it'd just be compressed_blocks 
held. You'd know what the compressed block is by looking at _where_ it is in 
the array. index 0 is the first, index 1 is the next etc etc.

So for the first one it'd be lba 0-32/whatever, and the next 33/whatever+1 etc 
etc. So when a game says "hey I need lba x to y" you can more easily seek it 
out. Now if you _already_ have such a system in place then you shouldn't need 
this system there I am unsure to how you've got it setup currently in there.
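
A minimal sketch of that position-indexed table, assuming each entry stores only how many ISO sectors a compressed block holds. The function names and the example counts are made up for illustration; nothing here is taken from procfw.

```python
import bisect
from itertools import accumulate


def build_first_lbas(sectors_held):
    """sectors_held[i] is the number of ISO sectors stored in compressed block i.

    Returns a list whose entry i is the first LBA of block i, plus one final
    entry holding the total sector count.
    """
    return [0] + list(accumulate(sectors_held))


def block_for_lba(first_lbas, lba):
    """Return the compressed block number that contains the given LBA."""
    if not 0 <= lba < first_lbas[-1]:
        raise IndexError(f"LBA {lba} is outside the image")
    # The array position is the block number, so no block id is stored.
    return bisect.bisect_right(first_lbas, lba) - 1


# Hypothetical image: two blocks of 32 sectors followed by one of 16.
sectors_held = [32, 32, 16]
first_lbas = build_first_lbas(sectors_held)
print(block_for_lba(first_lbas, 33))
```

With equally sized blocks the lookup collapses to a plain integer division, so a table like this only pays off if block sizes are allowed to vary.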

Also here's an ISO that I just got done redumping using lz4hc vs gzip(not using 
the other zip archiver as I'm unsure if it uses zlib or not)

Final Fantasy 7 Crisis Core 1.7GB uncompressed(1716713472)
lz4hc 3.508s compressed to 1.2GB(1229536276)
gzip -6  58.053s compressed to 1.1GB(1095743996)

lz4 decompressed in 1.653s
gzip decompressed in 12.370s

So as you can see, lz4hc is _slightly_ worse in compression ratio but is way 
way faster at both compression and decompression. This was _all_ done off of a 
ramdisk but anyway yeah. I just wanted to put that there, I don't know why 
really. But anyway I think that list of blocks used would be a good "hack" to 
make loading even better so you don't have to seek for stuff as lz4 doesn't do 
_any_ kind of structures at all. It just gives you compressed data with a hash 
appended to each block(4byte not crc) that's it. No other structure data at all 
as far as I'm aware so it's like raw_deflate.
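
The Crisis Core figures above can be turned into ratios with a few lines. The byte counts and timings are copied from this comment; the variable names are illustrative.

```python
# Sizes in bytes and times in seconds, as reported for Crisis Core above.
original_size = 1716713472
results = {
    "lz4hc": {"size": 1229536276, "compress_s": 3.508, "decompress_s": 1.653},
    "gzip -6": {"size": 1095743996, "compress_s": 58.053, "decompress_s": 12.370},
}

for name, r in results.items():
    fraction = r["size"] / original_size
    print(f"{name}: {fraction:.1%} of the original size")

lz4, gz = results["lz4hc"], results["gzip -6"]
compress_speedup = gz["compress_s"] / lz4["compress_s"]
decompress_speedup = gz["decompress_s"] / lz4["decompress_s"]
extra_bytes = lz4["size"] - gz["size"]

print(f"compression:   {compress_speedup:.1f}x faster with lz4hc")
print(f"decompression: {decompress_speedup:.1f}x faster with lz4hc")
print(f"lz4hc output is {extra_bytes / 2**20:.0f} MiB larger than gzip's")
```

So the trade here is roughly 8 percentage points of compression ratio for a decompressor that is several times faster.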

Original comment by 133794...@gmail.com on 26 Aug 2013 at 2:42

GoogleCodeExporter commented 9 years ago
Also, if you guys can, I'd keep a good half of the "iso cache" as 
compressed data after you've read it off of the memory stick, since lz4hc 
decompresses so bloody fast. I can imagine that it could even be faster than 
reading raw data off of the MS(maybe even with the MS speedup hack) and thus 
you could store more data in memory though I don't know how much good that'd be 
to be honest, I do know that it works well for the databases out there, and the 
kernel etc where they just store the compressed page in memory and evict the 
uncompressed one when memory gets tight.

Original comment by 133794...@gmail.com on 26 Aug 2013 at 2:49

GoogleCodeExporter commented 9 years ago
OK I've looked at the source code for lz4 and talked to the guy. Basically all 
you need is a buffer to store the output for that block. So all you need is the 
following. Memory for the compressed block and memory for the uncompressed 
block. Lz4 also doesn't use the same dictionary across blocks of data each one 
is unique. So you don't have to keep that dictionary across any blocks.

It does allow you to use the same exact dictionary too if you want to increase 
compression ratio but I imagine that you'll have to keep that memory there and 
if you're going to do such a thing seeking through it would be way way harder. 
How much memory do you guys have in the kernel atm by the way?

Original comment by 133794...@gmail.com on 22 Nov 2013 at 12:04

GoogleCodeExporter commented 9 years ago
OK, I've sat down and looked at lz4 and here's the latest numbers. For the 
compression of a complete ISO this is on the _maximum_ mode the only thing I've 
clawed back is the block size(to reduce memory used).

Here's how it ended up.

The raw ISO is 428.4MB
lz4 "high compression" with block size of 64KB, it ended up at 416MB.
CISO with it's current one is ~421MB.

Next up, the total memory used for lz4 in the block size of 4(lowest possible).

The peak memory used when decompressing it _all_ was ~688KB during 
decompression. I don't _yet_ have lzo to test itself.

When using lzo, the peak memory usage on the best compression level (a la lz4 
hc) is ~888KB in total. A whole 200KB _more_ than lz4. So your "it doesn't use 
any more memory" claim doesn't seem to hold up for me.

P.S. If you're talking about reusing the same dictionary from block to block 
then that _does_ require _more_ memory. It almost doubles the total _peak_ ram 
that is used. And also by the way this is the _entire_ file so if you're only 
decompressing let's say 128KB then you're obviously not going to use that much 
memory. The file size difference between lzo and lz4 is ~200KB on the highest 
level. When you do interblock dependency (which I'm sure would make seeking 
across the CSO a lot harder, since right now you can just start using the 
thing). It becomes ~1MB smaller. So either way I hope that this shows you that 

Also about the interblock compression thing, it essentially means that it keeps 
the compression dictionary across blocks instead of throwing it away after each 
of the blocks. So that's all that I wanted to say about it. If you increase the 
overall block size of the files/iso driver to 64k that'd obviously increase 
compression capability a very very large amount. I'm no python programmer so I 
can't tell you how much better it'd get, but I hope this shows how well suited 
lz4 is to memory-constrained situations.

Original comment by 133794...@gmail.com on 23 Nov 2013 at 5:37

GoogleCodeExporter commented 9 years ago
I'm doing tests on lz4 on a mips platform, but this is from the source himself.

 Yann Collet ‏@Cyan4973 5 Dec

@133794m3r LZ4 doesn't use any temp buffer. It's straight from the source 
buffer to the output one.

So it _doesn't_ require extra memory during decompression even in the 
inter-block dependency mode for lz4hc. On a mips platform I have it's ~2-10x as 
fast as gzip(level 6 which is similar to zlib level 6)

Original comment by 133794...@gmail.com on 6 Dec 2013 at 10:57

GoogleCodeExporter commented 9 years ago
That's with interblock compression as in reusing the dictionary across blocks. 
One thing to warn you if you're doing that in zlib or anything else, you're 
going to lose the ability to seek to a random block within the data. So long as 
you do the standard mode(lzo/lz4) with a decent block size you'll be OK. Also 
lz4 tells you the decompressed size of each block. I don't remember the exact 
function off hand but I know the data is there(to help you figure out how much 
memory to allocate).

If you're doing lz4hc for the blocks, you can read them in, and then 
immediately put them into the output buffer. There's no temp buffer. So at 
maximum it'd take ~130KB(peak memory usage) for the smallest block size(what I 
suggest you do). So that's the peak total memory that you'd need. It'd be 
compressed block+uncompressed block. If you up the block size to ~64k(which is 
going to make it a ton better by the way) then you can just get the sectors 
from the game themselves and just do it as you can/want to.
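
The "compressed block + uncompressed block" estimate can be checked with a little arithmetic. This sketch assumes the worst-case bound that LZ4's `LZ4_compressBound` documents (input size + input size / 255 + 16) and assumes that the "smallest block size" above means 64 KiB; the helper names are illustrative only.

```python
def lz4_compress_bound(input_size):
    """Worst-case compressed size of one block, per LZ4's documented bound."""
    return input_size + input_size // 255 + 16


def peak_buffers(block_size):
    """Bytes needed to hold one compressed block plus its decompressed output."""
    return lz4_compress_bound(block_size) + block_size


for block_size in (2 * 1024, 4 * 1024, 64 * 1024):
    total = peak_buffers(block_size)
    print(f"{block_size // 1024:>2} KiB blocks -> {total} bytes ({total / 1024:.1f} KiB)")
```

For 64 KiB blocks this comes to 131345 bytes, which is in line with the ~130KB quoted above. With the 2 KiB sectors CSO uses today, the two buffers together need only about 4 KiB.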

Original comment by 133794...@gmail.com on 6 Dec 2013 at 11:06

GoogleCodeExporter commented 9 years ago
For more proof how low memory lz4 can be. Here is an example of the 
decompressor running on an Apple IIgs.

http://www.brutaldeluxe.fr/products/crossdevtools/lz4/index.html

Original comment by 133794...@gmail.com on 10 Dec 2013 at 11:55

GoogleCodeExporter commented 9 years ago
I started working on it, it seems like we are going to have some trouble 
getting the sector size but hopefully it'll work.

Original comment by devnonam...@gmail.com on 14 Dec 2013 at 12:16

GoogleCodeExporter commented 9 years ago

Original comment by devnonam...@gmail.com on 14 Dec 2013 at 12:17

GoogleCodeExporter commented 9 years ago
About the sector size, you may just have to end up doing just 4KB or something 
similar. Since that's not _too_ much larger and is more akin to what most 
devices use and I know that it does well with it. If you're working on it. I'd 
do the HC mode for the lz4 compressor as that's much much more akin to zlib 
level compression but also decompresses at the same speed as stock lz4(very 
very close).

http://133794m3r.github.io/

That link above is where I did some tests on another mips based device(more 
recent ISA), and also did tests on the smallest possible block size in terms of 
compression time. It was with the 64KB block size.

So yeah I see why the sector size thing would cause issues, and you'll also 
likely have to somehow store a set list of which LBAs are stored within the 
compressed blocks so that you have to seek less throughout the thing. The 
overall compression seems to be about the same as zlib with 4KB sectors (or so 
say the kernel guys).

I eagerly await the updates on it. Thanks for doing the work.

Original comment by 133794...@gmail.com on 14 Dec 2013 at 2:16

GoogleCodeExporter commented 9 years ago
Actually I will only implement the lz4 decompressor because it is also able
to decompress lz4hc stuff (and as you can see, lz4hc doesn't even feature a
decompressor).

About the compressor, maybe we will have something like ciso.py which will
include the compressor (probably not in Python, because that would require
me to port the compressor, and I think this is a bad idea: since it gets
updated, it would be hard to maintain).

The code will probably need some refactoring later because the current ISO
drivers are made to only support CSOs (in the code structure).

Original comment by devnonam...@gmail.com on 14 Dec 2013 at 11:25

GoogleCodeExporter commented 9 years ago
Ah OK, I forgot about that. And about the tools to do it. I'm going to look 
into trying to write up some code to do the CISO tool to do lz4. You may want 
to change the CSO header to include something different to make sure that lower 
versions of procfw/others won't be trying to open up the file needlessly.
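
A rough sketch of how a different header magic would keep older readers away from the new files. It assumes the commonly described 24-byte CISO header layout (magic, header size, total bytes, block size, version, index alignment, two reserved bytes), and `b"ZISO"` is only an illustrative choice of magic for LZ4 images, not something decided in this thread.

```python
import struct

# Assumed layout: 4s magic, u32 header_size, u64 total_bytes, u32 block_size,
# u8 version, u8 align, 2 reserved bytes (little endian, 24 bytes in total).
CISO_HEADER = struct.Struct("<4sIQIBB2x")

# b"ZISO" is a hypothetical magic used here to mark LZ4-compressed images.
KNOWN_MAGICS = {b"CISO": "deflate", b"ZISO": "lz4"}


def parse_header(raw):
    """Parse the header and report which decompressor the image needs."""
    magic, header_size, total_bytes, block_size, version, align = (
        CISO_HEADER.unpack_from(raw)
    )
    if magic not in KNOWN_MAGICS:
        raise ValueError(f"unknown image magic {magic!r}")
    return {
        "codec": KNOWN_MAGICS[magic],
        "total_bytes": total_bytes,
        "block_size": block_size,
        "align": align,
    }


def old_reader_accepts(raw):
    """Models an older driver that only knows the deflate-based format."""
    return raw[:4] == b"CISO"


lz4_image_header = CISO_HEADER.pack(b"ZISO", 24, 1716713472, 2048, 1, 0)
print(parse_header(lz4_image_header)["codec"])
print(old_reader_accepts(lz4_image_header))
```

An old reader that checks the magic rejects the file up front instead of feeding LZ4 data to its deflate decompressor.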

Also, depending on the block size, if it's more than, let's say, 4KB, you may 
want to include a list of values at the front: pairs of two 32bit numbers.

It'd be something like the following. The first number is the last LBA held in 
the compressed block (so the first block covers LBA 0 up to that number), and 
the second is the ending byte of that compressed block.

32:14421

and it'd keep on going to keep from trying to seek randomly throughout all of 
the CSO.

Once you figure out how you want to do it, I can whip up a command line program 
for linux/windows. I don't have any access to a mac machine and I don't know 
about cross-compiling it to that platform and making sure that it works. The 
current python program probably could be able to do it. The only thing is that 
some people who don't have python couldn't use it. So I'm going to try to write 
up a quick program that's commandline based to do the compression of the ISOs 
using the block size that you've selected. once you figure it out, update this 
bug document or whatever so that I can know what to work with.

Original comment by 133794...@gmail.com on 14 Dec 2013 at 10:16

GoogleCodeExporter commented 9 years ago
Took the bullet and made an LZ4 implementation of the cso format and added 
support for it in PRO. I also modified the ciso.py tool to be able to 
compress/decompress zso images (LZ4 compressed isos).

To enable LZ4 compression pass -z to the ciso.py command-line and include a 
compression value between 1-8. If you pass a 9 then LZ4 HC will be used instead 
for the compression (slower to compress but gets a slightly better ratio).

Very simple LZ4 decompression patches are added on vshctrl and the 
galaxy/inferno drivers. Also vshctrl is patched so the .zso files can be 
recognized at the XMB.

In my tests I didn't notice much difference compared to LZO compression, so I 
started investigating why the cso format was so slow to read and ended up 
making a big optimization for it (only in the inferno driver). I'll try to 
explain it below:

The current cso decompression method tries to read and decompress every gzip 
compressed block one by one. For example, an 80KiB read needs 40 sceIoRead 
calls of 2KiB for the blocks alone (plus any cso index reads that it needs). 
This excessive access to the memory stick slows down the whole cso read, making 
the gzip decompression time unimportant by comparison.

In my method I reduce the total I/O to a max of 4 reads: one for the index and 
a max of three for the compressed blocks, which I read in one go. If the block 
and requested size are aligned to the ISO sector, then it only needs a total of 
two reads to the memory stick.
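
The difference between the two approaches can be sketched as follows. This is a simplified model, not the inferno driver code: it assumes the index is a plain list of byte offsets with one extra entry marking the end of the last block, and it ignores the flag bits and alignment shift that real cso index entries carry. All names are illustrative.

```python
SECTOR = 2048  # ISO sector size, which is also the usual cso block size


def blocks_touched(offset, size, block_size=SECTOR):
    """Return (first, last) block numbers covering [offset, offset + size)."""
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return first, last


def per_block_reads(offset, size, block_size=SECTOR):
    """Number of data reads when every block is fetched separately."""
    first, last = blocks_touched(offset, size, block_size)
    return last - first + 1


def coalesced_span(index, offset, size, block_size=SECTOR):
    """Return (file_offset, length) of one read covering every needed block.

    index[i] is the file offset of compressed block i, and index[i + 1] is
    where that block ends.
    """
    first, last = blocks_touched(offset, size, block_size)
    start = index[first]
    end = index[last + 1]
    return start, end - start


# Hypothetical index: 50 compressed blocks of 1000 bytes each, starting at 100.
fake_index = [100 + 1000 * i for i in range(51)]

print(per_block_reads(0, 80 * 1024))
print(coalesced_span(fake_index, 0, 80 * 1024))
```

For the 80KiB example the per-block method issues 40 data reads, while the coalesced method needs one read for the index entries and one for the whole compressed span. The blocks are then decompressed from memory.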

In my tests I managed to play GTA Liberty City Stories from a CSO compressed at 
level 9 (cpu at 333 and ms access speedup enabled) with virtually no lag 
whatsoever. I also tested 2 other compressed games with similar results.

I leave this patch so it can be reviewed for possible implementation bugs and 
hopefully can be merged on procfw.

Original comment by codestat...@gmail.com on 16 Apr 2014 at 12:10

Attachments:

GoogleCodeExporter commented 9 years ago
Patch updated. It now maintains a partial cache of the index table, so the 
index doesn't need to be loaded on every read.
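
One way such a partial index cache could work, as a sketch only: keep a window of consecutive index entries in memory and reload it only when a request falls outside the window. The class name, the window size, and the `load` callback are assumptions for illustration and are not taken from the patch.

```python
class IndexCache:
    """Keeps a window of consecutive index entries in memory."""

    def __init__(self, load, window=256):
        self.load = load        # callable(first_entry, count) -> list of ints
        self.window = window    # minimum number of entries fetched per reload
        self.first = 0          # entry number of self.entries[0]
        self.entries = []
        self.loads = 0          # how many times the stored index was read

    def get(self, first, count):
        """Return index entries [first, first + count), reloading on a miss."""
        cached_end = self.first + len(self.entries)
        if not (self.first <= first and first + count <= cached_end):
            self.entries = self.load(first, max(count, self.window))
            self.first = first
            self.loads += 1
        lo = first - self.first
        return self.entries[lo:lo + count]


# Hypothetical on-disk index and a loader that slices it.
disk_index = list(range(0, 10000, 10))
cache = IndexCache(lambda first, count: disk_index[first:first + count], window=64)

cache.get(0, 41)     # miss: loads entries 0..63
cache.get(10, 5)     # hit: served from memory
cache.get(100, 3)    # miss: the window moves to start at entry 100
print(cache.loads)
```

Sequential reads, such as a video streaming from the disc image, mostly stay inside the window, which would be consistent with the improvement reported for the prologue video.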

Got performance improvements on GoW: GoS (prologue video) compared to the first 
patch.

Original comment by codestat...@gmail.com on 17 Apr 2014 at 7:16

GoogleCodeExporter commented 9 years ago
Sorry, wrong patch attached.

Original comment by codestat...@gmail.com on 17 Apr 2014 at 10:46

Attachments:

GoogleCodeExporter commented 9 years ago
New patch. This solves the problem with Jetpack Joyride (it tries to read 
beyond the iso). The updated code now handles the corner cases of reading the 
last block, reading with an invalid offset (returns 0) and a size going beyond 
the iso size (the size gets readjusted).
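
The corner cases listed above amount to clamping the request before any block is touched. A minimal sketch, with a hypothetical function name and without any of the driver's actual I/O:

```python
def clamp_read(offset, size, iso_size):
    """Return how many bytes a read may deliver, or 0 for an invalid request."""
    if offset < 0 or offset >= iso_size or size <= 0:
        return 0                          # invalid offset: nothing is read
    return min(size, iso_size - offset)   # trim reads that run past the end


iso_size = 10 * 2048  # hypothetical 10-sector image

print(clamp_read(0, 4096, iso_size))             # ordinary read
print(clamp_read(9 * 2048, 4096, iso_size))      # last block, size readjusted
print(clamp_read(iso_size + 2048, 2048, iso_size))  # offset beyond the image
```

Doing this check first means the block-range calculation never indexes past the final index entry.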

Original comment by codestat...@gmail.com on 14 May 2014 at 3:33

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry for all of the not posting. I'm currently getting ready to finally start 
testing this thing with various games. I was busy and dealing with horrible 
news. I'm not going to repeat it here, but it kinda took over my life for a 
very long time. Anyway, thanks for the updates; I'm going to build procfw and 
test it on my psp with various games.

Do you know of any that are particularly difficult to run? For example, I have 
quite a few of the psn releases that were pkgs which I made into isos. Would 
they be likely to expose random bugs in the new driver?

Original comment by 133794...@gmail.com on 13 Jun 2014 at 9:03

GoogleCodeExporter commented 9 years ago
Hello, thank you for your help.
Before merging this patch, we want to make sure that it doesn't affect the 
compatibility negatively. That's why any game is worth testing.

If possible, please test it when compressed in GZIP and then LZ4, so we can 
make sure that both work properly.

Original comment by devnonam...@gmail.com on 13 Jun 2014 at 9:09

GoogleCodeExporter commented 9 years ago
Ah OK, I didn't know if there were still any issues, as you said earlier that 
Jetpack Joyride was _not_ working for some reason, and I thought that was to do 
with lz4 in ciso.py.

I'm going to be trying out games that I know have been pretty stable for me in 
the past and work completely AOK. There is one game that seems to _never_ work 
for me under procfw-c2, which is the dungeon siege game. I get far into the 
game and then it bugs out and stops working.

I don't know if this is the game itself, as I've not seen anyone else report 
this issue on any other site, but if you want I can provide the save file for 
it.

Original comment by 133794...@gmail.com on 13 Jun 2014 at 9:27

GoogleCodeExporter commented 9 years ago
Made a testcase [1] to check that the algorithm is correct and doesn't have 
memory leaks. Compile it with:

gcc csotest.c -o csotest -lz -llz4 -lssl -lcrypto -fstack-protector

Run it with:
./csotest some.cso load.txt

Also run it under valgrind, to make sure that there are no memory leaks, buffer 
overflows or invalid reads/writes:
valgrind ./csotest some.cso load.txt

I made a test with the Pool Hall Pro ISO/CSO/ZSO (reading the offsets/sizes in 
the load file 500 times):

ISO (just to have a reference point):
time spent (old method): 1.186666
time spent (new method): 1.174465

CSO:
time spent (old method): 7.853323
time spent (new method): 6.964370

ZSO:
time spent (old method): 2.070728
time spent (new method): 1.377143

So on my PC there is ±0.01 seconds of error margin (according to the ISO read).
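
For reference, the timings above work out as follows. The figures are copied from this comment (assumed to be seconds); the variable names are illustrative.

```python
# (old method, new method) times for 500 passes over load.txt.
timings = {
    "ISO": (1.186666, 1.174465),
    "CSO": (7.853323, 6.964370),
    "ZSO": (2.070728, 1.377143),
}

for fmt, (old, new) in timings.items():
    print(f"{fmt}: {old / new:.2f}x faster, {old - new:.3f}s saved")

# Cost of each format relative to reading the plain ISO with the new method.
iso_new = timings["ISO"][1]
overhead = {fmt: new / iso_new for fmt, (_, new) in timings.items()}
for fmt, factor in overhead.items():
    print(f"{fmt}: {factor:.2f}x the plain ISO time")
```

On this PC the new method is about 1.1x faster for CSO and about 1.5x faster for ZSO, and a ZSO then costs under 1.2x the time of the plain ISO while a CSO still costs almost 6x. The ISO difference of about 0.012s is the noise floor mentioned above. A PC is not limited by memory stick I/O, so the gain on a real PSP may look quite different.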

[1] https://gist.github.com/codestation/bf1cc67ddf7c490c9626

Original comment by codestat...@gmail.com on 15 Jun 2014 at 2:09

GoogleCodeExporter commented 9 years ago
Right, here are the tests on my own computer. I did star ocean first 
departure, knights in the nightmare, and puyo pop fever. Each one is of a 
different size and, I believe, from a different time in development. I'm 
posting a pastebin link since I don't want to fill this thing with a ton of 
extra stuff.

The results look very similar. I'll only say below whether valgrind found 
anything, instead of the whole "found nothing" bit. Here's the link to the 
tests without valgrind's data included, as valgrind just makes it a ton slower 
and thus I found no reason to actually post those ones.

http://pastebin.com/raw.php?i=yBMgtTjd

OK, there are zero memory leaks. I tried three games, with a total of 9 tests 
both normally and under valgrind, and it seems to be AOK. As stated, all of it 
was done on a ramdisk, so it should be fine.

Original comment by 133794...@gmail.com on 16 Jun 2014 at 1:41

GoogleCodeExporter commented 9 years ago
OK, you can maybe call me crazy but I am not seeing the zso files on my memory 
stick. They're simply not listed. I don't know if this is due to a bug in the 
vsh menu or what, but I don't see them there at all and I know that they're 
there, as I pasted them from my tests folder to see how they would play on my 
psp and the files aren't listed in vsh. I don't know if this is normal, or 
what? And anyway yeah have you tried it out yourself with zsos?

By the way is there some sort of logging program so that I can figure out why 
vsh isn't showing my zsos? I did the patch as was required as far as I know, as 
it has references to zsos/lz4 in all of the right places put out via the patch.

Original comment by 133794...@gmail.com on 17 Jun 2014 at 5:06

GoogleCodeExporter commented 9 years ago
Very weird. Are you 100% sure that the installer is writing the vshctrl.prx 
file? If you were already using Pro, did you force the reinstall of the cfw 
(hold L while installing)? Vshctrl is patched to recognize .zso extensions and 
has lz4 routines added so it can load the previews on the xmb.

For logging, you could add some sceIoWrite statements using fd = 1 (stdout), 
and capture the output with psplink. That is better than enabling DEBUG and 
slowing down the whole cfw.

Original comment by codestat...@gmail.com on 17 Jun 2014 at 5:25

GoogleCodeExporter commented 9 years ago
Well, I'm on a psp 3k model, so I seriously doubt that it'd be doing much of 
anything with any such file unless it's in ram. I'll try running proupdate just 
to see if it does anything, since I always figured that on a 3k you just run 
fast recovery and you're good to go, as it showed the csos and also the isos, 
whereas the psp ofw would just show "broken game" for the isos.

Original comment by 133794...@gmail.com on 17 Jun 2014 at 7:40

GoogleCodeExporter commented 9 years ago
It worked. I guess I'm just stupid then; I didn't know that the psp 3k could 
do anything like installing a cfw. Either that, or it had to replace the file 
in ram or something. Either way, it's now working 100%, apparently.

Original comment by 133794...@gmail.com on 17 Jun 2014 at 7:44

GoogleCodeExporter commented 9 years ago
Good, this new patch seems pretty robust.
We'll do some further testing, and then I'll merge it if we don't find any 
issue.

Original comment by devnonam...@gmail.com on 17 Jun 2014 at 7:49

GoogleCodeExporter commented 9 years ago
I'm testing the same games on my psp as we speak to see if anything seems 
different. The one thing I notice with your changes is that the memory stick's 
activity light seems to be flashing less. I haven't tried dungeon siege yet, as 
I don't know what that game is doing whilst loading: even as an iso on the 
memory stick it takes almost a minute to do its loading. I'm checking the other 
ones to see that they all play OK.

P.S. Do the values in the pastebin for the various games seem to be in line 
with what you saw yourself in load.txt? Should I try a larger game, since it 
seems the offsets are in the gigabytes? The largest game I know of is Final 
Fantasy Type-0, which clocks in at 2.5GB.

Original comment by 133794...@gmail.com on 17 Jun 2014 at 8:00

GoogleCodeExporter commented 9 years ago
Edit to say, I tried fftype0 with the 500 reads test and the results were 
similar to the other ones. I have also tried a few games and I'm currently 
playing one without any real issues.

I can't find anything about it anywhere on the wiki, and I'd rather not have to 
learn the source code: what is the cache's number? Is that blocks or reads, or 
what exactly? I'd like to use the iso cache as well as possible. Finally, does 
the iso cache hold the raw blocks still compressed, or are you storing them 
uncompressed?

Original comment by 133794...@gmail.com on 18 Jun 2014 at 3:25

GoogleCodeExporter commented 9 years ago
OK, I found an issue with it when I resume from sleep with a zso. I don't 
remember if this happened with csos or not, as I have never played this exact 
game before, but I have seen it happen with untold legends, which I have played 
as an iso. The game responds to some input, but when it tries to do the first 
loading screen I see the activity light as if it's reading, then it just 
freezes and I have to return to the home menu and restart the game. I don't 
know what sort of debugging I should do to see whether the program has stopped 
responding. I may end up trying it with a plain old iso to see if it's the new 
loader or not.

P.S. IsoCache is at 23MB, LRU, 512. cpu is 333/166. Vsh is 100/50, and ms 
speedup is always. Driver is inferno.

Original comment by 133794...@gmail.com on 18 Jun 2014 at 6:08

GoogleCodeExporter commented 9 years ago
OK, it's just that game itself. I tried it with a plain iso and loaded up the 
save file, and apparently it borked its own save file somehow, as I had to go 
back into any one of the previous levels to get it going, but that seems to be 
it. I've also been testing other games and cannot find any real issues with 
them. As for the access light, it seems to be on less than it used to be. I'm 
going to continue testing various other games in cso format as well to see if 
they show anything odd, but it seems to be just that one game. I've played ~11 
games thus far in zso format for ~30-45min without any real issues.

Original comment by 133794...@gmail.com on 18 Jun 2014 at 11:06

GoogleCodeExporter commented 9 years ago
Wake from sleep is causing the games to freeze. They won't respond to input, 
but the psp home button still works. I've disabled all plugins except 
"noumd.prx" and it's still happening. I've even tried reducing the caches, 
thinking there may be some weird bug in that. I believe this only happens with 
lz4 compressed files, and when it does, the only way to fix it is a cold 
reboot. I'm going to retry with plain isos and do some more testing. I don't 
know what part of the code is responsible for wake from sleep; knowing that 
might help me.

Original comment by 133794...@gmail.com on 27 Jun 2014 at 6:58

GoogleCodeExporter commented 9 years ago
Sorry for the delay, I have been very busy with my job. I reproduced the bug 
and tracked it down to a sceIoRead where I wasn't checking the return value. It 
seems that if one attempts to read just after the psp returns from sleep, the 
I/O functions return SCE_KERNEL_ERROR_DRIVER_DELETED. I changed the code that 
reads the cso index to use read_raw_data instead of a plain sceIoRead, since 
that function makes a handful of read retries before giving up.

This wasn't a problem with the algorithm, so that remains unchanged. I 
attached a new patch with the changed read method. #46, can you retry your 
tests with the new patch?

Original comment by codestat...@gmail.com on 8 Jul 2014 at 2:39

GoogleCodeExporter commented 9 years ago
I was on the irc trying to get help with debugging it, but everyone just kinda 
said 'oh it's probably a plugin that's causing it' instead of helping. So I 
wasn't able to do more debugging, since I had no idea where to start putting 
the print statements or where to look.

So I'm also sorry for not being able to help you debug it more; I hit a dead 
end and didn't want to fill up this thread with random comments.

I'll download the patch and start testing again. As I said, dungeon siege had 
a weird error and it might've been the same thing.

If you know what code I need to look at to put in the printfs to find the 
errors, that'd be great in case I find another bug.

Original comment by 133794...@gmail.com on 8 Jul 2014 at 2:52

GoogleCodeExporter commented 9 years ago
Personally, I put a print statement in the iso_read function (just before the 
read_cso_data_ng call) so I can get the last offset/size that the driver tried 
to read before the crash (that's how I get my load.txt files to test it on the 
pc using the testcase). Just declare a char buf[256] somewhere and use this:

sprintf(buf, "offset: %i, size: %i\n", args->offset, args->size);
sceIoWrite(1, buf, strlen(buf)); /* write only the formatted text, not all 256 bytes */

"1" is stdout so it gets printed to psplink (make sure to disable nodrm, it 
interferes with the stdout write, dunno why).

To find this bug I added that print statement after every read_raw_data and 
tagged them differently so I knew in what region it was getting stuck. Also, I 
disassembled inferno.prx and checked where the EPC was located (it crashed 
inside the LZ4 decompress func, so I knew that it was a bad read buffer).
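
A rough sketch of what such tagged trace lines could look like follows. `format_trace` and the tag names are hypothetical, not part of the patch; the helper returns the formatted length so that only that many bytes get written.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one tagged trace line into `buf` and return the number of bytes
 * to write. `tag` is a hypothetical label naming the call site. */
int format_trace(char *buf, size_t cap, const char *tag,
                 unsigned int offset, unsigned int size)
{
    if (cap == 0)
        return 0;                   /* no room for anything */

    int len = snprintf(buf, cap, "[%s] offset: %u, size: %u\n",
                       tag, offset, size);
    if (len < 0)
        return 0;                   /* formatting failed: write nothing */
    if ((size_t)len >= cap)
        len = (int)cap - 1;         /* output was truncated to fit */
    return len;
}

/* On the PSP the result would be passed to sceIoWrite, for example:
 *   char buf[256];
 *   sceIoWrite(1, buf, format_trace(buf, sizeof(buf), "index", offset, size));
 */
```

Using a different tag at each call site ("index", "block", and so on) makes the last line in the psplink output point at the read that never returned. If snprintf is not available in that context, the return value of sprintf can be used the same way, as long as the buffer is large enough.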

Original comment by codestat...@gmail.com on 8 Jul 2014 at 3:05

GoogleCodeExporter commented 9 years ago
That sounds a bit crazy, but I'll keep it in mind if I can track a bug down to 
something like that again. This may also have been what was causing dungeon 
siege to fail to read when doing multiple loads.

I don't know if you know of a game that does more reads on the memory stick or 
does more during loading, but that one seems to be it for me. Even other games 
by the same company don't take as long or keep the memory stick's read light on 
as much. I'll recompile and test again.

Also, it looks like the psp is using some sort of unix as its base os, as I 
know stdout is 1 and stderr is 2. I'll try it in the future, but when it gets 
to disassembling inferno.prx and then debugging it, I'd probably get lost.

Finally, as far as the psp's compiler options go, I've been using the 
following, as I do with all other programs that I know cannot be easily debugged.

-fweb -fgcse-las -fgcse-sm -fgcse-after-reload -fpeel-loops

I know that the first one makes debugging impossible, but since I've been using 
it on the gcwzero and on the psp, where you can't actually run gdb on the whole 
thing, I still use it anyway. As far as perf goes, it seems to make things 
~3-5% faster. Also, I've been looking at gcc and I'm hoping that they finally 
change -O3 to be more akin to -Os, where it makes the code try to fit within 
the caches. Also, is there any reason why the psp sdk's gcc has to be such an 
old version?

Finally, for real: I'll be compiling it in a second and then redoing the tests 
on my psp, along with the csotest on a ramdisk, repeatedly doing loads on it.

Original comment by 133794...@gmail.com on 8 Jul 2014 at 6:07