file.seek not working

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Try to convert my python file
2. Execute it like $ decoder test test2 (where test is the other attached file)

What is the expected output? What do you see instead?
EOFError: not enough items in file

With other test files I see also:
GC Warning: Repeated allocation of very large block (appr. size 33558528):
    May lead to memory leak and poor performance.

What version of the product are you using? On what operating system?
Debian Wheezy (testing) on x86 (32 bit), Python 2.7.2, shedskin from git 
(commit 28a6367c52b40f5810beb82b88d09895c664fc30).

Please provide any additional information below.
It seems that array.fromfile behaves differently.

Original issue reported on code.google.com by frap...@gmail.com on 10 Mar 2012 at 1:41

GoogleCodeExporter commented 9 years ago

thanks ;-) hmm, it looks like for some reason archive.seek(4) doesn't work.. 
trying to reproduce it in a smaller program.

Original comment by mark.duf...@gmail.com on 10 Mar 2012 at 10:58

Changed title: file.seek not working
Changed state: Accepted
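
A minimal sketch of what such a smaller program could look like, assuming the decoder skips a 4-byte header with seek(4) before calling array.fromfile. The file layout and the helper name read_after_header are hypothetical and not taken from the attached script. Under CPython the read starts after the header; if the seek call were dropped, the read would start at offset 0 and return header bytes instead.

```python
import os
import tempfile
from array import array


def read_after_header(path, header_size):
    # Skip the header with seek(), then read the rest of the file as bytes.
    size = os.path.getsize(path)
    data = array('B')
    with open(path, 'rb') as archive:
        archive.seek(header_size)
        data.fromfile(archive, size - header_size)
    return data


# Build a small test file: a 4-byte header followed by 6 payload bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'HEAD' + b'abcdef')

payload = read_after_header(path, 4)
os.remove(path)
print(list(payload))
```

Comparing the printed list between CPython and the compiled binary should show whether the seek took effect.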

GoogleCodeExporter commented 9 years ago

actually, shedskin appears to skip generating code for this call.. this is 
going to be embarrassing.

Original comment by mark.duf...@gmail.com on 10 Mar 2012 at 11:02

GoogleCodeExporter commented 9 years ago

okay, fixed in git. thanks again! ;-) 

leaving this issue open because I want to make sure there are no similar 
problems..

Original comment by mark.duf...@gmail.com on 10 Mar 2012 at 11:55

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Thank you for your work and for your friendliness :) Now it works.

Original comment by frap...@gmail.com on 10 Mar 2012 at 5:22

GoogleCodeExporter commented 9 years ago

about the GC problem: this is a nasty problem that occurs sometimes 
unfortunately with shedskin. I'm hoping that as everyone moves to 64-bit the 
problem will mostly go away (conservative garbage collection is easier on 
64-bit, because arbitrary values have less chance of "pointing" into actual 
memory). it may also be that you are using an old version of libgc, or one that 
is not configured correctly. can you see which version you have and how it was 
built..? it may also be that shedskin uses libgc incorrectly here.. or a 
combination of all the above :P if you have a test file that often triggers the 
warning, I could have a look at what happens here.

Original comment by mark.duf...@gmail.com on 10 Mar 2012 at 11:02
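
A rough back-of-the-envelope sketch of the 64-bit argument above, assuming (unrealistically) that stray words are uniformly random over the whole address space. It uses the block size from the GC warning in the original report; the function name is hypothetical. Real data is far from uniform, so treat the numbers only as an illustration of the order of magnitude.

```python
def chance_of_false_pointer(block_size, address_bits):
    # Fraction of the address space covered by one block: the chance that
    # a uniformly random word happens to "point" into that block.
    return float(block_size) / float(2 ** address_bits)


BLOCK_SIZE = 33558528  # size reported in the GC warning, in bytes

p32 = chance_of_false_pointer(BLOCK_SIZE, 32)
p64 = chance_of_false_pointer(BLOCK_SIZE, 64)

print("32-bit: %.6f" % p32)
print("64-bit: %.3e" % p64)
```

On 32-bit this comes out at a bit under 1%, so a single stray value can plausibly keep a large block alive; on 64-bit the same chance is many orders of magnitude smaller.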

GoogleCodeExporter commented 9 years ago

libgc version: 7.1 (from: http://packages.debian.org/wheezy/libgc1c2 )
You can see the makefile in the sources, on the right.
I can't get this warning anymore now, I should do more testing around.
Even though the warning no longer shows up, the shedskin version is more than 60 times 
slower (it hasn't terminated yet). I don't know whether this is related to the GC issue 
or to my Python source code.

Original comment by frap...@gmail.com on 12 Mar 2012 at 10:22

GoogleCodeExporter commented 9 years ago

okay so 7.1 is quite recent, good. but I don't think debian builds it from the 
original Makefile.. we probably need the flags that were sent to 'configure'. 
not sure where to find those. but don't waste too much time on this, the flags 
are probably fine.

do you have a version/large test file for me that is really slow..? I'd be 
happy to have a look..

Original comment by mark.duf...@gmail.com on 19 Mar 2012 at 11:31

GoogleCodeExporter commented 9 years ago

Sure, but I can't upload the whole file, you have to generate it.
1) Download enwik8 from 
http://www.cs.fit.edu/~mmahoney/compression/textdata.html
2) Decompress it
3) Compile http://code.google.com/p/lz4/
4) Use lz4demo32.exe (or lz4demo64.exe) to compress enwik8
5) Decompress it using my decompressor.py or latest version (no array, only 
strings) from https://github.com/lz4kit/lz4kit/blob/master/extra/decompressor.py

Pypy is very slow with the string version, but Python is slower if I use 
arrays. Shedskin seems very slow with both versions.

Original comment by frap...@gmail.com on 19 Mar 2012 at 11:38

GoogleCodeExporter commented 9 years ago

haha, okay, thanks a lot ;-) I will try to have a look on my free day this 
week.. a bit too many things on my plate atm.

Original comment by mark.duf...@gmail.com on 19 Mar 2012 at 11:41

GoogleCodeExporter commented 9 years ago

Eheheh :) Obviously you can use other files, but I use enwik8 because it's quite 
big and it's a text from Wikipedia, so it could be considered as a good test :)

Original comment by frap...@gmail.com on 19 Mar 2012 at 12:56

GoogleCodeExporter commented 9 years ago

ah, I see, array slicing is still extremely slow.. let me try and fix that.

Original comment by mark.duf...@gmail.com on 25 Mar 2012 at 6:20

GoogleCodeExporter commented 9 years ago

alright, optimized array slicing, so after compilation it is now about 2.5 
times faster than cpython (for enwik8). now let's see if I can optimize the 
python code for shedskin a bit..

Original comment by mark.duf...@gmail.com on 25 Mar 2012 at 7:12

GoogleCodeExporter commented 9 years ago

after avoiding repeated array allocations and slicing (and using shedskin -bw), 
it becomes about 10 times faster than the original version under cpython here:

def update(target, source, pos, n):
    for i in range(n):
        target.append(source[pos+i])

def process(stream):
    ..
    update(result, stream, cursor, k)
    ..
    pos = len(result)-offset
    while j >= offset:
        update(result, result, pos, offset)
        j -= offset
    update(result, result, pos, j)

in total two problems fixed, I'd say we can close this issue.. not much I can 
do about any GC problem for now.

Original comment by mark.duf...@gmail.com on 25 Mar 2012 at 7:33

Changed state: Fixed
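
A self-contained sketch of the update() pattern from the comment above, filling in the elided parts with a hypothetical copy_match helper so it can be run on its own. The names copy_match and length and the sample data are assumptions for illustration, not the actual decompressor code. It shows the match being appended item by item, without slicing or allocating temporary arrays.

```python
from array import array


def update(target, source, pos, n):
    # Append n items of source, starting at pos, onto target one at a time.
    for i in range(n):
        target.append(source[pos + i])


def copy_match(result, offset, length):
    # Hypothetical helper: repeat the last `offset` items of result until
    # `length` items have been appended, as in an LZ-style match copy.
    pos = len(result) - offset
    j = length
    while j >= offset:
        update(result, result, pos, offset)
        j -= offset
    update(result, result, pos, j)


result = array('B', [97, 98, 99])  # the bytes of "abc"
copy_match(result, 2, 5)           # repeat "bc" for 5 more items
print(list(result))
```

Because pos stays fixed and the copied region repeats with period offset, each pass through the loop appends one more full copy of the pattern, and the final update() call appends the remainder.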

GoogleCodeExporter commented 9 years ago

Thank you so much :)

Original comment by frap...@gmail.com on 25 Mar 2012 at 9:56

hg2051 / shedskin

file.seek not working #163