thouis / numpy-trac-migration

numpy Trac to github issues migration

Mixing regular IO with numpy.fromfile confuses file offset (Trac #2210) #5999

Open numpy-gitbot opened 12 years ago

numpy-gitbot commented 12 years ago

Original ticket http://projects.scipy.org/numpy/ticket/2210 on 2012-09-04 by trac user allen@transpireinc.com, assigned to unknown.

I have a binary file written by a C program. It is essentially a sequence of integer and single-precision floating-point values written one after another. I'm reading the file partly with Python's file read() method (usually followed by struct.unpack()) and partly with numpy.fromfile(). Generally, I extract the scalars with read()/unpack() and the arrays with fromfile(). I've discovered that fromfile() leaves the file at the wrong offset when the file itself is larger than a particular size. On my Red Hat Enterprise Linux 6.3 and Ubuntu 12.04 64-bit systems, that size is 4096 bytes. On Windows XP it appears to work regardless of the file size.
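The read pattern described above can be sketched as follows. This is not the attached example.py: the record layout (an int32 count followed by that many little-endian float32 values) and the names `read_record` and `demo` are hypothetical. What matters is that struct-based reads and np.fromfile() share one file object, so each call must pick up exactly where the previous one stopped.

```python
import struct

import numpy as np


def read_record(f):
    """Read one hypothetical record: an int32 count followed by that many float32 values."""
    (n,) = struct.unpack("<i", f.read(4))          # scalar through read()/struct.unpack()
    values = np.fromfile(f, dtype="<f4", count=n)  # array through numpy.fromfile()
    return n, values


def demo(path):
    """Read two consecutive records and report the file offset between them."""
    with open(path, "rb") as f:  # default buffering, as in the report
        first = read_record(f)
        offset_between = f.tell()  # should equal 4 + 4 * first[0]
        second = read_record(f)
    return first, offset_between, second
```

If fromfile() leaves the position off by even one byte, the second struct.unpack() decodes a shifted count and the second array is misread, which is the symptom reported in this ticket.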

I attached a short program (example.py) that writes a small binary file and then reads it back. It should produce this output:

offset0: 125
[-70. -65. -60. -55. -50. -45. -40. -35. -30. -25. -20. -15. -10.  -5.   0.
   5.  10.  15.  20.  25.  30.  35.  40.  45.  50.  55.  60.  65.  70.]
offset1: 241
[-80. -75. -70. -65. -60. -55. -50. -45. -40. -35. -30. -25. -20. -15. -10.
  -5.   0.   5.  10.  15.  20.  25.  30.  35.  40.  45.  50.  55.  60.]
offset2: 357
[-90. -85. -80. -75. -70. -65. -60. -55. -50. -45. -40. -35. -30. -25. -20.
 -15. -10.  -5.   0.   5.  10.  15.  20.  25.  30.  35.  40.  45.  50.]
offset3: 473

On linux I get:

offset0: 125
[-70. -65. -60. -55. -50. -45. -40. -35. -30. -25. -20. -15. -10.  -5.   0.
   5.  10.  15.  20.  25.  30.  35.  40.  45.  50.  55.  60.  65.  70.]
offset1: 242
[  1.78734834e-38   1.78698961e-38   1.78663088e-38   1.78627215e-38
   1.78562643e-38   1.78490896e-38   1.78419150e-38   1.78347403e-38
   1.78275657e-38   1.78203910e-38   1.78103465e-38   1.77959972e-38
   1.77816479e-38   1.77644288e-38   1.77357302e-38   1.76898124e-38
   0.00000000e+00   5.93486894e-39   5.98078669e-39   6.00948528e-39
   6.02670444e-39   6.04105373e-39   6.05540303e-39   6.06544754e-39
   6.07262218e-39   6.07979683e-39   6.08697148e-39   6.09414613e-39
   6.10132078e-39]
offset2: 358
[  1.78806581e-38   1.78770708e-38   1.78734834e-38   1.78698961e-38
   1.78663088e-38   1.78627215e-38   1.78562643e-38   1.78490896e-38
   1.78419150e-38   1.78347403e-38   1.78275657e-38   1.78203910e-38
   1.78103465e-38   1.77959972e-38   1.77816479e-38   1.77644288e-38
   1.77357302e-38   1.76898124e-38   0.00000000e+00   5.93486894e-39
   5.98078669e-39   6.00948528e-39   6.02670444e-39   6.04105373e-39
   6.05540303e-39   6.06544754e-39   6.07262218e-39   6.07979683e-39
   6.08697148e-39]
offset3: 474

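As you can see, the first array is read correctly, but offset1, the file offset reported after the fromfile() call, is wrong: it should be 241 but is 242. Every later read is then shifted by one byte, which is why the second and third arrays come out as garbage.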
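The size of the error can be checked with a little arithmetic, assuming (as the expected output suggests) that each array holds 29 little-endian float32 values: 29 × 4 = 116 bytes, so the second array starts at 125 + 116 = 241. Decoding that array one byte late reproduces the first bad value in the Linux output. The helper name below is hypothetical; only the standard struct module is used.

```python
import struct

FLOAT32_SIZE = 4
N_VALUES = 29  # length of each array in the example output


def first_value_at_shift(values, shift):
    """Pack values as little-endian float32, then decode one float starting `shift` bytes in."""
    raw = struct.pack("<%df" % len(values), *values)
    (x,) = struct.unpack_from("<f", raw, shift)
    return x


offset1 = 125 + N_VALUES * FLOAT32_SIZE
print(offset1)  # 241

second_array = [float(v) for v in range(-80, 61, 5)]  # -80, -75, ..., 60
print(first_value_at_shift(second_array, 0))  # -80.0
print(first_value_at_shift(second_array, 1))  # about 1.787e-38, the first value printed on Linux
```

Reading one byte late combines the last three bytes of -80.0 with the first byte of -75.0, giving a tiny number with the smallest normal exponent; this matches the run of values near 1.78e-38 in the broken output.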

I glanced at the C code that implements fromfile() and didn't see anything obviously incorrect, except that it makes a copy of the underlying file handle to do the read. I wondered whether this was exposing a bug in glibc or in Python's file-handling layer.
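One plausible mechanism, offered as an assumption rather than a diagnosis from this ticket: a buffered file object reads ahead in blocks, so the position of the underlying OS file descriptor runs ahead of the position Python reports. Code that works through a copy of that descriptor must seek to the logical position first and hand the final position back afterwards; getting either step wrong would shift all later reads. The sketch below uses Python 3's `io` layer, does not call NumPy, and only makes the two positions visible; the function name is hypothetical.

```python
import os
import tempfile


def positions_after_read(path, nbytes, buffering=-1):
    """Return (Python-level position, OS descriptor position) after reading nbytes."""
    with open(path, "rb", buffering=buffering) as f:
        f.read(nbytes)
        logical = f.tell()                          # position Python reports
        raw = os.lseek(f.fileno(), 0, os.SEEK_CUR)  # position of the underlying descriptor
    return logical, raw


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(10000))  # comfortably above the 4096-byte size from the report
        path = tmp.name
    try:
        print(positions_after_read(path, 125))               # buffered: raw is past 125
        print(positions_after_read(path, 125, buffering=0))  # unbuffered: (125, 125)
    finally:
        os.remove(path)
```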

Thanks, Allen

numpy-gitbot commented 12 years ago

Attachment added by trac user allen@transpireinc.com on 2012-09-04: example.py

numpy-gitbot commented 12 years ago

trac user allen@transpireinc.com wrote on 2012-09-05

Note: after reading some of the Python 3 documentation, I discovered that passing buffering=0 to open() makes my test program run correctly.
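Two ways to apply this are sketched below with hypothetical helper names. The first follows the note and opens the file with buffering=0, so there is no read-ahead for the two positions to disagree about. The second sidesteps the issue by reading the bytes through Python and decoding them with np.frombuffer(), so only one layer ever moves the file position. Both assume the little-endian float32 layout implied by the output above.

```python
import numpy as np


def read_floats_unbuffered(path, offset, count):
    """Workaround from the note: no Python-level buffering, so no read-ahead."""
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        arr = np.fromfile(f, dtype="<f4", count=count)
        return arr, f.tell()


def read_floats_frombuffer(path, offset, count):
    """Alternative: Python does all the file I/O; NumPy only decodes the bytes."""
    with open(path, "rb") as f:  # default (buffered) mode is fine here
        f.seek(offset)
        data = f.read(4 * count)  # 4 bytes per float32
        arr = np.frombuffer(data, dtype="<f4", count=count)
        return arr, f.tell()
```

With the layout from the report, calling either helper with offset=125 and count=29 should return the -70 to 70 array and a position of 241. np.frombuffer() returns a read-only view of the bytes, so call .copy() if the array needs to be modified.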