scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

scipy.io.readsav error on reading sav file #4613

Closed abrahamneben closed 9 years ago

abrahamneben commented 9 years ago

scipy.io.readsav fails on many sav files, such as this one: https://dl.dropboxusercontent.com/u/22819/1066671952_cal.sav

Python 2.7.3 (default, Jan 13 2013, 09:19:58)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.__version__
'0.12.0'
>>> import scipy.io
>>> scipy.io.readsav('1066657560_cal.sav')
/usr/local/lib/python2.7/site-packages/scipy/io/idl.py:167: UserWarning: warning: empty strings are now set to '' instead of None
  warnings.warn("warning: empty strings are now set to '' instead of None")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/scipy/io/idl.py", line 813, in readsav
    replace, new = _replace_heap(r['data'], heap)
  File "/usr/local/lib/python2.7/site-packages/scipy/io/idl.py", line 601, in _replace_heap
    replace, new = _replace_heap(record, heap)
  File "/usr/local/lib/python2.7/site-packages/scipy/io/idl.py", line 613, in _replace_heap
    replace, new = _replace_heap(value, heap)
  File "/usr/local/lib/python2.7/site-packages/scipy/io/idl.py", line 627, in _replace_heap
    replace, new = _replace_heap(variable.item(iv), heap)
  File "/usr/local/lib/python2.7/site-packages/scipy/io/idl.py", line 587, in _replace_heap
    variable = heap[variable.index]
KeyError: 543708

abrahamneben commented 9 years ago

This is quite an annoying error. I never know which IDL sav file is going to work...

pv commented 9 years ago

Did you close the ticket by mistake?

abrahamneben commented 9 years ago

Yes, closed by mistake. Now reopened.

rgommers commented 9 years ago

@astrofrog any chance you can look at this?

astrofrog commented 9 years ago

@rgommers - yes, I will take a look at this in a couple of days. In the meantime, @abrahamneben - can you tell me what kind of data this file should contain?

astrofrog commented 9 years ago

@abrahamneben - I managed to read in all the data except the 'CONVERGENCE' column. Is there anything special about these variables on the IDL side?

abrahamneben commented 9 years ago

Really? I get the below error: I'm in python 2.7.3, scipy 0.15.1, numpy 1.10.0, CentOS release 6.3. Not sure if there is something special about the IDL datafile. At the bottom I'll put the IDL help message about the "cal" structure that gets restored from this file.

abrahamn@eor-09:~$ python
Python 2.7.3 (default, Jan 13 2013, 09:19:58)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.__version__
'0.15.1'
>>> a=scipy.io.readsav('1066671952_cal.sav')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'io'
>>> import scipy.io
>>> a=scipy.io.readsav('1066671952_cal.sav')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nfs/eor-00/h1/abrahamn/lib/python2.7/site-packages/scipy/io/idl.py", line 821, in readsav
    replace, new = _replace_heap(r['data'], heap)
  File "/nfs/eor-00/h1/abrahamn/lib/python2.7/site-packages/scipy/io/idl.py", line 609, in _replace_heap
    replace, new = _replace_heap(record, heap)
  File "/nfs/eor-00/h1/abrahamn/lib/python2.7/site-packages/scipy/io/idl.py", line 621, in _replace_heap
    replace, new = _replace_heap(value, heap)
  File "/nfs/eor-00/h1/abrahamn/lib/python2.7/site-packages/scipy/io/idl.py", line 635, in _replace_heap
    replace, new = _replace_heap(variable.item(iv), heap)
  File "/nfs/eor-00/h1/abrahamn/lib/python2.7/site-packages/scipy/io/idl.py", line 595, in _replace_heap
    variable = heap[variable.index]
KeyError: 543708

IDL> help,cal
Structure , 39 tags, length=8119320, data length=8119306, refs=1:
   N_POL                 INT       2
   N_FREQ                LONG      384
   N_TILE                LONG      128
   N_TIME                LONG      56
   UU                    FLOAT     Array[462336]
   VV                    FLOAT     Array[462336]
   SOURCE_LIST           STRUCT    -> SOURCE_COMPONENT Array[9929]
   MAX_ITER              LONG      100
   PHASE_ITER            LONG      10
   TILE_A                LONG      Array[462336]
   TILE_B                LONG      Array[462336]
   TILE_NAMES            STRING    Array[128]
   BIN_OFFSET            LONG      Array[56]
   FREQ                  FLOAT     Array[384]
   GAIN                  POINTER   Array[2]
   GAIN_RESIDUAL         POINTER   Array[2]
   GALAXY_CAL            INT       0
   MIN_CAL_BASELINE      FLOAT     50.0000
   MAX_CAL_BASELINE      FLOAT     722.772
   N_VIS_CAL             LONG      118332385
   TIME_AVG              INT       1
   MIN_SOLNS             INT       5
   REF_ANTENNA           LONG      1
   REF_ANTENNA_NAME      STRING    ' 12'
   CONV_THRESH           FLOAT     1.00000e-06
   CONVERGENCE           POINTER   Array[2]
   POLYFIT               INT       2
   AMP_PARAMS            POINTER   Array[2, 128]
   PHASE_PARAMS          POINTER   Array[2, 128]
   MEAN_GAIN             FLOAT     Array[2]
   MEAN_GAIN_RESIDUAL    FLOAT     Array[2]
   MEAN_GAIN_RESTRICT    FLOAT     Array[2]
   STDDEV_GAIN_RESIDUAL  FLOAT     Array[2]
   BANDPASS              INT       1
   MODE_FIT              FLOAT     1.00000
   MODE_PARAMS           POINTER   Array[2, 128]
   CAL_ORIGIN            STRING    '1066671952'
   N_CAL_SRC             INT       9929
   CATALOG_NAME          STRING    'mwa_calibration_source_BenMcKinley_fornax_and_VLA_pic_halfpixeloffset'
IDL>

astrofrog commented 9 years ago

@abrahamneben - thanks for the info! Just to clarify, I can only read it in (except for that column) locally after hacking around, but I still have the same issue as you with the stable SciPy.

astrofrog commented 9 years ago

@abrahamneben - could you show me the contents of the CONVERGENCE component?

abrahamneben commented 9 years ago

"convergence" seems to be a length 2 pointer array, but both elements are null pointers. That seems likely to be the problem.

IDL> print,cal.convergence
< NullPointer > < NullPointer >

abrahamneben commented 9 years ago

An if statement that lets readsav fail just on null variables would be a nice solution here.

astrofrog commented 9 years ago

@abrahamneben - ok, thanks! My plan at the moment is to add a condition so that if a pointer does not point to a valid heap variable, we just return None for those specific values. Does that sound reasonable? (edit: seems like you are thinking along the same lines!)
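A minimal sketch of that plan, for illustration only. This is not the actual SciPy patch: `Pointer`, `resolve_pointer`, and the sample `heap` are hypothetical stand-ins, and the real `_replace_heap` in `scipy/io/idl.py` also recurses into records and arrays. The point is just that a missing heap index yields `None` with a warning rather than a `KeyError`.

```python
import warnings


class Pointer:
    """Hypothetical stand-in for a pointer value read from a SAV file."""

    def __init__(self, index):
        self.index = index


def resolve_pointer(variable, heap):
    """Follow a chain of pointers through `heap` (a dict keyed by heap index).

    If a pointer refers to an index that is not in the heap (for example a
    null pointer), warn and return None instead of raising KeyError.
    """
    while isinstance(variable, Pointer):
        if variable.index in heap:
            variable = heap[variable.index]
        else:
            warnings.warn(
                "pointer target %d not found in heap; using None" % variable.index
            )
            variable = None
    return variable


# Sample heap (made up for this sketch): index 2 points on to index 1.
heap = {1: [1.0, 2.0], 2: Pointer(1)}
```

With this sketch, `resolve_pointer(Pointer(2), heap)` follows the chain to `[1.0, 2.0]`, while `resolve_pointer(Pointer(543708), heap)` emits a warning and returns `None`.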

abrahamneben commented 9 years ago

@astrofrog nice!

astrofrog commented 9 years ago

I will do a PR tomorrow - thanks for the report!

astrofrog commented 9 years ago

@abrahamneben - would it be easy for you to produce a SAV file that contains JUST the CONVERGENCE variable? If so, I could include that in a regression test in #4710.

abrahamneben commented 9 years ago

@astrofrog Here's an IDL sav file that contains that array of two null pointers stored as the variable "a". https://dl.dropboxusercontent.com/u/22819/idl_array_of_null_pointers.sav

astrofrog commented 9 years ago

@abrahamneben - just for info, this is now fixed in the developer version of SciPy. Thanks for reporting this and for providing a test file!

rgommers commented 9 years ago

not fixed --> now fixed :)

astrofrog commented 9 years ago

@rgommers - thanks, clearly need more :coffee: :smile:

huayue21 commented 3 years ago

Hi, do we need to explicitly close the file after the readsav operation? If so, how?

Thanks,

rgommers commented 3 years ago

No, it's closed within the function.
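As a rough illustration of why there is nothing to close: `readsav` takes a file name rather than an open file object, so the caller never holds a file handle. The sketch below assumes SciPy is installed; `load_sav` is a hypothetical helper name, not part of SciPy, and the file path is whatever SAV file you have at hand.

```python
def load_sav(path):
    """Read an IDL SAV file and return its variables as a plain dict."""
    # The submodule has to be imported explicitly; a bare `import scipy`
    # gave the AttributeError shown earlier in this thread.
    import scipy.io

    # readsav opens and closes the file itself; only the data is returned.
    # python_dict=True asks for a regular dict instead of the default
    # case-insensitive, attribute-style dictionary.
    return scipy.io.readsav(path, python_dict=True)
```

For example, `data = load_sav('1066671952_cal.sav')` would return the restored variables keyed by name, with no cleanup step afterwards.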