radio-astro-tools / casa-formats-io

Code to handle I/O from/to data in CASA format
Other
10 stars 7 forks source link

Read IERS tables #13

Closed keflavich closed 3 years ago

keflavich commented 3 years ago

CASA comes with IERS tables, e.g. /Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict'

Trying to read them with #12 results in:

ct = Table.read('/Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict', endian='<')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-24a5793090a8> in <module>
----> 1 ct = Table.read('/Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict', endian='<')

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, filename, endian)
    348                 raise ValueError('Incorrect magic code: {0}'.format(magic))
    349
--> 350             table = cls.read_fileobj(f)
    351
    352         for dm_index, dm in table.column_set.data_managers.items():

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
     80         b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
     81         if self:
---> 82             result = func(self, b, *args)
     83         else:
     84             result = func(b, *args)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_fileobj(cls, f)
    426         # big_endian = fmt == 0  # noqa
    427
--> 428         self.desc = TableDesc.read(f, self.nrow)
    429
    430         self.column_set = ColumnSet.read(f, desc=self.desc)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
     80         b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
     81         if self:
---> 82             result = func(self, b, *args)
     83         else:
     84             result = func(b, *args)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, f, nrow)
    445         unknown1 = read_int32(f)  # noqa
    446         unknown2 = read_int32(f)  # noqa
--> 447         unknown3 = read_string(f)  # noqa
    448
    449         self.keywords = TableRecord.read(f)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_string(f)
    141     bytes1 = f.read(int(value))
    142     bytes2 = bytes1.replace(b'\x00', b'')
--> 143     bytes3 = bytes2.decode('ascii')
    144     return bytes3
    145

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa2 in position 18: ordinal not in range(128)

(note that I expanded out read_string into multiple steps).

Digging into the debugger, I get:

>>> bytes2
b'ict\x01B\x0bTableRecord\x01\xa2\nRecordDesc\x02\x07\tVS_CREATE\x0b\x07VS_DATE\x0b\nVS_VERSION\x0b\x07VS_TYPE\x0b\x0bTAB_VERSION\x0b\x04MJD0\x08\x04dMJD\x08\x01\x102019/12/08/03:00\x102019/12/08/03:00\t0607.0001/IERS Earth Orientation Data predicted from NEOS\t0002.0000@\xec\xad\xc0?\xf05\x0bTableRecord\x01\x1a\nRecordDesc\x02\x01\r\x01\x19ScalarColumnDesc<double  \x01\x03MJD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01d\x01\x01\x19ScalarColumnDesc<double  \x01\x01x\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x02Dx\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x01y\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x02Dy\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x04dUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x03LOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x04DLOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x04dPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x04dEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01'

so this looks like a legit table, but it's not ascii-encoded? It's... partly ascii?

keflavich commented 3 years ago

this works:

bytes2.replace(b'\xa2',b'').replace(b'\xec',b'').replace(b'\xad',b'').replace(b'\xc0',b'').replace(b'\xf0',b'').decode('ascii')
'ict\x01B\x0bTableRecord\x01\nRecordDesc\x02\x07\tVS_CREATE\x0b\x07VS_DATE\x0b\nVS_VERSION\x0b\x07VS_TYPE\x0b\x0bTAB_VERSION\x0b\x04MJD0\x08\x04dMJD\x08\x01\x102019/12/08/03:00\x102019/12/08/03:00\t0607.0001/IERS Earth Orientation Data predicted from NEOS\t0002.0000@?5\x0bTableRecord\x01\x1a\nRecordDesc\x02\x01\r\x01\x19ScalarColumnDesc<double  \x01\x03MJD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01d\x01\x01\x19ScalarColumnDesc<double  \x01\x01x\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x02Dx\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x01y\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x02Dy\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x04dUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x03LOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x04DLOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double  \x01\x04dPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x04dEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double  \x01\x05DdEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01'

but, well, "works"

astrofrog commented 3 years ago

ASCII errors are not about the string itself but about the fact we are trying to read a string when the next bytes are in fact not a string. In this case we are trying to read a string when we should be reading another kind of object - in this case TableRecord. To me this indicates that maybe:

--> 447         unknown3 = read_string(f)  # noqa

shouldn't be running for this particular file, so I need to investigate why. Can you try removing that line and seeing if it works? (or I can investigate on Tuesday)

keflavich commented 3 years ago

Same issue - so it's not formatted correctly as a TableRecord?

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-2-854808069a1f> in <module>
----> 1 ct = Table.read('/Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict', endian='>')

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, filename, endian)
    349                 raise ValueError('Incorrect magic code: {0}'.format(magic))
    350
--> 351             table = cls.read_fileobj(f)
    352
    353         for dm_index, dm in table.column_set.data_managers.items():

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
     80         b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
     81         if self:
---> 82             result = func(self, b, *args)
     83         else:
     84             result = func(b, *args)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_fileobj(cls, f)
    427         # big_endian = fmt == 0  # noqa
    428
--> 429         self.desc = TableDesc.read(f, self.nrow)
    430
    431         self.column_set = ColumnSet.read(f, desc=self.desc)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
     80         b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
     81         if self:
---> 82             result = func(self, b, *args)
     83         else:
     84             result = func(b, *args)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, f, nrow)
    448         # unknown3 = read_string(f)  # noqa
    449
--> 450         self.keywords = TableRecord.read(f)
    451         self.private_keywords = TableRecord.read(f)
    452

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
     80         b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
     81         if self:
---> 82             result = func(self, b, *args)
     83         else:
     84             result = func(b, *args)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, f)
    280         self = cls()
    281
--> 282         check_type_and_version(f, 'TableRecord', 1)
    283
    284         desc = RecordDesc.read(f)

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in check_type_and_version(f, name, versions)
     95     if np.isscalar(versions):
     96         versions = [versions]
---> 97     stype, sversion = read_type(f)
     98     if stype != name or sversion not in versions:
     99         raise NotImplementedError('Support for {0} version {1} not implemented'.format(stype, sversion))

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_type(f)
    216
    217 def read_type(f):
--> 218     tp = read_string(f)
    219     version = read_int32(f)
    220     return tp, version

~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_string(f)
    142     bytes2 = bytes1.replace(b'\x00', b'')
    143     #bytes2 = bytes2.replace(b'\xa2',b'').replace(b'\xec',b'').replace(b'\xad',b'').replace(b'\xc0',b'').replace(b'\xf0',b'')
--> 144     bytes3 = bytes2.decode('ascii')
    145     return bytes3
    146

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa2 in position 15: ordinal not in range(128)
astrofrog commented 3 years ago

Hmm I think maybe actually there should be something instead of unknown3 - it looks like there is still some unexpected offset in bytes. Basically UnicodeDecodeError always mean that there is not a string at the exact position you are reading.

keflavich commented 3 years ago

if I change that line to unknown3 = read_int32(f) # noqa

I get:

NotImplementedError: Support for  version 0 not implemented

which I think is a much more useful error... so this is tabl eversion 0?

astrofrog commented 3 years ago

No again I think there is an unexpected offset. Instead of the unknown3 line put a _peek(f, 32) call and let me know the output.

keflavich commented 3 years ago

b'\x00\x00\x00\x0bIERS' b'predict\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B\x00\x00\x00\x0bTableReco'

astrofrog commented 3 years ago

Actually can you put the peek before unknown1 and make it _peek(f, 128)?

keflavich commented 3 years ago
b'Desc\x00\x00\x00\x02'    b'\x00\x00\x00\x0bIERSpredict\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B\x00\x00\x00\x0bTableRecord\x00\x00\x00\x01\x00\x00\x00\xa2\x00\x00\x00\nRecordDesc\x00\x00\x00\x02\x00\x00\x00\x07\x00\x00\x00\tVS_CREATE\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\x07VS_DATE\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\nVS_VERSION\x00\x00'
astrofrog commented 3 years ago

Ok so \x00\x00\x00\x0bIERSpredict is a string - (first four bytes are length - 11, followed by the string). Maybe in fact unknown1 should be a read_string, it's just that it's always been an empty string before which is why I could read it as an int32 (which was zero). So try replacing unknown1's call by read_string.

astrofrog commented 3 years ago

Maybe unknown2 should be a string too 😆

keflavich commented 3 years ago

🤯

In [4]: ct.read_as_astropy_table()
Out[4]:
<Table length=181>
          MJD                      x                       Dx                      y            ...          DdPsi                    dEps                   DdEps
        float64                 float64                 float64                 float64         ...         float64                 float64                 float64
----------------------- ----------------------- ----------------------- ----------------------- ... ----------------------- ----------------------- -----------------------
                58735.0                 58735.0                 58735.0                 58735.0 ...                 58735.0                 58735.0                 58735.0
               0.213586                0.213586                0.213586                0.213586 ...                0.213586                0.213586                0.213586
                2.5e-05                 2.5e-05                 2.5e-05                 2.5e-05 ...                 2.5e-05                 2.5e-05                 2.5e-05
               0.338901                0.338901                0.338901                0.338901 ...                0.338901                0.338901                0.338901
                2.5e-05                 2.5e-05                 2.5e-05                 2.5e-05 ...                 2.5e-05                 2.5e-05                 2.5e-05
             -0.1547067              -0.1547067              -0.1547067              -0.1547067 ...              -0.1547067              -0.1547067              -0.1547067
                5.9e-06                 5.9e-06                 5.9e-06                 5.9e-06 ...                 5.9e-06                 5.9e-06                 5.9e-06
-0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004 ... -0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004
  5.199999999999999e-06   5.199999999999999e-06   5.199999999999999e-06   5.199999999999999e-06 ...   5.199999999999999e-06   5.199999999999999e-06   5.199999999999999e-06
               -117.846                -117.846                -117.846                -117.846 ...                -117.846                -117.846                -117.846
                  0.304                   0.304                   0.304                   0.304 ...                   0.304                   0.304                   0.304
                -11.234                 -11.234                 -11.234                 -11.234 ...                 -11.234                 -11.234                 -11.234
                  0.089                   0.089                   0.089                   0.089 ...                   0.089                   0.089                   0.089
                58736.0                 58736.0                 58736.0                 58736.0 ...                 58736.0                 58736.0                 58736.0
               0.212926                0.212926                0.212926                0.212926 ...                0.212926                0.212926                0.212926
                2.2e-05                 2.2e-05                 2.2e-05                 2.2e-05 ...                 2.2e-05                 2.2e-05                 2.2e-05
                    ...                     ...                     ...                     ... ...                     ...                     ...                     ...
             -0.1516496              -0.1516496              -0.1516496              -0.1516496 ...              -0.1516496              -0.1516496              -0.1516496
                3.9e-06                 3.9e-06                 3.9e-06                 3.9e-06 ...                 3.9e-06                 3.9e-06                 3.9e-06
-0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004 ... -0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004
                3.5e-06                 3.5e-06                 3.5e-06                 3.5e-06 ...                 3.5e-06                 3.5e-06                 3.5e-06
               -117.184                -117.184                -117.184                -117.184 ...                -117.184                -117.184                -117.184
                -10.593                 -10.593                 -10.593                 -10.593 ...                 -10.593                 -10.593                 -10.593
                58750.0                 58750.0                 58750.0                 58750.0 ...                 58750.0                 58750.0                 58750.0
               0.201695                0.201695                0.201695                0.201695 ...                0.201695                0.201695                0.201695
                2.7e-05                 2.7e-05                 2.7e-05                 2.7e-05 ...                 2.7e-05                 2.7e-05                 2.7e-05
               0.318534                0.318534                0.318534                0.318534 ...                0.318534                0.318534                0.318534
                1.4e-05                 1.4e-05                 1.4e-05                 1.4e-05 ...                 1.4e-05                 1.4e-05                 1.4e-05
               -0.15121                -0.15121                -0.15121                -0.15121 ...                -0.15121                -0.15121                -0.15121
                5.7e-06                 5.7e-06                 5.7e-06                 5.7e-06 ...                 5.7e-06                 5.7e-06                 5.7e-06
-0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004 ... -0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004
                2.7e-06                 2.7e-06                 2.7e-06                 2.7e-06 ...                 2.7e-06                 2.7e-06                 2.7e-06
               -116.771                -116.771                -116.771                -116.771 ...                -116.771                -116.771                -116.771
keflavich commented 3 years ago

OK so.... I'm.... surprised that worked. I thought these lines:

unknown1 = read_string(f)  # noqa
unknown2 = read_string(f)  # noqa
unknown3 = read_string(f)  # noqa

were just reading a fixed number of bytes to move along the file? Were they reading the wrong number of bytes before?

keflavich commented 3 years ago

I do think this table is malformed - I'm pretty sure rows and columns are swapped or something? looks like the first row is MJD, but the first column is labeled MJD

astrofrog commented 3 years ago

Well it didn't quite work as somehow the data are all jumbled up, right? (seems rows and columns are mixed up somehow)

A string is four bytes plus the length of the string - an empty string is four bytes. So by chance I think read_int32 for unknown1 was working before because for other tables the string was empty.

astrofrog commented 3 years ago

Yeah the malformed table is going to be a bit trickier to debug, I can try and take a look soon.

astrofrog commented 3 years ago

How do you force casa-data to download? (I don't seem to have that folder)

keflavich commented 3 years ago

A string is four bytes plus the length of the string - an empty string is four bytes. So by chance I think read_int32 for unknown1 was working before because for other tables the string was empty.

OK that's what I was thinking.

The unknowns are 'IERSPredict' and two blanks:

>>> print(f"Unknowns.  1: {unknown1}, 2:{unknown2}, 3:{unknown3}")
Unknowns.  1: IERSpredict, 2:, 3:

so I think it is likely right to read those as strings.

keflavich commented 3 years ago

How do you force casa-data to download? (I don't seem to have that folder)

I... don't know. I just had this on my computer from ages ago.

astrofrog commented 3 years ago

Looks like !update-data might do the trick, will report back.

keflavich commented 3 years ago

you might also try using the cngi tools for this: https://colab.research.google.com/github/casangi/casadocs/blob/master/docs/notebooks/external-data.ipynb

astrofrog commented 3 years ago

The table is now at /Applications/CASA.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/casadata/__data__/geodetic/IERSpredict

astrofrog commented 3 years ago

Interesting - the data is included as part of a pip installable casadata package - maybe we can even write some tests against all the tables in there are part of our test suite.

keflavich commented 3 years ago

Yeah, the problem I had with that package was that I couldn't install python-casacore locally

astrofrog commented 3 years ago

Ah this table uses the IncrementalStMan and it's the first time I've seen a table where multiple columns share a common IncrementalStMan - so clearly some tweaks are needed in that situation. I think I know what needs to be fixed.