Closed keflavich closed 3 years ago
this works:
bytes2.replace(b'\xa2',b'').replace(b'\xec',b'').replace(b'\xad',b'').replace(b'\xc0',b'').replace(b'\xf0',b'').decode('ascii')
'ict\x01B\x0bTableRecord\x01\nRecordDesc\x02\x07\tVS_CREATE\x0b\x07VS_DATE\x0b\nVS_VERSION\x0b\x07VS_TYPE\x0b\x0bTAB_VERSION\x0b\x04MJD0\x08\x04dMJD\x08\x01\x102019/12/08/03:00\x102019/12/08/03:00\t0607.0001/IERS Earth Orientation Data predicted from NEOS\t0002.0000@?5\x0bTableRecord\x01\x1a\nRecordDesc\x02\x01\r\x01\x19ScalarColumnDesc<double \x01\x03MJD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01d\x01\x01\x19ScalarColumnDesc<double \x01\x01x\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x02Dx\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x01y\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x02Dy\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x04dUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double \x01\x05DdUT1\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double \x01\x03LOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double \x01\x04DLOD\x10IncrementalStMan\x10IncrementalStMan\x08J\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x01s\x01\x01\x19ScalarColumnDesc<double \x01\x04dPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x05DdPsi\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x04dEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01\x01\x19ScalarColumnDesc<double \x01\x05DdEps\x10IncrementalStMan\x10IncrementalStMan\x08O\x0bTableRecord\x01*\nRecordDesc\x02\x01\x04UNIT\x0b\x01\x06arcsec\x01'
but, well, "works"
ASCII errors are not about the string itself but about the fact we are trying to read a string when the next bytes are in fact not a string. In this case we are trying to read a string when we should be reading another kind of object - in this case TableRecord. To me this indicates that maybe:
--> 447 unknown3 = read_string(f) # noqa
shouldn't be running for this particular file, so I need to investigate why. Can you try removing that line and seeing if it works? (or I can investigate on Tuesday)
Same issue - so it's not formatted correctly as a TableRecord?
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-854808069a1f> in <module>
----> 1 ct = Table.read('/Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict', endian='>')
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, filename, endian)
349 raise ValueError('Incorrect magic code: {0}'.format(magic))
350
--> 351 table = cls.read_fileobj(f)
352
353 for dm_index, dm in table.column_set.data_managers.items():
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
80 b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
81 if self:
---> 82 result = func(self, b, *args)
83 else:
84 result = func(b, *args)
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_fileobj(cls, f)
427 # big_endian = fmt == 0 # noqa
428
--> 429 self.desc = TableDesc.read(f, self.nrow)
430
431 self.column_set = ColumnSet.read(f, desc=self.desc)
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
80 b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
81 if self:
---> 82 result = func(self, b, *args)
83 else:
84 result = func(b, *args)
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, f, nrow)
448 # unknown3 = read_string(f) # noqa
449
--> 450 self.keywords = TableRecord.read(f)
451 self.private_keywords = TableRecord.read(f)
452
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in wrapper(*args)
80 b = EndianAwareFileHandle(BytesIO(bytes), f.endian, f.original_filename)
81 if self:
---> 82 result = func(self, b, *args)
83 else:
84 result = func(b, *args)
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read(cls, f)
280 self = cls()
281
--> 282 check_type_and_version(f, 'TableRecord', 1)
283
284 desc = RecordDesc.read(f)
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in check_type_and_version(f, name, versions)
95 if np.isscalar(versions):
96 versions = [versions]
---> 97 stype, sversion = read_type(f)
98 if stype != name or sversion not in versions:
99 raise NotImplementedError('Support for {0} version {1} not implemented'.format(stype, sversion))
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_type(f)
216
217 def read_type(f):
--> 218 tp = read_string(f)
219 version = read_int32(f)
220 return tp, version
~/repos/casa-formats-io/casa_formats_io/casa_low_level_io.py in read_string(f)
142 bytes2 = bytes1.replace(b'\x00', b'')
143 #bytes2 = bytes2.replace(b'\xa2',b'').replace(b'\xec',b'').replace(b'\xad',b'').replace(b'\xc0',b'').replace(b'\xf0',b'')
--> 144 bytes3 = bytes2.decode('ascii')
145 return bytes3
146
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa2 in position 15: ordinal not in range(128)
Hmm I think maybe actually there should be something instead of unknown3 - it looks like there is still some unexpected offset in bytes. Basically UnicodeDecodeError
always mean that there is not a string at the exact position you are reading.
if I change that line to
unknown3 = read_int32(f) # noqa
I get:
NotImplementedError: Support for version 0 not implemented
which I think is a much more useful error... so this is tabl eversion 0?
No again I think there is an unexpected offset. Instead of the unknown3 line put a _peek(f, 32) call and let me know the output.
b'\x00\x00\x00\x0bIERS' b'predict\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B\x00\x00\x00\x0bTableReco'
Actually can you put the peek before unknown1 and make it _peek(f, 128)?
b'Desc\x00\x00\x00\x02' b'\x00\x00\x00\x0bIERSpredict\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B\x00\x00\x00\x0bTableRecord\x00\x00\x00\x01\x00\x00\x00\xa2\x00\x00\x00\nRecordDesc\x00\x00\x00\x02\x00\x00\x00\x07\x00\x00\x00\tVS_CREATE\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\x07VS_DATE\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00\nVS_VERSION\x00\x00'
Ok so \x00\x00\x00\x0bIERSpredict is a string - (first four bytes are length - 11, followed by the string). Maybe in fact unknown1 should be a read_string, it's just that it's always been an empty string before which is why I could read it as an int32 (which was zero). So try replacing unknown1's call by read_string.
Maybe unknown2 should be a string too 😆
🤯
In [4]: ct.read_as_astropy_table()
Out[4]:
<Table length=181>
MJD x Dx y ... DdPsi dEps DdEps
float64 float64 float64 float64 ... float64 float64 float64
----------------------- ----------------------- ----------------------- ----------------------- ... ----------------------- ----------------------- -----------------------
58735.0 58735.0 58735.0 58735.0 ... 58735.0 58735.0 58735.0
0.213586 0.213586 0.213586 0.213586 ... 0.213586 0.213586 0.213586
2.5e-05 2.5e-05 2.5e-05 2.5e-05 ... 2.5e-05 2.5e-05 2.5e-05
0.338901 0.338901 0.338901 0.338901 ... 0.338901 0.338901 0.338901
2.5e-05 2.5e-05 2.5e-05 2.5e-05 ... 2.5e-05 2.5e-05 2.5e-05
-0.1547067 -0.1547067 -0.1547067 -0.1547067 ... -0.1547067 -0.1547067 -0.1547067
5.9e-06 5.9e-06 5.9e-06 5.9e-06 ... 5.9e-06 5.9e-06 5.9e-06
-0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004 ... -0.00042770000000000004 -0.00042770000000000004 -0.00042770000000000004
5.199999999999999e-06 5.199999999999999e-06 5.199999999999999e-06 5.199999999999999e-06 ... 5.199999999999999e-06 5.199999999999999e-06 5.199999999999999e-06
-117.846 -117.846 -117.846 -117.846 ... -117.846 -117.846 -117.846
0.304 0.304 0.304 0.304 ... 0.304 0.304 0.304
-11.234 -11.234 -11.234 -11.234 ... -11.234 -11.234 -11.234
0.089 0.089 0.089 0.089 ... 0.089 0.089 0.089
58736.0 58736.0 58736.0 58736.0 ... 58736.0 58736.0 58736.0
0.212926 0.212926 0.212926 0.212926 ... 0.212926 0.212926 0.212926
2.2e-05 2.2e-05 2.2e-05 2.2e-05 ... 2.2e-05 2.2e-05 2.2e-05
... ... ... ... ... ... ... ...
-0.1516496 -0.1516496 -0.1516496 -0.1516496 ... -0.1516496 -0.1516496 -0.1516496
3.9e-06 3.9e-06 3.9e-06 3.9e-06 ... 3.9e-06 3.9e-06 3.9e-06
-0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004 ... -0.00045900000000000004 -0.00045900000000000004 -0.00045900000000000004
3.5e-06 3.5e-06 3.5e-06 3.5e-06 ... 3.5e-06 3.5e-06 3.5e-06
-117.184 -117.184 -117.184 -117.184 ... -117.184 -117.184 -117.184
-10.593 -10.593 -10.593 -10.593 ... -10.593 -10.593 -10.593
58750.0 58750.0 58750.0 58750.0 ... 58750.0 58750.0 58750.0
0.201695 0.201695 0.201695 0.201695 ... 0.201695 0.201695 0.201695
2.7e-05 2.7e-05 2.7e-05 2.7e-05 ... 2.7e-05 2.7e-05 2.7e-05
0.318534 0.318534 0.318534 0.318534 ... 0.318534 0.318534 0.318534
1.4e-05 1.4e-05 1.4e-05 1.4e-05 ... 1.4e-05 1.4e-05 1.4e-05
-0.15121 -0.15121 -0.15121 -0.15121 ... -0.15121 -0.15121 -0.15121
5.7e-06 5.7e-06 5.7e-06 5.7e-06 ... 5.7e-06 5.7e-06 5.7e-06
-0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004 ... -0.00039650000000000004 -0.00039650000000000004 -0.00039650000000000004
2.7e-06 2.7e-06 2.7e-06 2.7e-06 ... 2.7e-06 2.7e-06 2.7e-06
-116.771 -116.771 -116.771 -116.771 ... -116.771 -116.771 -116.771
OK so.... I'm.... surprised that worked. I thought these lines:
unknown1 = read_string(f) # noqa
unknown2 = read_string(f) # noqa
unknown3 = read_string(f) # noqa
were just reading a fixed number of bytes to move along the file? Were they reading the wrong number of bytes before?
I do think this table is malformed - I'm pretty sure rows and columns are swapped or something? looks like the first row is MJD, but the first column is labeled MJD
Well it didn't quite work as somehow the data are all jumbled up, right? (seems rows and columns are mixed up somehow)
A string is four bytes plus the length of the string - an empty string is four bytes. So by chance I think read_int32 for unknown1 was working before because for other tables the string was empty.
Yeah the malformed table is going to be a bit trickier to debug, I can try and take a look soon.
How do you force casa-data to download? (I don't seem to have that folder)
A string is four bytes plus the length of the string - an empty string is four bytes. So by chance I think read_int32 for unknown1 was working before because for other tables the string was empty.
OK that's what I was thinking.
The unknowns are 'IERSPredict' and two blanks:
>>> print(f"Unknowns. 1: {unknown1}, 2:{unknown2}, 3:{unknown3}")
Unknowns. 1: IERSpredict, 2:, 3:
so I think it is likely right to read those as strings.
How do you force casa-data to download? (I don't seem to have that folder)
I... don't know. I just had this on my computer from ages ago.
Looks like !update-data
might do the trick, will report back.
you might also try using the cngi tools for this: https://colab.research.google.com/github/casangi/casadocs/blob/master/docs/notebooks/external-data.ipynb
The table is now at /Applications/CASA.app/Contents/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/casadata/__data__/geodetic/IERSpredict
Interesting - the data is included as part of a pip installable casadata package - maybe we can even write some tests against all the tables in there are part of our test suite.
Yeah, the problem I had with that package was that I couldn't install python-casacore
locally
Ah this table uses the IncrementalStMan and it's the first time I've seen a table where multiple columns share a common IncrementalStMan - so clearly some tweaks are needed in that situation. I think I know what needs to be fixed.
CASA comes with IERS tables, e.g.
/Applications/CASA.app/Contents/Resources/casa-data/geodetic/IERSpredict'
Trying to read them with #12 results in:
(note that I expanded out
read_string
into multiple steps).Digging into the debugger, I get:
so this looks like a legit table, but it's not ascii-encoded? It's... partly ascii?