radio-astro-tools / casa-formats-io

Code to handle I/O from/to data in CASA format

Table performance issue: too much info being loaded with __repr__? #36

Open keflavich opened 2 years ago

keflavich commented 2 years ago

I noticed, and I think @miguelcarcamov also noticed, that printing the __repr__ of a Table produced with the standard reader

import casa_formats_io
from astropy.table import Table
tbl = Table.read('my.ms')
tbl # get the __repr__

can be very slow, while getting the __repr__ of any individual row or column is fast:

tbl[0] # fast
tbl['DATA'] # fast

It looks like there's a bottleneck somewhere in astropy's table formatter - is it perhaps trying to load all the data when making the repr?

astrofrog commented 2 years ago

It is because the formatter accesses each cell individually, which is inefficient with dask. I am going to investigate ways to speed this up!
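Here is a rough illustration of why per-cell access hurts. The sketch below is a pure-Python model, not casa-formats-io code. `LazyColumn`, `repr_rows_per_cell` and `repr_rows_batched` are hypothetical names, and each `_read` call stands in for one dask compute/backend round trip with a fixed overhead. Assuming the formatter only displays the first and last few rows, as astropy's truncated table repr does, fetching those rows as two slices needs far fewer reads than fetching them one cell at a time. It is not meant to describe what #38 actually changed.

```python
# Hypothetical model: a lazily loaded column where every backend read has a
# fixed cost, mimicking the overhead of computing one dask task.
class LazyColumn:
    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.reads = 0  # number of separate backend reads performed

    def _read(self, start, stop):
        # One backend round trip, regardless of how many rows it returns.
        self.reads += 1
        return list(range(start, stop))  # stand-in for real cell values

    def cell(self, i):
        return self._read(i, i + 1)[0]

    def rows(self, start, stop):
        return self._read(start, stop)


def repr_rows_per_cell(col, n_head, n_tail):
    # Slow pattern: one backend read per displayed cell.
    idx = list(range(n_head)) + list(range(col.n_rows - n_tail, col.n_rows))
    return [col.cell(i) for i in idx]


def repr_rows_batched(col, n_head, n_tail):
    # Faster pattern: fetch the displayed head and tail as two slices.
    head = col.rows(0, n_head)
    tail = col.rows(col.n_rows - n_tail, col.n_rows)
    return head + tail


slow = LazyColumn(1_000_000)
fast = LazyColumn(1_000_000)
assert repr_rows_per_cell(slow, 10, 10) == repr_rows_batched(fast, 10, 10)
print(slow.reads, fast.reads)  # 20 2
```

Both functions return the same displayed values; only the number of round trips differs, and that gap grows with the number of displayed cells and columns.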

miguelcarcamov commented 2 years ago

@astrofrog please let us know once you have found a way to speed this up! Cheers

keflavich commented 2 years ago

Before #38:

%timeit tables = [Table.read('HD163296_continuum.ms', data_desc_id=ii) for ii in desc['SPECTRAL_WINDOW_ID']]
5min 26s ± 2.22 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit tables = rslt.as_astropy_table(all_ddids=True)
30.4 s ± 222 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
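For scale, here is the ratio of the two reported mean timings (one Table.read call per data_desc_id versus a single call covering all of them), ignoring the small standard deviations:

```latex
\[
\frac{5\,\text{min}\;26\,\text{s}}{30.4\,\text{s}}
  = \frac{326\,\text{s}}{30.4\,\text{s}}
  \approx 10.7
\]
```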

keflavich commented 2 years ago

OK, the __repr__ is much faster now, so :+1:. The performance numbers I gave above are for reading, and they have not improved, but this change wasn't expected to improve them.