Closed Lukas113 closed 7 months ago
Yeah, totally, this is awesome. And it looks like an great catalog, covering more! Might be useful for MMODAS?
If the license allows it (couldn't find it with a quick glance), redistributing it from CSCS object store would be nice. .fits.gz
is probably even smaller, and supported by most FITS libraries.
Can you make sure to reference the authors/source of the dataset in the docstring?
Creating a .fits.gz
does compress the .fits file from about 368MB to ~200MB (gzip=206MB, bzip2=196MB).
Yeah referencing it in the docstring is what also what came to my mind. Similar as GLEAM and MIGHTEE docstring references. The only sentences I've found for referencing MALS are:
I've looked into the survey a little more. It is important to know that this is not a wide-field survey like GLEAM which was created by some drift-scans. The MALS survey is created by source-tracking of 391 different sources with a specified radius of the sky. The coverage is looking as follows:
Therefore, I think we should provide an additonal function which is able to get these positions as RA/DEC (MALS format of pointing_name
is: JHHMMSS.ss ± DDMMSS.s
, which is the only info to get the phase-center)
Therefore, I've created an additional function to easily get the RA/DEC in degrees from these strings. However, they have to get extracted from the .fits file accordingly.
def get_pos_ids_to_ra_dec(
pos_ids: Iterable[str] | str,
) -> dict[str, tuple[float, float]]:
"""Converts position-id(s) from str to ra-dec in degrees.
The supported format(s) of `pos_ids` is 'JHHMMSS(.s+)±DDMMSS(.s+)' (J2000) ,
where each pos-id has to follow the same format. This means e.g. if the first
pos-id has the format 'JHHMMSS.ss±DDMMSS.s', then having other pos-ids
whth another format like 'JHHMMSS.s±DDMMSS' is not allowed nor checked
in this function.
A valid example of `pos_ids` (here just str instead of an iterable) is
`pos_ids`="J130508.50-285042.0", which results in RA≈196.285, DEC≈-28.845
Args:
pos_ids: Position-id(s), which all must follow the same supported format
for positional-parsing and unit-encoding.
Returns:
Dict from pos-id to ra-dec-tuple.
"""
UnitMatchType = dict[str, list[int]]
if isinstance(pos_ids, str):
pos_ids = [pos_ids]
def get_unit_matches_idxs(formatted_pos: str) -> UnitMatchType:
"""Gets indices of `formatted_pos` units.
Supports the following formats:
- ra-format='JHHMMSS(.s+)'
- dec-format='±DDMMSS(.s+)'
Args:
formatted_pos: Splitted position-id for positional
Returns:
Unit-char to positions as dict[str, list[int]].
"""
unit_idxs: UnitMatchType = dict()
if (first_char := formatted_pos[0]) == "J":
ra = True
format_ = "JHHMMSS" # positional encoded units
units = ("h", "m", "s") # used units in `format_`
elif first_char == "+" or first_char == "-":
ra = False
format_ = "±DDMMSS"
units = ("d", "m", "s") # order according to `format_` and `Angle`
else:
raise ValueError(
f"First value of {formatted_pos} is expected to be 'J', '+' or '-'"
)
formatted_pos_splitted = formatted_pos.split(".")
if len(formatted_pos_splitted) > 2:
raise ValueError(
f"{formatted_pos=} contains multiple dot-separators "
+ "which is not supported"
)
if len(formatted_pos_splitted[0]) != len(format_):
raise ValueError(
f"{formatted_pos=} doesn't follow format: {format_ + '(.s+)'}"
)
format_list = np.array(list(format_.lower()))
for unit_char in units:
match_idxs = tuple(np.where(format_list == unit_char)[0])
if (
unit_char == format_list[-1] and len(formatted_pos_splitted) == 2
): # handle (.s+)
point_idx = int(np.where(np.array(list(formatted_pos)) == ".")[0][0])
extension_idxs = tuple(range(point_idx, len(formatted_pos)))
match_idxs = match_idxs + extension_idxs
if len(match_idxs) > 0:
unit_idxs[unit_char] = match_idxs
return unit_idxs
def get_sign(pos_id: str) -> str:
"""Extracts the pos-id separator.
Allowed separators are '+' and '-'.
Args:
pos_id: Position-id to extract sign-separator from.
Returns:
Sign separator.
"""
if "+" in pos_id:
sign = "+"
elif "-" in pos_id:
sign = "-"
else:
raise ValueError(
f"{pos_id} doesn't contain a valid sign-separator '+' or '-'"
)
return sign
# just check first pos-id to not have to call `get_unit_matches_idxs` for each id
sample_pos_id = pos_ids[0]
sample_sign = get_sign(sample_pos_id)
sample_ra_str, sample_dec_str = sample_pos_id.split(sep=sample_sign, maxsplit=1)
sample_dec_str = f"{sample_sign}{sample_dec_str}"
ra_unit_matches = get_unit_matches_idxs(formatted_pos=sample_ra_str)
dec_unit_matches = get_unit_matches_idxs(formatted_pos=sample_dec_str)
def create_angle_str(formatted_pos: str, unit_matches: UnitMatchType) -> str:
"""Creates `astropy.coordinates.Angle` compatible string.
Args:
formatted_pos: RA or DEC string from a single pos-id.
unit_matches: Unit positional indices. order must follow
(h, m, s) or (d, m, s).
Returns:
Angle string.
"""
angle_str = ""
if (first_char := formatted_pos[0]) == "+" or first_char == "-":
angle_str = first_char
for unit_char, unit_idxs in unit_matches.items():
angle_str += (
f"{formatted_pos[unit_idxs[0]:unit_idxs[-1]+1]}{unit_char}"
)
return angle_str
def convert_pos_id_to_ra_dec(pos_id: str) -> tuple[float, float]:
"""Converts a single pos-id to ra-dec tuple in degrees.
Assumes that unit-matches for each unit are all consecutive.
Args:
pos_id: Positional-id to convert.
Returns:
RA-DEC tuple in deg.
"""
sign = get_sign(pos_id=pos_id)
ra_pos, dec_pos = pos_id.split(sign, maxsplit=1)
dec_pos = f"{sign}{dec_pos}" # keeps sign in dec-str
ra_angle_str = create_angle_str(
formatted_pos=ra_pos, unit_matches=ra_unit_matches
)
dec_angle_str = create_angle_str(
formatted_pos=dec_pos, unit_matches=dec_unit_matches
)
ra = Angle(ra_angle_str).to(u.deg).value
dec = Angle(dec_angle_str).to(u.deg).value
return ra, dec
ra_dec: dict[str, tuple[float, float]] = dict()
for pos_id in pos_ids:
ra_dec[pos_id] = convert_pos_id_to_ra_dec(
pos_id=pos_id,
)
return ra_dec
df = pd.read_csv("catalogue_malsdr1v3_all.csv", usecols=["pointing_name"])
pos_ids = df["pointing_name"].unique() # should come from .fits, not from .csv file
sky_coords = get_pos_ids_to_ra_dec(pos_ids=pos_ids)
Yesterday I got an e-mail that a new MALS-survey from MeerKAT got released. Since MeerKAT is a precursor of SKA. I was wondering whether we should provide the survey through Karabo. It has a catalogue available for more than 700k unique sources, created using PyBDSF. @lmachadopolettivalle and @sfiruch what do you think about this?
Website: https://mals.iucaa.in/
CSV-col-desc: https://iopscience.iop.org/article/10.3847/1538-4365/acf7b9#apjsacf7b9t3
There are some inconveniences to provide the survey as a Karabo-Sky. One of them is that the full catalogue is just available as .csv and is larger than 3.6GB, which currently takes more than an hour to download from their server. However, I was able to download the .csv and convert it to a .fits file, where the remaining data is saved more efficiently and filtered with the relevant SkyModel.sources data. The code looks roughly as follows, where some parts obviously need some adaptions: