ratt-ru / packratt

BSD 3-Clause "New" or "Revised" License

Data Product Schema #1

Open sjperkins opened 4 years ago

sjperkins commented 4 years ago

packratt is both an application and a Python package for downloading and caching radio astronomy data products, primarily to facilitate testing of radio astronomy software.

I propose using the following schemas to uniquely identify data products for download:

packratt get /ms/<telescope>/<observation_date>/<filename> <target_dir>
packratt get /uvfits/<telescope>/<observation_date>/<filename> <target_dir>
packratt get /beams/<telescope>/<filename> <target_dir>
packratt get /gains/<telescope>/<observation_date>/<filename> <target_dir>

In the Python layer:

import packratt
packratt.get("/ms/<telescope>/<observation_date>/<filename>", target_dir)
packratt.get("/uvfits/<telescope>/<observation_date>/<filename>", target_dir)
packratt.get("/beams/<telescope>/<filename>", target_dir)
packratt.get("/gains/<telescope>/<observation_date>/<filename>", target_dir)
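
As a rough sketch of how keys following these schemas might be validated before a download is attempted: the `SCHEMAS` table and the `parse_key` helper below are hypothetical names used for illustration, not part of packratt, and the telescope, date and filename values in the usage note are made up.

```python
# Hypothetical schema table: product type -> named path components.
# It mirrors the four schemas proposed above and is not taken from packratt.
SCHEMAS = {
    "ms": ("telescope", "observation_date", "filename"),
    "uvfits": ("telescope", "observation_date", "filename"),
    "beams": ("telescope", "filename"),
    "gains": ("telescope", "observation_date", "filename"),
}


def parse_key(key):
    """Split a product key into named components, or raise ValueError."""
    if not key.startswith("/"):
        raise ValueError(f"key must start with '/': {key!r}")

    # First component selects the schema, the rest are its fields.
    product_type, *parts = key[1:].split("/")

    try:
        fields = SCHEMAS[product_type]
    except KeyError:
        raise ValueError(f"unknown product type: {product_type!r}") from None

    # Every field must be present and non-empty.
    if len(parts) != len(fields) or not all(parts):
        expected = "/".join(f"<{name}>" for name in fields)
        raise ValueError(f"expected /{product_type}/{expected}, got {key!r}")

    return {"product_type": product_type, **dict(zip(fields, parts))}
```

For example, `parse_key("/beams/meerkat/beam.fits")` would return the product type, telescope and filename as a dict, while a key with a missing component or an unknown product type raises `ValueError`.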

The above schemas create keys that uniquely identify a product to download and cache. Each key is defined in a YAML registry:

"/ms/<telescope>/<observation_date>/<filename>":
    "type": "url"
    "url": ftp://elwood.ru.ac.za/pub/astronomer/observation.tar.gz
    "hash": "1234567890abcdef"

"/ms/<telescope>/<observation_date>/<filename>":
    "type": "google"
    "file_id": "1234567890"
    "hash": "1234567890abcdef"

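A minimal sketch of how a registry entry like the ones above might be checked, and how a downloaded product might be compared against its `hash` field. The helper names `validate_entry` and `digest_matches` are hypothetical, and SHA-256 is an assumption: the proposal does not say which hash algorithm is intended.

```python
import hashlib

# Assumed required fields per entry type, based only on the two
# example registry entries above.
REQUIRED_FIELDS = {
    "url": ("url", "hash"),
    "google": ("file_id", "hash"),
}


def validate_entry(entry):
    """Raise ValueError if a registry entry lacks the fields its type needs."""
    entry_type = entry.get("type")
    if entry_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported entry type: {entry_type!r}")

    missing = [f for f in REQUIRED_FIELDS[entry_type] if f not in entry]
    if missing:
        raise ValueError(f"{entry_type!r} entry is missing: {', '.join(missing)}")


def digest_matches(chunks, expected_hash, algorithm="sha256"):
    """Hash an iterable of byte chunks and compare with the registry hash.

    The algorithm defaults to SHA-256, which is an assumption of this sketch.
    """
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == expected_hash.lower()
```

Taking an iterable of chunks means the same check could be applied to a file read in blocks or to data as it streams in, without holding a large measurement set in memory.
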
/cc @o-smirnov @smasoka @SpheMakh @svw26 @bennahugo @jskenyon @Athanaseus @landmanbester @gigjozsa @IanHeywood @mulan-94

svw26 commented 4 years ago

Sorry, please can you clarify what you mean by "data artefact"? e.g. Many radio images have imaging artefacts, and an individual image may have many of these artefacts (including different types of imaging artefacts!) x

sjperkins commented 4 years ago

> Sorry, please can you clarify what you mean by "data artefact"? e.g. Many radio images have imaging artefacts, and an individual image may have many of these artefacts (including different types of imaging artefacts!) x

Ah, I was using the term in its computer science sense. See End User Development, for example. I'll change the terminology to "Data Product" instead.

o-smirnov commented 4 years ago

Anything large and in need of a download. A test MS for a package (hello @gigjozsa). A bunch of FITS images. A set of Singularity images. Etc.

sjperkins commented 4 years ago

/cc @rubyvanrooyen who may also have insight on the matter!

svw26 commented 4 years ago

OK, thank you :) x