lsst-uk / lasair-project-management

Event handling site for LSST:UK
Apache License 2.0

DESI Legacy Imaging surveys for Sherlock #296

Open smarttgit opened 1 year ago

smarttgit commented 1 year ago

The southern sky below declination -30 degrees lacks the modern catalogues from the DECam surveys. This will be a serious problem for Sherlock before the release of LSST DR1. Fortunately, it appears that all the DECam surveys have been bundled together in one large release: "Data Release 10 (DR10) - the tenth public data release of the Legacy Surveys."

Description and data links are available here : https://www.legacysurvey.org

smarttgit commented 1 year ago

The catalogues we want are the "sweep catalogues", which are tables containing a subset of the most commonly used measurements. They call these "Tractor measurements", after the algorithm used to measure source shapes. Reliable star-galaxy separation and galaxy half-light radii are available.

Data are available at:

https://www.legacysurvey.org/dr10/files/

See the description under the section Sweep Catalogs (south/sweep/*) https://www.legacysurvey.org/dr10/files/#sweep-catalogs-south-sweep

At the top of the DR10 files page there is a link to:

https://portal.nersc.gov/cfs/cosmo/data/legacysurvey/dr10/

Clicking through leads to: https://portal.nersc.gov/cfs/cosmo/data/legacysurvey/dr10/south/sweep/10.0/

smarttgit commented 1 year ago

This directory contains a large number of FITS tables, ranging in size from ~20B to a few GB.
https://portal.nersc.gov/cfs/cosmo/data/legacysurvey/dr10/south/sweep/10.0/

We should be able to curl these?

It has sufficient columns for us: all the photometry, including the unWISE catalogue cross-matched to this one, and a morphological classification that is one of:

PSF, REX, EXP, DEV, SER, DUP

Everything that is not given type = PSF is a galaxy. There seem to be roughly equal numbers of PSFs and galaxies, with a total object count of 2,826,169,461.
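
For illustration, a minimal sketch of splitting one sweep file into point sources and galaxies, assuming astropy is available and the class lives in a TYPE column as described on the files page (the filename below is just a placeholder):

```python
# Sketch: count point sources vs galaxies in a single sweep file.
# Assumes the morphological class is in a TYPE column; filename is a placeholder.
import numpy as np
from astropy.table import Table

sweep = Table.read("sweep-010m005-020p000.fits")

# TYPE values are fixed-width strings, so strip the padding before comparing.
obj_type = np.char.strip(np.asarray(sweep["TYPE"], dtype=str))
is_psf = obj_type == "PSF"

print(f"point sources (TYPE = PSF): {is_psf.sum()}")
print(f"galaxies (everything else): {(~is_psf).sum()}")
```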

genghisken commented 1 year ago

We should hopefully be able to use wget to download these. I'll set up a script to bring them down with 20 parallel processes.
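
Roughly along these lines (the URL list file is a placeholder; 20 workers each shelling out to wget):

```python
# Sketch of the parallel fetch: read a list of sweep-file URLs and pull
# them down 20 at a time via wget. URL list file is a placeholder.
import subprocess
from concurrent.futures import ThreadPoolExecutor

URL_LIST = "sweep_urls.txt"   # one URL per line (hypothetical file)
OUT_DIR = "/nvmeraid/raid0/db6data/catalogues/legacysurvey"

def fetch(url):
    # -c resumes partial downloads, -q keeps logs quiet, -P sets the output directory.
    return subprocess.run(["wget", "-c", "-q", "-P", OUT_DIR, url]).returncode

with open(URL_LIST) as f:
    urls = [line.strip() for line in f if line.strip()]

with ThreadPoolExecutor(max_workers=20) as pool:
    failed = [u for u, rc in zip(urls, pool.map(fetch, urls)) if rc != 0]

print(f"{len(urls) - len(failed)} downloaded, {len(failed)} failed")
```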

genghisken commented 1 year ago

Files are being downloaded in parallel to db6:/nvmeraid/raid0/db6data/catalogues/legacysurvey. Once the download is complete I'll check the SHA256 signatures for each file. Hopefully we won't get blacklisted by NERSC in the meantime.

There are 1,436 files, with a total size of 1.3 TB.
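
Something like this for the checksum pass once the files are all down (the name and location of the published checksum file are placeholders; the format is assumed to be the usual sha256sum output):

```python
# Sketch: verify each downloaded file against the published SHA256 sums.
# Checksum-file name is a placeholder; lines assumed to be
# "<hex digest>  <filename>", as produced by sha256sum.
import hashlib
from pathlib import Path

DATA_DIR = Path("/nvmeraid/raid0/db6data/catalogues/legacysurvey")
CHECKSUM_FILE = DATA_DIR / "sweep-10.0.sha256sum"

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

for line in CHECKSUM_FILE.read_text().splitlines():
    expected, name = line.split()
    if sha256_of(DATA_DIR / name) != expected:
        print(f"MISMATCH: {name}")
```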

genghisken commented 1 year ago

Download of all FITS files complete! It only took 4 hours 45 mins! Checking the SHA256 sums now; an initial check of the first 48 files indicates a complete match.

Now we need to go through the relevant columns and pick out what we need.

https://www.legacysurvey.org/dr10/files/#sweep-catalogs-south-sweep

And of course we should probably convert from nanomaggies to magnitudes on ingest (into new columns).

m = 22.5 − 2.5 log10(flux)

(See https://www.legacysurvey.org/dr10/description/#photometry)
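
The conversion itself is a one-liner; a sketch of how the new magnitude columns could be filled on ingest (with non-positive fluxes mapped to NaN):

```python
# m = 22.5 - 2.5 * log10(flux), with flux in nanomaggies.
# Non-positive fluxes (non-detections) are mapped to NaN.
import numpy as np

def nanomaggies_to_mag(flux):
    flux = np.asarray(flux, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mag = 22.5 - 2.5 * np.log10(flux)
    return np.where(flux > 0, mag, np.nan)

print(nanomaggies_to_mag([1.0, 10.0, 0.0]))   # [22.5, 20.0, nan]
```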

smarttgit commented 1 year ago

Excellent @genghisken! I will have a look at the columns, and yes, we should definitely convert from nanomaggies to AB magnitudes.

genghisken commented 1 year ago

All but one file passes the SHA256 hash test. I've no idea why that one failed, but I can always re-download it. Attempting to open it as a FITS table confirms that the file is indeed corrupt. (Other random files open OK.)

The filename is sweep-335m065-340m060.fits. Recording it here to remind me to re-download it.

smarttgit commented 1 year ago

There is a lot of information in those columns that may be useful in future. I could cut the columns down by, say, a factor of ~2 (maybe 3) while retaining everything useful. The question is whether there is any need to do that, since it is already quite a thin table. Would it be a problem to ingest it all, or should we really cut by a factor of 2-3?

As an example, the predicted fibre flux (e.g. FIBERFLUX_G) is the "Predicted 𝑔-band flux within a fiber of diameter 1.5 arcsec from this object in 1 arcsec Gaussian seeing". We don't strictly need that, but it could be very useful for TiDES when we select host galaxies of LSST transients on which to place a fibre. So my view is: if we don't need to trim the columns, let's not. But if we do, let me know and I can draw up a shortlist.

RoyWilliams commented 1 year ago

For Lasair, the Sherlock database is already 5 TB, so adding the Legacy Survey makes that 6.5 TB. Cutting out half of the new catalogue would bring Sherlock to 5.75 TB.

It's good to have the option of running 2 or 3 copies of Sherlock to maximise throughput. Can we still do that?

thespacedoctor commented 1 year ago

Typically for Sherlock I have only removed columns if I am certain they are not useful (now or in the future). We now have breathing space on db6, so we are not limited in QUB. If space becomes an issue for Lasair we can always ship a lite version of the database with only core columns included.

genghisken commented 1 year ago

Yes - we should definitely ingest the full database at QUB (db6), as previously with other surveys - space is not a problem. In fact we need to add a few columns (e.g. nanomaggy-to-magnitude conversions, HTM index columns, and possibly a unique ID column combining RELEASE, BRICKID and OBJID). If we ingest all 2.8 billion rows, this is not going to be less than 1 TB: the downloaded FITS files totalled about 1.3 TB, and in our experience the database is unlikely to be much smaller. We will also need indexes in the database, which adds storage overhead. We can experiment with thin table versions.
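
For the unique ID, a sketch of one possible packing of RELEASE, BRICKID and OBJID into a single 64-bit integer; the bit widths below are my own guess rather than anything prescribed by the DR10 documentation, so they would need checking against the actual value ranges:

```python
# One possible packing of (RELEASE, BRICKID, OBJID) into a 64-bit key.
# Bit widths are assumptions, not taken from the DR10 docs; verify they
# comfortably cover the maximum BRICKID and OBJID values before using.
import numpy as np

OBJID_BITS = 22
BRICKID_BITS = 24

def unique_id(release, brickid, objid):
    release = np.asarray(release, dtype=np.int64)
    brickid = np.asarray(brickid, dtype=np.int64)
    objid = np.asarray(objid, dtype=np.int64)
    return (release << (BRICKID_BITS + OBJID_BITS)) | (brickid << OBJID_BITS) | objid

print(unique_id(10000, 123456, 789))   # a single reproducible integer key
```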