This page mentions a 'basic s/g separation parameter' to be found in the StackObjectAttributes table:
https://outerspace.stsci.edu/display/PANSTARRS/PS1+Source+extraction+and+catalogs
The columns this comment relates to are the XExtNSigma columns (X = grizy). From the definition:
An extendedness measure for the g filter stack detection based on the deviation between PSF and Kron (1980) magnitudes, normalized by the PSF magnitude uncertainty.
It seems this metric is to be used with NM's first recipe here. I'm not sure how the normalisation by the PSF mag error works, but I think we can assume anything < 0 is a star and anything > 0 is a galaxy.
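To make the guess above concrete, here is a minimal sketch of an ExtNSigma-style metric built from the quoted definition: the PSF minus Kron magnitude, in units of the PSF magnitude uncertainty. The exact PS1 formula (sign convention, any further normalisation) is an assumption, as are the function names. An extended source puts extra flux in the Kron aperture, so its Kron magnitude is brighter (numerically smaller) and the metric comes out positive.

```python
# Hypothetical reconstruction of an XExtNSigma-style metric. The exact PS1
# definition (sign convention, extra normalisation) is an assumption here.


def ext_nsigma(psf_mag: float, kron_mag: float, psf_mag_err: float) -> float:
    """(PSF mag - Kron mag) expressed in units of the PSF magnitude uncertainty."""
    if psf_mag_err <= 0:
        raise ValueError("psf_mag_err must be positive")
    return (psf_mag - kron_mag) / psf_mag_err


def naive_sg_label(nsigma: float) -> str:
    """Sign-only cut guessed in the comment above: > 0 galaxy, < 0 star."""
    if nsigma > 0:
        return "galaxy"
    if nsigma < 0:
        return "star"
    return "ambiguous"


# Extended source: the Kron magnitude is brighter than the PSF magnitude.
print(naive_sg_label(ext_nsigma(20.3, 20.0, 0.05)))  # galaxy
```

For a true point source the PSF and Kron magnitudes agree up to noise, so a hard cut at exactly 0 will misclassify many stars; a real cut would likely sit some number of sigma above zero.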
It's unclear from the ZTF scheme whether this is what they're using. I think not (unless there's further normalisation going on):
sgscore1: Star/Galaxy score of closest source from PS1 catalog; if exists within 30 arcsec: 0 <= sgscore <= 1 where closer to 1 implies higher likelihood of being a star
request sent to A.T.
Unfortunately AT can't help us - he had no access to PS-DR1 catalogues. I suggest we either put a request in with Armin Rest or Edinburgh. Most likely Edinburgh will be quicker to respond.
Just reread the ZTF ATel:
we employ a machine-learning star-galaxy separator, based on PS1 data
Download of PanSTARRS DR1 started from MAST -- let's see how long it takes to get banned!
http://astronotes.co.uk/blog/2018/06/20/downloading-panstarrs-dr1-catalogue-data.html
2,345,500,000 rows downloaded so far. I'm going to start getting these rows into the sherlock database. Do we have enough space on psdb3, Ken?
The simple answer is "no"! However, this is very timely. Robert has set up a new pair of machines (db0 and db1) which have a 16TB RAID0 SSD (NVME) installed. I'm currently installing MySQL on these machines, and when done we should move the crossmatch_catalogues database to one of them. Note that RAID0 means that if any of the SSDs fail, we lose the database entirely. But if I get replication and backups running, this shouldn't be too large a risk.
The PS1 DR1 catalogue with Tachibana & Miller point-source scores are now loaded into the Sherlock database in one large table. We now need to make some decisions on how to build the catalogue into our search algorithm. I'm going to post some notes (please correct me if you think I've misunderstood any of the machine learning jargon) and then pose some questions.
Note: "unresolved" is taken to mean point-like sources (i.e. not a cloud or disc). Asteroids, QSOs, stars and distant/small galaxies will be unresolved. Resolved objects will include most nearby/larger galaxies.
Training Set
The model was trained using ~50,000 HST COSMOS morphology classifications, and its performance was tested against SDSS spectra and Gaia sources (with high-confidence stellar identifications).
Features
Photometry measurements for the model are those taken from the PS1 StackObjectThin and StackObjectAttributes tables; i.e. as measured from the PS1 stacked images and not individual single exposures. The shape measurements from the StackObjectAttributes table are used to identify unresolved sources in the PS1 DR1 catalogue.
The PS1 DR1 catalogues provided 3 measurements of flux in 5 filters:
To combat the issue of missing data, a set of 'white flux' features are added to the PS1 metrics that involve combining flux measurements across all filters in which a PS1 source is detected (often not in all 5 filters).
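A minimal sketch of how a 'white flux' can be formed, assuming an inverse-variance-weighted mean over only the filters in which the source is detected. The weighting scheme and the function name are assumptions for illustration; Tachibana & Miller's exact definition should be checked in the paper before relying on this.

```python
from typing import Mapping, Optional

FILTERS = ("g", "r", "i", "z", "y")


def white_flux(fluxes: Mapping[str, Optional[float]],
               errors: Mapping[str, Optional[float]]) -> Optional[float]:
    """Inverse-variance-weighted mean flux across the detected filters.

    A filter with a missing flux or error (None) counts as a non-detection
    and is skipped. Returns None if the source has no usable detection.
    """
    num = 0.0
    den = 0.0
    for f in FILTERS:
        flux = fluxes.get(f)
        err = errors.get(f)
        if flux is None or err is None or err <= 0:
            continue  # not detected (or unusable) in this filter
        weight = 1.0 / err ** 2
        num += weight * flux
        den += weight
    return num / den if den > 0 else None


# Detected in g and r only; r is noisier, so it gets a quarter of g's weight.
print(white_flux({"g": 10.0, "r": 20.0, "i": None}, {"g": 1.0, "r": 2.0}))  # 12.0
```

The point of the design is that every source with at least one detection gets a value, so the classifier never has to handle a missing feature.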
Very Basic Star-Galaxy Separation
The PS1 documentation states that for sources with $i < 21$ mag those sources with
can be considered as galaxies. The obvious difference between stellar and galaxy populations in the bright regime is clearly presented in Figure 3 of the paper:
Figure of Merit
The Figure of Merit (FoM) is defined in this model as the True Positive Rate (TPR) that corresponds to a False Positive Rate (FPR) of 0.005.
From Figure 4 in the paper we see that an FPR threshold of 0.005 gives a FoM (TPR) of ~0.71. So we will be able to flag and remove 71% of the stellar 'contaminants' at the expense of wrongly flagging 0.5% of galaxies as stars. We are, of course, free to choose a different FPR and define our own FoM to suit our needs.
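The FoM definition can be sketched as a threshold scan: treat stars as the positive class, lower the score threshold step by step, and keep the last threshold whose FPR (the fraction of galaxies classified as stars) stays at or below the limit. The function name and toy data are hypothetical; this only illustrates the definition and is not the paper's code.

```python
from typing import Optional, Sequence, Tuple


def tpr_at_fpr(scores: Sequence[float], is_star: Sequence[bool],
               max_fpr: float = 0.005) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) for the lowest threshold with FPR <= max_fpr.

    A source is classified as a star (positive) when score >= threshold.
    Returns None if even the highest threshold exceeds max_fpr.
    """
    n_pos = sum(is_star)
    n_neg = len(is_star) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both stars and galaxies")
    best = None
    # FPR can only grow as the threshold drops, so stop at the first breach.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_star) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, is_star) if s >= thr and not y)
        fpr = fp / n_neg
        if fpr > max_fpr:
            break
        best = (thr, tp / n_pos, fpr)
    return best


toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_labels = [True, True, False, True, False, False]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))  # (0.8, 0.6666666666666666, 0.0)
```

Changing `max_fpr` is exactly the freedom mentioned above: a looser FPR limit buys a higher TPR at the cost of losing more galaxies.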
Table 3 contains the information we need to make our decision on cuts:
Figure 7 reveals the accuracy of the model for individual stellar and extended sources. As can be seen, the accuracy remains decent even in the faintest regime. Here accuracy is a measure of the star/galaxy separation, assuming galaxies have ps1_psc < 0.5 and stellar sources have ps1_psc > 0.5.
Note that most of the ambiguous sources (0.2 < ps1_psc < 0.8) are found in the Galactic plane, where blended stars are hard to classify.
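A minimal sketch of the accuracy measure described for Figure 7 and of the ambiguous window, using the 0.5 split and the 0.2 to 0.8 range quoted above. The function names are hypothetical, and assigning a score of exactly 0.5 to the galaxy side is an arbitrary choice of this sketch.

```python
from typing import Sequence

AMBIGUOUS_RANGE = (0.2, 0.8)  # ambiguous window quoted above


def psc_is_star(ps1_psc: float, threshold: float = 0.5) -> bool:
    """Scores above the threshold count as stars; exactly 0.5 falls to galaxy here."""
    return ps1_psc > threshold


def is_ambiguous(ps1_psc: float) -> bool:
    lo, hi = AMBIGUOUS_RANGE
    return lo < ps1_psc < hi


def accuracy(scores: Sequence[float], is_star: Sequence[bool],
             threshold: float = 0.5) -> float:
    """Fraction of sources whose thresholded label matches the true label."""
    if len(scores) != len(is_star) or not scores:
        raise ValueError("need equal-length, non-empty inputs")
    correct = sum(1 for s, y in zip(scores, is_star)
                  if psc_is_star(s, threshold) == y)
    return correct / len(scores)


print(accuracy([0.9, 0.1, 0.6, 0.4], [True, False, False, False]))  # 0.75
```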
Non-Identified Sources
About half of the sources in the entire 2.9-billion-row PS1 DR1 table do not have a point-source score. This is due to the way Tachibana & Miller selected the sample to run their classifier against. Not much information about the selection is given in the paper, but I think the answer is in this query file:
https://github.com/adamamiller/PS1_star_galaxy/blob/master/PS1casjobs/PS1features.query
A simplified version of the query just showing the cuts could be written:
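-- Count the objects passing the cuts (the subquery returns one row per qualifying objid)
SELECT
COUNT(*)
FROM
(SELECT objid
FROM StackObjectView
WHERE primaryDetection = 1 AND nDetections > 2
GROUP BY objid
HAVING COUNT(objid) = 1) AS single_primary;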
Running this query on MAST gets me very close to the ~1.5 billion source count found in the Tachibana & Miller sample.
From my understanding, the vast majority of the sources that do not make the cut are sources detected in the stacked images that are not detected in the individual warp images.
Questions/Discussion
DRY answers
Views created on the PS1 catalogue for stars, galaxies and unclear sources:
-- PS1 Stars
CREATE VIEW `tcs_view_star_ps1_dr1` AS
SELECT
*
FROM
tcs_cat_ps1_dr1
WHERE
ps_score >= 0.83 AND ps_score IS NOT NULL;
-- PS1 Galaxies
CREATE VIEW `tcs_view_galaxy_ps1_dr1` AS
SELECT
*
FROM
tcs_cat_ps1_dr1
WHERE
ps_score < 0.83 AND ps_score IS NOT NULL;
-- PS1 Unclear
CREATE VIEW `tcs_view_unclear_ps1_dr1` AS
SELECT
*
FROM
tcs_cat_ps1_dr1
WHERE
ps_score IS NULL;
Great. I'll need to remember to apply these in Edinburgh. I'll add a GitHub action for myself in Lasair. How did you arrive at the 0.83 magic number? Is this the 0.5% FPR?
Yes. Table 3, first row under the 0.005 column. I think I remember this as one of the numbers Frank Masci suggested to use in the cuts when siphoning off the best transients from the ZTF stream for the brokers.
Great. It's very unlikely that a float will have a score of exactly 0.830000, but we should make one of those inequalities >= or <=.
Very good point. Updated the star view, see above ↑ (and in the database).
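With the `>=` fix, the three views partition every PS1 row exactly once. A minimal Python sketch of the same logic, handy for unit-testing the boundary; the function name is hypothetical, and treating NaN like SQL NULL is an extra defensive choice of this sketch, not something the views do.

```python
import math
from typing import Optional

PS_SCORE_STAR_CUT = 0.83  # Table 3, FPR = 0.005 column, as chosen in this thread


def ps1_view_for(ps_score: Optional[float]) -> str:
    """Return the name of the view a PS1 row with this ps_score falls into."""
    if ps_score is None or math.isnan(ps_score):
        return "tcs_view_unclear_ps1_dr1"  # SQL: ps_score IS NULL
    if ps_score >= PS_SCORE_STAR_CUT:
        return "tcs_view_star_ps1_dr1"  # SQL: ps_score >= 0.83
    return "tcs_view_galaxy_ps1_dr1"  # SQL: ps_score < 0.83


for score in (0.83, 0.8299, None):
    print(score, ps1_view_for(score))
```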
Also - I like the idea of tagging objects as UNCLEAR for the no-score PS1 objects. At least it removes the "ORPHAN" classification and indicates that there is something there - even if we can't do much with it. What's the disadvantage?
Read the Tachibana & Miller paper again. Answers to Dave's questions:
Yes, agree. 0.83 is probably optimal
No mag dependent variation, just a straight 0.83 cut
For the 50% that have no RF score, I guess these are mostly (or exclusively) at the faint end? Are they roughly > 21? Then I would use the offset to decide further. If the transient is offset, then the PS1 source is likely a galaxy and hence the transient is more likely an SN:
if Object_coords < 1.5" from a PSObject which has r >21 or i >21 then classify as UNCLEAR and report offset, magnitude of the PSObject (and name if possible)
elseif 1.5" < Object_coords < 3.0" from a PSObject with r > 21 or i > 21 then classify as SN and report offset, magnitude of the PSObject (and name if possible)
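The offset rule above can be sketched in Python as follows. The function names are hypothetical; a separation of exactly 1.5" (which the rule as written leaves uncovered) is assigned to SN here, and `None` stands for "rule does not apply, fall through to the rest of the search algorithm". Separations use the haversine form of the great-circle distance.

```python
import math
from typing import Optional


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def classify_unscored(sep_arcsec: float, r_mag: Optional[float],
                      i_mag: Optional[float]) -> Optional[str]:
    """Offset rule for PS1 matches that have no point-source score."""
    faint = ((r_mag is not None and r_mag > 21)
             or (i_mag is not None and i_mag > 21))
    if not faint:
        return None
    if sep_arcsec < 1.5:
        return "UNCLEAR"  # report offset and PS1 magnitude (and name if possible)
    if sep_arcsec < 3.0:
        return "SN"  # offset from a faint, probably extended host
    return None


sep = angular_sep_arcsec(187.632388, -17.2197069, 187.632388, -17.2197069 + 2.0 / 3600)
print(round(sep, 3), classify_unscored(sep, 22.1, None))
```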
I have the algorithm setup as above now. Running some test.
ATLAS20his, originally an ORPHAN, is now an SN! Congratulations ATLAS20his!!
Transient's Predicted Classification: SN
Suggested Associations:
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| association type | rank | rankScore | catalogue table name | catalogue object id | catalogue object type | catalogue object subtype | raDeg | decDeg | separationArcsec | physical separation kpc | direct distance | distance | z | photoZ | photoZErr | Mag | MagFilter | MagErr | classification reliability | merged rank |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| SN | 1 | 2005.00 | PS1 | 94280837680782324 | galaxy | multiple | 83.7681061 | -11.4316385 | 0.15 | | | | | | | 22.56 | r | 0.01 | association | |
| SN | | 2005.00 | PanSTARRS DR1 | 94280837680782324 | galaxy | | 83.7681061 | -11.4316385 | 0.15 | | | | | | | 22.56 | r | 0.01 | association | 1 |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
Same for ZTF20aasoaeu:
Transient's Predicted Classification: SN
Suggested Associations:
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| association type | rank | rankScore | catalogue table name | catalogue object id | catalogue object type | catalogue object subtype | raDeg | decDeg | separationArcsec | physical separation kpc | direct distance | distance | z | photoZ | photoZErr | Mag | MagFilter | MagErr | classification reliability | merged rank |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| SN | 1 | 2005.00 | PS1 | 87331876323496707 | galaxy | multiple | 187.632388 | -17.2197069 | 1.85 | | | | | | | 20.94 | r | 0.01 | association | |
| SN | | 2005.00 | PanSTARRS DR1 | 87331876323496707 | galaxy | | 187.632388 | -17.2197069 | 1.85 | | | | | | | 20.94 | r | 0.01 | association | 1 |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
ZTF20aasikyz moves from ORPHAN to UNCLEAR:
Transient's Predicted Classification: UNCLEAR
Suggested Associations:
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| association type | rank | rankScore | catalogue table name | catalogue object id | catalogue object type | catalogue object subtype | raDeg | decDeg | separationArcsec | physical separation kpc | direct distance | distance | z | photoZ | photoZErr | Mag | MagFilter | MagErr | classification reliability | merged rank |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
| UNCLEAR | 1 | 1010.55 | PS1 | 92372318529638110 | uncertain | multiple | 231.852898 | -13.0185416 | 0.55 | | | | | | | 22.46 | r | 0.01 | synonym | |
| UNCLEAR | | 1010.55 | PanSTARRS DR1 | 92372318529638110 | uncertain | | 231.852898 | -13.0185416 | 0.55 | | | | | | | 22.46 | r | 0.01 | synonym | 1 |
+-------------------+-------+------------+-----------------------+----------------------+------------------------+---------------------------+-------------+--------------+-------------------+--------------------------+------------------+-----------+-------+---------+------------+--------+------------+---------+-----------------------------+--------------+
I've performed a lot of tests and adjusted the search algorithm slightly here and there. I'm now happy with the results I'm getting. I'll send @genghisken the new algorithm.
go for it !
For ATLAS and Pan-STARRS I'll switch the algorithms late tonight (2020-03-09) or tomorrow morning.
The catalogues are copied to Lasair (live - lasair-node0) but not yet to Lasair-dev (lasair-dev-node0). Even on Lasair (live) I need to move the PS1 catalogue in place and run the views (Gaia DR2 is already in place). The catalogues are physically in Edinburgh so it should be easy to copy them from lasair-node0 to lasair-dev-node0. Writing all this down so I remember what I need to do! Also raised as a Lasair github issue. See https://github.com/lsst-uk/lasair/issues/64.
Goodbye ORPHANs!
Remember I had to add a few indexes to the Gaia table to speed up crossmatch results. You might have to re-export to ROE. Also still cleaning up ~1% of the PS1 data I'm struggling to assign HTMIds to. Hope to have that issue resolved tomorrow.
Finally, you will need to update the helper tables that do the column matching: tcs_helper_catalogue_views_info & tcs_helper_catalogue_tables_info.
I forgot about moving the tcs_helper_catalogue_tables_info and tcs_helper_catalogue_views_info tables from QUB to Lasair at ROE, as you state above. Done now!
Additionally, I somehow managed to misname tcs_view_unclear_ps1_dr1 as tcs_view_unknown_ps1_dr1. No idea how that happened. Renamed now. Sherlock is definitely working.
There's still an ongoing issue with Sherlock, so I've reopened this issue. We are getting the following error.
File "/data/anaconda/envs/sherlock/lib/python2.7/site-packages/sherlock/transient_classifier.py", line 1935, in generate_match_annotation
annotation = "The transient is %(classificationReliability)s with <em>%(objectId)s</em>; %(best_mag_filter)s%(best_mag)smag %(objectType)s found in the %(catalogueString)s. It's located %(location)s.%(absMag)s" % locals()
KeyError: 'classificationReliability'
The helper tables have been propagated, so it might just be a missing view or the way the view was created has changed (or not been propagated correctly).
Thanks Ken ...
Tracked the issue down to 622 rows inserted into the 'sherlock_crossmatches' table on 2020-01-06 at 05:17:25 where classificationReliability was set to null. This should never be the case. My guess is that the database table was left in a bad state after a sherlock process was killed. From email logs I know there was a table rebuild done at some stage on 6th Jan, but not at this early hour. Might be due to the same issue (issue number 2) as pointed out in issue #53 ... but also may not!
To fix the issue I removed the sherlock classifications related to these cross matched rows:
UPDATE objects
SET
sherlock_classification = NULL
WHERE
primaryId IN (SELECT
transient_object_id
FROM
sherlock_crossmatches
WHERE
classificationReliability IS NULL);
and then reran sherlock. Everything is now running fine and sherlock completes.
Here are my thoughts on what PanSTARRS data needs to be extracted from MAST:
http://astronotes.co.uk/blog/2018/06/06/a-query-to-export-panstarrs-dr1-data-for-use-with-sherlock.html
See the query and sample dataset at the end. If everyone's happy I'll send the query off to MAST at the end of the day.