justincely / lightcurve_pipeline

pipeline for high level science products of HST 13902

BSD 3-Clause "New" or "Revised" License

Resolve targets #5

Open bourque opened 9 years ago

bourque commented 9 years ago

Output lightcurves are separated based on TARGNAME, but many of the TARGNAMEs differ even though they refer to the same target. For example,

SATURN-VISIT11-SLEW-ORBIT1
SATURN-VISIT11-SLEW-ORBIT2
SATURN-VISIT21-SLEW-ORBIT1
SATURN-VISIT21-SLEW-ORBIT2
SATURN-VISIT32-SLEW-ORBIT1
SATURN-VISIT32-SLEW-ORBIT2

presumably all refer to the same target. It would be useful to resolve the targets during ingest and place similar targets in the same directory in the filesystem (thus enabling more complete composite lightcurves).

justincely commented 9 years ago

Potentially easier cases to resolve are names that differ only in hyphenation or punctuation:

GD71 vs. GD-71
WD-0308-565 vs. WD0308-565

Some of these cases are for targets that have large amounts of exposure time, so keeping them together would be very beneficial.
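One way to find such pairs automatically is to group names by a key with all punctuation stripped. The following is a minimal sketch, not code from the pipeline; `punctuation_key` and `candidate_groups` are hypothetical names. Because stripping a hyphen can also strip a negative sign, the groups it returns are only candidates for manual review, not confirmed duplicates.

```python
import re
from collections import defaultdict


def punctuation_key(targname):
    """Hypothetical helper: uppercase the name and drop every character
    that is not a letter or digit, so 'GD-71' and 'GD71' share a key."""
    return re.sub(r"[^A-Z0-9]", "", targname.upper())


def candidate_groups(targnames):
    """Group TARGNAMEs that collapse to the same key.

    Only groups with more than one distinct spelling are returned,
    since those are the ones worth reviewing by hand.
    """
    groups = defaultdict(set)
    for name in targnames:
        groups[punctuation_key(name)].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Names taken from the examples in this thread.
names = ["GD71", "GD-71", "WD-0308-565", "WD0308-565", "SATURN"]
groups = candidate_groups(names)
for key, spellings in groups.items():
    print(key, spellings)
```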

justincely commented 9 years ago

Perhaps we can hack into the archive's target resolver?

bourque commented 9 years ago

It is probably best to resolve some of these targets by hand. One way we could do this is to build a dictionary whose keys are the TARGNAMEs that we want to treat as the true (canonical) names and whose values are lists of alternative TARGNAMEs for that key. For example:

targ_dict['SATURN'] = ['SATURN-VISIT11-SLEW-ORBIT1', 'SATURN-VISIT11-SLEW-ORBIT2', ...]

This dictionary can then be flipped, so that each alternative maps directly to its resolved name and a lookup costs no search through the lists:

targ_dict['SATURN-VISIT11-SLEW-ORBIT1'] = 'SATURN'
targ_dict['SATURN-VISIT11-SLEW-ORBIT2'] = 'SATURN'

Then, in the pipeline, before the TARGNAME is added to the database, it can be checked to see if it exists as a key in this dictionary, and if it does, the TARGNAME can be exchanged for the dictionary value.

This will help us build a TARGNAME vs EXPTIME plot, as mentioned in issue #9.
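The flip-and-lookup step described above can be sketched as follows. This is an illustration under assumptions, not the pipeline's actual code: `invert` and `resolve_targname` are hypothetical names, and the mapping holds only two of the alternatives listed in this issue.

```python
# Hand-built mapping: canonical TARGNAME -> list of alternative TARGNAMEs.
# Only two of the alternatives from this issue are shown.
canonical_to_alternatives = {
    "SATURN": [
        "SATURN-VISIT11-SLEW-ORBIT1",
        "SATURN-VISIT11-SLEW-ORBIT2",
    ],
}


def invert(mapping):
    """Flip {canonical: [alternatives]} into {alternative: canonical}."""
    flipped = {}
    for canonical, alternatives in mapping.items():
        for alternative in alternatives:
            flipped[alternative] = canonical
    return flipped


alternative_to_canonical = invert(canonical_to_alternatives)


def resolve_targname(targname, lookup=alternative_to_canonical):
    """Hypothetical ingest hook: return the canonical name if one is
    known, otherwise return the TARGNAME unchanged."""
    return lookup.get(targname, targname)


print(resolve_targname("SATURN-VISIT11-SLEW-ORBIT1"))
print(resolve_targname("GD71"))
```

Flipping the dictionary once up front means each ingest-time check is a single key lookup rather than a scan over every list of alternatives.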

bourque commented 9 years ago

@justincely found an online target resolving service that perhaps can help us resolve some of the targets. He made a wrapper function resolve() in the resolve.py module that takes a target name and returns a set of resolved target names.

I've made a notebook that plays around with the target resolver. It appears that only ~20% of the target names can be resolved.

justincely commented 9 years ago

@bourque 20% of the targets is fine - that actually makes a good deal of sense. HST time is very competitive, and duplications need to be well justified. So it's definitely going to be in the minority when a target is observed twice with different names.

bourque commented 9 years ago

I created a dictionary in utils.targname_dict that stores some manually resolved targets. This dictionary was made with two general rules:

Targets containing hyphens (true hyphens, not dashes that indicate a negative sign) were changed to the same name with the hyphens removed.
Targets that contained COPY, REPEAT, etc., or were numbered sequentially (e.g. JUPITER-NORTH1, JUPITER-NORTH2, etc.) were changed to just the nominal target name.

For targets that do not need to be resolved, the dictionary values are blank (e.g. ''); in this way, one can perform a diff between future database instances and the dictionary keys to see which targets need to be added.
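The blank-value convention and the diff against the database can be sketched like this. It is a rough illustration only: the dictionary contents are a made-up sample rather than the real contents of utils.targname_dict, and `apply_targname_dict` and `unreviewed_targnames` are hypothetical helper names.

```python
# Made-up sample in the shape described above: every reviewed TARGNAME is a
# key; the value is the resolved name, or '' if no change is needed.
sample_targname_dict = {
    "GD-71": "GD71",
    "GD71": "",
}


def apply_targname_dict(targname, targname_dict):
    """Return the resolved name; a blank value or a missing key
    both mean the TARGNAME is kept as-is."""
    return targname_dict.get(targname) or targname


def unreviewed_targnames(database_targnames, targname_dict):
    """Diff the database against the dictionary keys: anything in the
    database that is not yet a key still needs a manual decision."""
    return sorted(set(database_targnames) - set(targname_dict))


# Hypothetical list of TARGNAMEs pulled from a later database instance.
database_targnames = ["GD71", "GD-71", "WD0308-565"]

print(unreviewed_targnames(database_targnames, sample_targname_dict))
print(apply_targname_dict("GD-71", sample_targname_dict))
```

Storing already-reviewed names with a blank value is what makes the diff meaningful: a name missing from the keys has not been looked at, whereas a name with a blank value has been looked at and left alone.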

bourque commented 9 years ago

@justincely found the resolver that MAST uses: http://mastresolver.stsci.edu/Santa-war/. This could help us resolve targets even further.