Adding WL systematics metric

hsnee commented 5 years ago

This is not yet ready to be merged but I am starting this PR while I finish developing it and in the meantime hoping to get feedback.

overview The metric is designed to calculate the number of average i-band, WFD visits for a set of 50k objects

First it runs the ExgalM5 metric with a dust map to get a sample of objects, randomly-drawn from a uniform (equal-area in healpix) distribution within the WL-usable area (low-extinction, above a certain depth, etc)

Then the run method simply assembles a list of all the telescope dither positions (centers of FOV)

Finally the reduce method is supposed to calculate the number of times each object is observed, and then return the average.

issues The metric is running much, much slower than it runs when executing the code directly (i.e. not calling the metric from within MAF). Essentially, the result of the metric should be a single number (the average number of visits), but it seems that the reduce method is being ran for each single visit. Is this the case? If so, is there a better way for me to do this in MAF (something other than reduce methods?), where something is only ran once in the end and returns a single number for the whole metric?

Thank you!

hsnee commented 5 years ago

@yoachim or @rhiannonlynne maybe you might be able to advise on this? Thanks!

yoachim commented 5 years ago

I guess I don't see why you need the reduce method at all? Why not just move that reduce method code up into the run method? Unless you have plans for other reduce methods, that should be fine.

rhiannonlynne commented 5 years ago

The reduce method should not be running for each visit, but it would be running for each healpix (or slicePoint, if you're running on a UserPointsSlicer). There are some caching methods in MAF that preserve the metric values for each slicepoint and just reuse these when you have the same input values, but I don't think these will kick in for a reduce method.

If you only have a single number to calculate, and no other use for these gathered dither positions, I would suggest just popping everything into the 'run' method, as Peter suggested.

rhiannonlynne commented 5 years ago

Ooh - I just had a peek at your metric. I think you may have misunderstood where some of the 'building blocks' of MAF go .. https://github.com/hsnee/sims_maf/blob/master/python/lsst/sims/maf/metrics/weakLensingSystematicsMetric.py it seems that you are adding the Stacker directly into your metric? This is not quite how we use stackers. You are also querying the dB within your metric .. again, not how this is expected to work.

So, basically a metric plugs into our framework, such that your metric should expect to be handed all the observations at a single point in the sky (what goes into that point is defined by the slicer - so healpix slicer == you get all of the observations that intersect a given healpix point). This is what is fed into the "run" method via the dataSlice.

No wonder it looks like it's running many times :) When you put this together into a metricBundle and then run it via a metricBundleGroup, it will be.

Try removing everything in your metric 'run' method above and including " bundle = mBundle['temp'] " .. the values that you get in the 'slicePoint' is equivalent to what you're using as bundle.slicePoint.getSlicePoints() .... except only for that one slicePoint (so it's a simpler dictionary).

Instantiate and just use the run method on the ExGal metric within your metric, passing it the dataSlice and slicePoints -- use the values it returns as the 'bundle.metricValue' at this point.

Then calculate what you're doing in your reduce function.

hsnee commented 5 years ago

Thanks @rhiannonlynne and @yoachim, so even though I am running another metric from within (Exgal), I do not need to specify all the other blocks in the metrics bundle (and is it possible to have the other metric run with e.g. a different sql constraint?)

For the reduce method, I am basically trying to have something run at the end that is not specific to a particular slice, but rather has access to information from all the slices (that information is what I have been constructing in the current run method, and the things in the current reduce method uses that information across all slices) -- what would be the best way to do this? As I understand, having only a run method will just run on each slice, and not give me the ability to have something run afterwards?

rhiannonlynne commented 5 years ago

No, you don't need to specify all the other blocks. What you can do regarding adding a different sql constraint is to have the most general constraint at the 'outer' level (i.e. if you want to pass all filters to the exGal metric, just specify all filters as your sql constraint when running your metric) and then subselect the relevant pieces of the numpy recarray (dataSlice) when you want to use a subset of them. As an example of running a metric inside of another metric, you can have a look at the ExGal metric itself - https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/exgalM5.py - it runs "Coaddm5Metric" to calculate its final returned value.

Reduce methods don't work on all the slices, they work on the single slice. The idea of the 'reduce' method is to take more complicated information calculated on a slice and then pull out individual aspects of that information (PER SLICE). So, for example, if you had a metric that simulated the lightcurve for a SN given the observations at a particular slicePoint, a set of multiple reduce functions on the metric could calculate the peak magnitude of the SN, the length of time the SN was observed, etc.

If you have something you want to calculate using ALL of the information from the slices, what you actually need is something like a summary metric.

However, what I THINK you are trying to do here is this:

get the i band visits over the whole sky identify the region with E(B-V) < 0.2 pick a random set of N stars located across that entire region (i.e. if you pick 1000 stars, that will be all the stars over the entire WFD)
calculate the number of visits (after dithering) that each of those stars has then calculate the average number of visits for all stars

To do this in particular, I think you will actually need a script - because "calculate a metric at a set of random RA/Dec points" really matches to a UserPointsSlicer, but then you also want to automatically calculate the location of those RA/Dec points based on the E(B-V) values. It's ok to write a short script, we can run it independently. So I would write a script that does this:

Run the ExGal metric with a HealpixSlicer + sqlconstraint = "filter == 'i'" to get the coadded m5 value at each point in i band.
After running the metric, look at the slicer and look for map values with E(B-V) < 0.2, create a mask based on these.
Then look at the metric values (after applying the mask) and either (a) just pick all remaining points with coadded depth > 26 (as you do now) OR (b) pick the remaining healpixels corresponding to 18,000 sq deg with the deepest depths (we can show examples of doing this -- it would be similar to https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/areaSummaryMetrics.py)
So now you would have healpixels that cover an area with the deepest depth and e(b-v) < 0.2.
Then pick your random N points with centers randomly offset from the healpixels, and then run the CountMetric with the UserPointsSlicer (and either the same sqlconstraint or a different one), to calculate the number of visits at each of those points.
You can then apply a summary metric to those results, to calculate the average number of visits for all N points.

That said -- does it really matter if your star locations are offset from the healpix points? If not, I would do this differently and just rewrite your metric like so:

add the dustmap to the required maps for your metric (this means your slicer will also know about the E(B-V) values)
calculate ExGal values (in i band) within your metric, the same way that the ExGal metric calculates the Coaddm5Metric, for the single dataSlice only.
Take the results from the ExGal metric and your slicePoints e(b-v) values to decide if this particular healpix point should be masked or not [i.e. if e(b-v) < 0.2 and exgal coadded m5 is > 26]
Count the number of visits at this point (in i band? or all bands? whatever your main sqlconstraint is)
So, then this is calculated for all points of the healpix slicer.
THEN add a (new) summary metric that takes the points which are not masked, picks a random subsample of N points, and calculates the mean value.

I would suspect that it does not matter if you offset the stars RA/Dec values slightly from the healpix points or not, but I don't know the details.

hsnee commented 5 years ago

Thanks a lot @rhiannonlynne for the help and sorry for the delay in updating this. I've confirmed that using the healpix points as star locations (instead of randomly offset points) yields consistent results.

I've updated the metric to do points 1-5 from your last comment, could you please see if it's doing the right thing? I will then add the summary metric to take the average. Thanks again!

For reference, I'm using this code to run the metric:

proposalId = 3
runName = '/global/cscratch1/sd/husni/OpsimRuns/pontus_2489'
sqlWhere = ' filter = "i" and proposalId = '+str(proposalId)

outDir = 'temp'
myBundles = {}
nside = 128
resultsDb = db.ResultsDb(outDir=outDir)

opsdb = db.OpsimDatabase(runName+'.db')
slicer = slicers.HealpixSlicer(
    lonCol='fieldRA', latCol='fieldDec', nside=nside, useCache=False
    )
dustMap = maps.DustMap(interp=False, nside=nside)
metric = NumberOfVisitsMetric(runName=runName, depthlim=24, maps=['DustMap'] )
myBundles['metric bundle'] = metricBundles.MetricBundle(
        metric,
        slicer,
        constraint=sqlWhere,
        stackerList=[stackers.RandomDitherFieldPerVisitStacker(degrees=True)],
        runName=runName,
        metadata='running metric')
bgroup = metricBundles.MetricBundleGroup(myBundles, opsdb,
                                         outDir=outDir,
                                         resultsDb=resultsDb)
bgroup.runAll()

also tagging @rmandelb; if you would like to follow this issue, Rachel.

lsst-sims / legacy_sims_maf

Adding WL systematics metric #168