msimet / Stile

Stile: the Systematics Tests In Lensing pipeline
BSD 3-Clause "New" or "Revised" License
9 stars, 6 forks

#7 HSC/LSST interface code #24

Closed msimet closed 10 years ago

msimet commented 10 years ago

This pull request contains code that interfaces Stile with the HSC/LSST pipeline.

Most of the action in this PR happens in a new module for Stile called "lsst". It is found in stile/lsst. (This is actually kind of misnamed at the moment--it relies on HSC functionality that isn't on the LSST side yet.) There are two files of interest in that directory:

You actually run this code via some scripts in the (new) bin/ directory. The available scripts are:

Four science choices that I wanted to highlight as needing particular input or ideas or verification from others:

This PR also includes some more systematics tests (shear around bright stars, star-star shape correlation function, galaxy-galaxy lensing type correlation function); they're all extensions of the existing CorrelationFunctionSysTest framework, so not much code there. I also added a rather unwieldy but flexible-to-the-user plotting function for the correlation function tests.
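For readers unfamiliar with the star-star shape correlation test, here is a minimal brute-force sketch of the quantity involved, xi_+(r), on a flat sky. It is only an illustration and not Stile's CorrelationFunctionSysTest (which hands the work to corr2); the function name xi_plus and its arguments are hypothetical. It uses the fact that g_t,i g_t,j + g_x,i g_x,j equals g1_i g1_j + g2_i g2_j, so no rotation into the tangential frame is needed for xi_+.

```python
import numpy as np

def xi_plus(x, y, g1, g2, w, bin_edges):
    """Brute-force flat-sky xi_+(r): weighted mean of g1_i g1_j + g2_i g2_j per separation bin.

    x, y: positions (same units as bin_edges); g1, g2: shape components; w: per-object weights.
    Hypothetical helper for illustration; empty bins return NaN.
    """
    x, y, g1, g2, w = (np.asarray(a, dtype=float) for a in (x, y, g1, g2, w))
    # Each unordered pair (i < j) is used exactly once.
    i, j = np.triu_indices(len(x), k=1)
    r = np.hypot(x[i] - x[j], y[i] - y[j])
    # g_t g_t + g_x g_x is rotation invariant, so the Cartesian components suffice.
    prod = g1[i] * g1[j] + g2[i] * g2[j]
    ww = w[i] * w[j]
    # Bin index k means bin_edges[k] <= r < bin_edges[k + 1]; drop pairs outside the range.
    which = np.digitize(r, bin_edges) - 1
    nbins = len(bin_edges) - 1
    ok = (which >= 0) & (which < nbins)
    num = np.bincount(which[ok], weights=(ww * prod)[ok], minlength=nbins)
    den = np.bincount(which[ok], weights=ww[ok], minlength=nbins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den
```

Because it builds every pair explicitly, this is only practical for a small number of stars (say, one CCD); for real catalogs corr2 is the tool to use.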

Still to be done:

I want to especially thank @TallJimbo and @HironaoMiyatake who put a lot of work and time into this PR as well, which may not be represented in the git history for these files.

TallJimbo commented 10 years ago

I do plan to look at this at some point but likely won't have a chance until later this week (Thursday or Friday). I'm traveling and my laptop power cable has died, so I'm quite literally working on borrowed time until a replacement arrives.

msimet commented 10 years ago

Yikes, Jim! Hope it lasts as long as you need it. In any case, I'm happy to take comments whenever you have them.

rmandelb commented 10 years ago

It is found in stile/lsst.

I recommend renaming to stile/hsc since it's a misleading name. If that functionality goes to the LSST side, we can always rename again.

So it's probably better to have a hard cutoff. What should that cutoff be?

Are you suggesting a hard cutoff in S/N, in maximum number of stars, or ...?

Is there an obvious subset of galaxies we should be using for this?

That's tricky. Most of the obvious samples I can think of are low density enough that at the CCD level, you might have an average of <1 per CCD.

At the moment, when we need galaxy weights for correlation function-type tests, we just return 1 for all objects.

We should weight based on the shape measurement error and shape noise added in quadrature. This is not too far from uniform weighting so it's not like what you're doing right now is horribly wrong. But probably better to update this.

The flags that we use to exclude data (the flags in the list in 'removeFlaggedObjects') were chosen to be conservative right now--we're only looking at things we're fairly sure are well-measured. We will probably want to relax this in future (and make it configurable without editing the source code too).

Open another issue for this? And for the rest of the still-to-be-done list?
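One way to make the flag cut configurable without editing the source is to pass the flag list in as a parameter (for example, from a config object). The sketch below assumes the catalog is a NumPy structured array with one boolean column per flag; the function name remove_flagged and the flag names are placeholders, not the actual contents of removeFlaggedObjects.

```python
import numpy as np

# Placeholder flag names; NOT the list actually used in removeFlaggedObjects.
DEFAULT_FLAGS = ("flag_a", "flag_b")

def remove_flagged(catalog, flags=DEFAULT_FLAGS):
    """Return only the rows of a structured array where none of the named flags is set."""
    keep = np.ones(len(catalog), dtype=bool)
    for name in flags:
        # Drop any row where this flag is True.
        keep &= ~catalog[name].astype(bool)
    return catalog[keep]
```

A more conservative or more relaxed cut is then just a different `flags` argument rather than a code change.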

As Hironao noted in issue #22, we don't let the user select tests right now--the list is hard-coded.

This is no longer true, right?

in the meantime we can't run any tests that use two-point correlation functions, because corr2 requires randoms for those.

Do you mean two-point correlations that involve a galaxy density? You are doing shape two-point correlation functions, I thought.

Unit tests: would be good to get some very small amount of default data that could be used for this.

msimet commented 10 years ago

I recommend renaming to stile/hsc since it's a misleading name. If that functionality goes to the LSST side, we can always rename again.

Done.

So it's probably better to have a hard cutoff. What should that cutoff be?

Are you suggesting a hard cutoff in S/N, in maximum number of stars, or ...?

I'm open to suggestion, basically. (I think S/N would make more sense, probably, or raw flux, rather than number of objects.)

That's tricky. Most of the obvious samples I can think of are low density enough that at the CCD level, you might have an average of <1 per CCD.

We could always just try a random selection, or semi-random above a flux cutoff--I think the important thing is to have some real objects for the test, but using all the galaxies could take too long.

We should weight based on the shape measurement error and shape noise added in quadrature. This is not too far from uniform weighting so it's not like what you're doing right now is horribly wrong. But probably better to update this.

Right, will do later today probably. (These shapes are distortions, yes?)

Open another issue for this? And for the rest of the still-to-be-done list?

Okay, later today as well.

As Hironao noted in issue #22, we don't let the user select tests right now--the list is hard-coded.

This is no longer true, right?

Right, yes: for those not watching the other issue, you can change the tests now--there are instructions if you pass --help to the scripts.

in the meantime we can't run any tests that use two-point correlation functions, because corr2 requires randoms for those.

Do you mean two-point correlations that involve a galaxy density? You are doing shape two-point correlation functions, I thought.

Sorry, yes, two-point as in position-position. I think position-shear will work without randoms but can't be sure till I fix issue #23.
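Why position-position correlations need randoms: raw data pair counts mix real clustering with the survey geometry (CCD edges, masked regions), and a random catalog covering the same footprint is what divides that out. Below is a minimal brute-force sketch of one standard estimator, Landy-Szalay, xi = (DD - 2DR + RR) / RR, with each pair count normalized by its total number of pairs. The function names are hypothetical and this is not a description of corr2's internals.

```python
import numpy as np

def pair_counts(x1, y1, x2, y2, bin_edges, same=False):
    """Histogram of pair separations between two point sets (flat sky, brute force)."""
    dx = np.subtract.outer(np.asarray(x1, float), np.asarray(x2, float))
    dy = np.subtract.outer(np.asarray(y1, float), np.asarray(y2, float))
    r = np.hypot(dx, dy)
    if same:
        # Auto-pairs: keep each unordered pair once and drop self-pairs.
        r = r[np.triu_indices(len(x1), k=1)]
    counts, _ = np.histogram(r.ravel(), bins=bin_edges)
    return counts.astype(float)

def landy_szalay(xd, yd, xr, yr, bin_edges):
    """Landy-Szalay estimate of the position-position correlation function."""
    nd, nr = len(xd), len(xr)
    dd = pair_counts(xd, yd, xd, yd, bin_edges, same=True) / (nd * (nd - 1) / 2)
    rr = pair_counts(xr, yr, xr, yr, bin_edges, same=True) / (nr * (nr - 1) / 2)
    dr = pair_counts(xd, yd, xr, yr, bin_edges) / (nd * nr)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (dd - 2 * dr + rr) / rr
```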

Unit tests: would be good to get some very small amount of default data that could be used for this.

I'm curious what the LSST folks are doing already to test their code...

HironaoMiyatake commented 10 years ago

We should weight based on the shape measurement error and shape noise added in quadrature. This is not too far from uniform weighting so it's not like what you're doing right now is horribly wrong. But probably better to update this.

Right, will do later today probably. (These shapes are distortions, yes?)

My crude weight used for my tests so far is wt = 1./(0.36**2 + eerr**2), where I think 0.36 is derived from SDSS in one of Reina's papers (@rmandelb could clarify). eerr is the error in distortion, for one component.

Unit tests: would be good to get some very small amount of default data that could be used for this.

The problem is that the HSC data is not public. Is it time to set up a private repository now? Or we could use some other location that is password-protected.

rmandelb commented 10 years ago

So it's probably better to have a hard cutoff. What should that cutoff be? Are you suggesting a hard cutoff in S/N, in maximum number of stars, or ...? I'm open to suggestion, basically. (I think S/N would make more sense, probably, or raw flux, rather than number of objects.)

I don't think raw flux makes sense. S/N is probably best.

That's tricky. Most of the obvious samples I can think of are low density enough that at the CCD level, you might have an average of <1 per CCD. We could always just try a random selection, or semi-random above a flux cutoff--I think the important thing is to have some real objects for the test, but using all the galaxies could take too long.

Okay, perhaps a random subsample for those above a S/N cutoff.
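A sketch of the selection proposed here, assuming per-object S/N values are available as an array; `snr_min` and `n_max` are placeholders, since no values were settled on in this thread, and the function name is hypothetical.

```python
import numpy as np

def random_subsample_above_snr(snr, snr_min, n_max, seed=None):
    """Indices of at most n_max objects drawn uniformly at random from those with snr >= snr_min."""
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(np.asarray(snr) >= snr_min)
    if len(candidates) <= n_max:
        # Few enough objects pass the cut: keep them all.
        return candidates
    # Sample without replacement, returned in catalog order.
    return np.sort(rng.choice(candidates, size=n_max, replace=False))
```

Passing a fixed `seed` keeps the subsample reproducible between runs, which helps when comparing test outputs.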

We should weight based on the shape measurement error and shape noise added in quadrature. This is not too far from uniform weighting so it's not like what you're doing right now is horribly wrong. But probably better to update this. Right, will do later today probably. (These shapes are distortions, yes?)

Yes, distortions.

rmandelb commented 10 years ago

Regarding a small amount of data: does it have to be HSC data? If the output formats are the same for some public S-Cam data, we could run the pipeline on some S-Cam data (like one CCD) and put that output in the repo for testing.

HironaoMiyatake commented 10 years ago

I think SC data is okay. The ACT cluster data is public, so we can use it. However, I have to leave soon for a short trip this Friday and weekend, so it might be better to make a new issue for it, depending on the time scale of this PR.

rmandelb commented 10 years ago

I think it’s okay to open a new issue and do it next week or whatever. It’s fine as long as we have a plan!

HironaoMiyatake commented 10 years ago

Okay, I'll open an issue!

TallJimbo commented 10 years ago

Review complete; all my comments are in the diff comments.

msimet commented 10 years ago

My crude weight used for my tests so far is wt = 1./(0.36**2 + eerr**2), where I think 0.36 is derived from SDSS in one of Reina's papers (@rmandelb could clarify). eerr is the error in distortion, for one component.

Yes, 0.36 was what they measured in Reina's paper (0.365 actually I believe). However, that was for the sample after some cuts--I've put it in for now (with max(e1_err,e2_err) as eerr) but we might have to think more carefully about it in the future.
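The weighting described above, as a short sketch: shape noise and measurement error added in quadrature, with eerr = max(e1_err, e2_err) per object. The default of 0.36 is the value discussed in this thread, which, as noted, was measured after cuts and may need revisiting; the function name is hypothetical.

```python
import numpy as np

# Per-component shape noise in distortion, as discussed in this thread (may need revisiting).
SIGMA_E = 0.36

def shape_weights(e1_err, e2_err, sigma_e=SIGMA_E):
    """Inverse-variance weights: 1 / (sigma_e**2 + eerr**2), with eerr = max(e1_err, e2_err)."""
    eerr = np.maximum(np.asarray(e1_err, float), np.asarray(e2_err, float))
    return 1.0 / (sigma_e**2 + eerr**2)
```

Since sigma_e dominates for well-measured objects, these weights stay close to uniform, which matches the point above that uniform weighting was not badly wrong.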

msimet commented 10 years ago

Noting for those not looking at the diff that what I called StileField.py and StileFieldNoTract.py are now StileVisit.py and StileVisitNoTract.py thanks to some points @TallJimbo made above about terminology.

msimet commented 10 years ago

Does anyone else have any comments on this PR? Does anyone want to look over the code that's been added due to comments? If nobody speaks up, I will merge tomorrow evening.

msimet commented 10 years ago

Pursuant to some discussion on branch #18, I just changed the call signature for the SysTestAdapters here to include the config objects from the Task.

msimet commented 10 years ago

I just realized we needed slight fixes to the tests and example scripts here, so that's done. New plan: merge tomorrow.