willettk / gzhubble

Reduction and analysis materials for the Galaxy Zoo: Hubble project.
http://hubble.galaxyzoo.org
MIT License

Is GZH consistent with GZ:CANDELS? #54

Open willettk opened 8 years ago

willettk commented 8 years ago

Question for both @vrooje and @willettk to attack, for those galaxies that have classifications in both. It'd be very useful to show that we're at least (hopefully) internally consistent with the GZ vote fractions.

vrooje commented 8 years ago

Agreed, or in fact to show that there may be interesting cases where the classification differs at different wavelengths. What do you need from me?

On Fri, Apr 29, 2016 at 7:03 AM, Kyle Willett wrote: [original comment, quoted above]

willettk commented 8 years ago

Expertise and input, I suppose - I have the data tables from your paper already.

What do you think of this Simmons-style plot? This is for the 7,681 galaxies in both GZH and GZC. Most interested to hear what you think about p_features (shown on the top row, measured in three ways: raw vote fraction, weighted vote fraction, and GZH debiased vs GZC weighted). Three other metrics similar to Fig. 7 in your paper are on the bottom row.

[Figure: gzh_gzc]

vrooje commented 8 years ago

I don't understand the blue lines in the figures. If they're the averages binned by GZH why don't they span the whole y-axis range in each figure, and why don't they go through the main locus of points in the top row especially?

Really curious to know what redshifts these are as well -- if they're in the regime where GZH is sampling mostly UV and GZC is sampling entirely optical then it wouldn't surprise me if GZH picked up more "features" than GZC, and clearly the debiasing is accentuating that. If we binned these by redshift I might expect to see a pretty steep trend, or at least I wouldn't be surprised if there was a change right around where the GZH wavelengths start to be dominated by UV (which depends on the survey field, obviously). That might also be true of the spiral vote fractions... I often had the sense looking at the GZC images that the spirals appeared weaker for galaxies with z < 1.5 or so because the SF areas in the spiral arms just weren't observed at CANDELS wavelengths.
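One way to test the redshift hypothesis is to bin the matched sample by redshift and look at the median GZH minus GZC vote-fraction offset in each bin; a trend with z would show up as a drift in that offset. A minimal sketch, assuming matched NumPy-compatible arrays of vote fractions and redshifts; the function name and the bin edges in the usage line are hypothetical, and z = 1.5 is only the rough value mentioned above, since where GZH starts sampling rest-frame UV depends on each field's filters.

```python
import numpy as np


def offset_vs_redshift(p_gzh, p_gzc, z, z_edges):
    """Median (GZH - GZC) vote-fraction offset in each redshift bin.

    Hypothetical helper. Inputs are matched 1-D arrays, one entry per galaxy
    classified in both projects. Returns one value per bin; empty bins give NaN.
    """
    p_gzh, p_gzc, z = (np.asarray(a, dtype=float) for a in (p_gzh, p_gzc, z))
    diff = p_gzh - p_gzc
    # np.digitize returns i for z_edges[i-1] <= z < z_edges[i];
    # galaxies outside [z_edges[0], z_edges[-1]) are ignored below.
    idx = np.digitize(z, z_edges)
    out = np.full(len(z_edges) - 1, np.nan)
    for i in range(1, len(z_edges)):
        in_bin = idx == i
        if in_bin.any():
            out[i - 1] = np.median(diff[in_bin])
    return out
```

For example, `offset_vs_redshift(p_gzh, p_gzc, z, [0.0, 0.5, 1.0, 1.5, 2.0, 3.0])` would give one median offset per redshift bin; a median is used so a few extreme galaxies don't dominate sparse bins.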

In the one metric shown that depends least on wavelength and surface brightness (edge-on), they are really well matched.

In the merger plot it's really hard to see anything... is it worth showing? These trees differ a lot in how they deal with mergers, and I think that is very much a point worth making... I'm just not sure this is the way to make it.

One other thing I'd love to see in a plot like this is the clumpy vote fraction. Really curious to see what that looks like.

willettk commented 8 years ago

It's very possible I haven't computed the bin averages correctly (or at least not in the same way that you did). Maybe you could help me figure out how; I didn't find your code on the repo, so I couldn't replicate it exactly. What I tried was (in pseudo-code):

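import numpy as np
import matplotlib.pyplot as plt

# gzh, gzc: matched vote-fraction arrays (one entry per galaxy in both samples)
bins = np.linspace(0, 1, 10)
db = bins[1] - bins[0]
nbins = len(bins)
gzc_avg, gzh_avg = np.zeros(nbins - 1), np.zeros(nbins - 1)

for i, b in enumerate(bins[:-1]):
    gzc_avg[i] = np.mean(gzc[(gzh >= b) & (gzh < b + db)])  # mean GZC in each GZH bin
    gzh_avg[i] = np.mean(gzh[(gzc >= b) & (gzc < b + db)])  # mean GZH in each GZC bin

centers = bins[:-1] + db / 2
plt.plot(centers, gzc_avg)
plt.plot(centers, gzh_avg)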

Full code: https://github.com/willettk/gzhubble/blob/candels/python/gzh_gzc.py

vrooje commented 8 years ago

Shouldn't it be:

...
plot(gzc_avg, bins)
plot(bins, gzh_avg)

Or maybe the other one is flipped... sorry, just reading quickly, but that does look like what I did apart from that.
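The orientation rule at issue: with GZH on the x-axis, the curve binned by GZH is plotted as (bin center, mean GZC), while the curve binned by GZC has to be plotted as (mean GZH, bin center), i.e. with its axes swapped. A hedged sketch with hypothetical helper names, not the code in the linked script; it also includes p = 1.0 in the top bin, which half-open bins like `b <= p < b + db` would drop.

```python
import numpy as np


def binned_means(x, y, edges):
    """Mean of y in each bin of x (NaN for empty bins). Hypothetical helper."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    means = np.full(len(edges) - 1, np.nan)
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Close the last bin on the right so p = 1.0 is not dropped.
        last = i == len(edges) - 2
        sel = (x >= lo) & ((x <= hi) if last else (x < hi))
        if sel.any():
            means[i] = y[sel].mean()
    return means


def both_curves(gzh, gzc, edges):
    """(x, y) pairs for both running means on axes with GZH on x and GZC on y."""
    centers = 0.5 * (np.asarray(edges[:-1]) + np.asarray(edges[1:]))
    gzc_given_gzh = binned_means(gzh, gzc, edges)
    gzh_given_gzc = binned_means(gzc, gzh, edges)
    # Binned by GZH: bin centers go on the x-axis.
    curve1 = (centers, gzc_given_gzh)
    # Binned by GZC: bin centers go on the y-axis, so x and y are swapped.
    curve2 = (gzh_given_gzc, centers)
    return curve1, curve2
```

Usage would be something like `c1, c2 = both_curves(gzh, gzc, np.linspace(0, 1, 10))` followed by `plt.plot(*c1)` and `plt.plot(*c2)`; if the two classifications agreed perfectly, both curves would lie on the 1:1 line.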

willettk commented 8 years ago

Exactly so, thanks. Here's the correct version (now actually tracing the joint distribution):

[Figure: gzh_gzc]

Interesting that there's no real change in the average GZH p_features vote fraction except in the highest p_features bin.

vrooje commented 8 years ago

Small numbers? On the first GZC question there were so many "star or artifact" votes per galaxy (distracted by noise, I expect) that it's really rare to have p_feat-gzc-raw > 0.8.

Suggest coarser bins for the merger question?
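A quick way to check the small-numbers explanation, and to see whether coarser bins help, is to count galaxies per bin at a couple of bin widths. A minimal sketch; the function name and the `min_count` threshold are illustrative assumptions, not values used in the thread.

```python
import numpy as np


def sparse_bins(p, n_bins, min_count=20):
    """Counts per bin over [0, 1] and a mask of bins with fewer than min_count galaxies.

    Hypothetical helper; min_count = 20 is an arbitrary illustrative threshold.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(p, bins=edges)  # np.histogram's last bin includes p = 1.0
    return counts, counts < min_count
```

Comparing, say, `sparse_bins(p_merger, 9)` with `sparse_bins(p_merger, 4)` would show how many of the finer bins are thinly populated and whether merging them fixes that.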

vrooje commented 8 years ago

Also, I think our error bars are different -- the error bars in the GZC paper's Figures 7 & 8 enclose the middle 68% of data in each bin, which I'd expect to be a bit more asymmetric in a couple of the panels above...
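Error bars that enclose the middle 68% of each bin correspond to the 16th and 84th percentiles, which need not be symmetric about the median. A hedged sketch with a hypothetical function name; it returns the offsets in the (2, N) layout that matplotlib's `errorbar` accepts for asymmetric `yerr`.

```python
import numpy as np


def binned_median_68(x, y, edges):
    """Median of y in each bin of x, plus asymmetric errors to the 16th/84th percentiles.

    Hypothetical helper. The 16th-84th percentile range encloses the middle 68%
    of the points in a bin. Empty bins give NaN.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(edges) - 1
    med, lo, hi = (np.full(n, np.nan) for _ in range(3))
    # Digitizing against the interior edges gives indices 0..n-1; values
    # outside [edges[0], edges[-1]] fall into the first or last bin.
    idx = np.digitize(x, edges[1:-1])
    for i in range(n):
        vals = y[idx == i]
        if vals.size:
            med[i] = np.median(vals)
            lo[i], hi[i] = np.percentile(vals, [16, 84])
    # Shape (2, n): distance below and above the median for each bin.
    yerr = np.vstack([med - lo, hi - med])
    return med, yerr
```

With bin centers `c`, plotting would then be `plt.errorbar(c, med, yerr=yerr)`; a skewed bin, such as one with many low values and a few high ones, shows a longer upper bar than lower bar.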

willettk commented 8 years ago

Coarser bins in p_merger, and error bars enclosing the middle 68% of each bin:

[Figure: gzh_gzc]

vrooje commented 8 years ago

👍

So, reading left to right and top to bottom for panels a-f:

a) Raw classifications agree pretty well, but (i) features are more likely to be picked up in rest-frame bluer colors and/or higher-resolution images, and (ii) the GZC raw votes have very few f_feat > 0.8 galaxies.
b) Even once the classifications are weighted, the effect in (ii) above may go away, but resolution and rest-frame colors still mean you find a higher f_features in GZH than in GZC.
c) GZC didn't debias and GZH did, and that does make a slight difference, though not a significant one on average (?).
d) Okay, I suppose we could use fewer bins across the board in panels b-d, but these agree well.
e) SF features like spirals pop in rest-frame optical and UV images.
f) Merger features are often star-forming too, so they might be more obvious in GZH; but these questions are also asked very differently, which makes them much harder to compare.

Does all that sound about right? Note I haven't read the text around this figure yet :)

willettk commented 8 years ago

It's actually not in the paper yet, although I think adding it is one of @chrislintott's main suggestions.

chrislintott commented 8 years ago

Indeed. By the way 'I didn't find your code on the repo' is beautifully subtle snark. Shame, @vrooje, shame.

willettk commented 8 years ago

(@willettk frantically pushes #72 into master so that he practices what he preaches)

vrooje commented 8 years ago

That's because my code was in a different repo!

(@vrooje frantically moves it to the paper repo and pushes it)

bamford commented 8 years ago

This looks like a really nice comparison, and a nice demonstration of the 'morphological k-correction' issue (although, as @vrooje says, disentangling this would be a bit tricky, as each survey uses different filter sets).

:+1: for putting this in the paper.

How are you selecting galaxies for inclusion in the lower panels? Relaxing the selection to include more galaxies might improve their appearance.

willettk commented 8 years ago

For the panels on the bottom row:

The easiest way to include more galaxies is by relaxing the restriction on the number of votes, but I do want to make sure that they're reasonably sampled. I could try again with N>=10, but I don't know about going much lower than that.
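To see how much sample a looser vote-count cut buys, one could tabulate the number of surviving galaxies for a few thresholds. A minimal sketch, assuming per-galaxy arrays of vote counts on the relevant question from each project and assuming the cut is applied to both; the function names are hypothetical and the thresholds are only examples.

```python
import numpy as np


def well_sampled(n_votes_gzh, n_votes_gzc, n_min=10):
    """Mask of galaxies with at least n_min votes in both projects (hypothetical helper)."""
    a = np.asarray(n_votes_gzh)
    b = np.asarray(n_votes_gzc)
    return (a >= n_min) & (b >= n_min)


def sample_size_by_threshold(n_votes_gzh, n_votes_gzc, thresholds):
    """Number of galaxies kept for each candidate minimum-vote threshold."""
    return {n: int(well_sampled(n_votes_gzh, n_votes_gzc, n).sum()) for n in thresholds}
```

Something like `sample_size_by_threshold(nv_gzh, nv_gzc, [5, 10, 20])` would show the trade-off between sample size and how well each vote fraction is sampled before choosing N.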

I'm glad there's interest in putting this figure in the paper, but we'll need to flesh out the end of Section 7.2 a little (or give it its own section). It'll need some explanation of why the vote fractions differ, and potentially some comments on how to properly compare data from the two sets.

karenlmasters commented 8 years ago

I agree with other comments that this would be a good addition. It's reassuring they agree so well given the different wavelength ranges they cover.

Any chance to look at bars? Not essential for this draft; I'm just curious to see how it looks. Or was that already in @vrooje's paper?

vrooje commented 8 years ago

Nope, I didn't look solely at bars; I think that would be really interesting.

willettk commented 8 years ago

Agreed on the interesting part; I will probably vote to push that to a later, bar-focused paper just in the interests of time constraints.

chrislintott commented 8 years ago

Bars in this sample would make a very interesting follow-up...

willettk commented 8 years ago

Just a note: I know this didn't make it into the initial submission, but if the referee asks for any more detail on GZH vs. GZC, I'd be in favor (sorry @chrislintott, "in favour") of adding the figure above and another paragraph or so into the manuscript.

vrooje commented 8 years ago

The referee didn't ask about this, right?

Mel23 commented 8 years ago

Correct.

vrooje commented 8 years ago

Do we still want to add it? Seems like it's opening us up to another round of refereeing, but it is potentially useful information.

karenlmasters commented 8 years ago

Second paper? I think including it if not asked to by the referee is dangerous, and this could be a nice short paper on morphological k-correction on its own....

willettk commented 8 years ago

I think I'll vote for pausing it for now, since I think we really need to focus on addressing what the referee did ask about. But I very much don't want it to be lost, given the interesting results and effort that @vrooje and others have put in. +1 if anyone wants to make it a short paper of their own (or maybe a useful graphic for when we give talks).

On Fri, Aug 12, 2016 at 4:33 AM, karenlmasters wrote: [comment quoted above]