karenlmasters / gz-hubbleseq

Double check sample selection #5

Open sandorkruk opened 7 years ago

sandorkruk commented 7 years ago

@karenlmasters I have checked the sample selection. There are no major issues, just inconsistencies with the numbers.

  1. I get 1256 odd galaxies with the selection p_odd,yes>0.42, N_odd,yes>20 & (p_merger+p_disturbed+p_irregular)>0.6, not 1362. I used debiased likelihoods. The numbers of merger (73), disturbed (411) and irregular (848) galaxies add up to 1332, not 1362.
  2. In the left column of Table 2, the total number of galaxies is 22118, even though we say that we eliminate "odd" galaxies, so the total should be less than that. The right columns correctly reflect the total number of "normal" galaxies, i.e. without the "odd" ones.
  3. I'm confused by some of the subsamples (do we really need this many subsamples?). For example there is little difference between p>0.5 and the "majority" subsamples. Also I think there will be a small overlap between the smooth and the features subsamples selected with the W13 criteria: p_smooth>0.469 & p_features>0.430. What if a galaxy has p_smooth=0.47 and p_features=0.53? It will be in both samples. Would that be a problem? (probably not, as we're only interested in featured)
  4. In Section 3.1 we're not clear enough about which "featured" sample we use. Which criteria are we using to select this featured sample, W13?
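
For reference, the checks in items 1 and 3 can be sketched in a few lines of Python. This is only an illustration: the dictionary keys (`p_odd_yes`, `n_odd_yes`, `p_merger`, and so on) are hypothetical names, not the actual catalogue columns, and the thresholds are the ones quoted above.

```python
# Hypothetical field names; the real catalogue columns may differ.

def is_odd(gal, p_min=0.42, n_min=20, sum_min=0.6):
    """Return True if a galaxy passes the 'odd' cut quoted in item 1."""
    type_sum = gal["p_merger"] + gal["p_disturbed"] + gal["p_irregular"]
    return (
        gal["p_odd_yes"] > p_min
        and gal["n_odd_yes"] > n_min
        and type_sum > sum_min
    )


def in_both_w13(gal, smooth_min=0.469, features_min=0.430):
    """Return True if a galaxy passes both W13 cuts (smooth and features)."""
    return gal["p_smooth"] > smooth_min and gal["p_features"] > features_min


# Item 1: the three subclass counts quoted above.
subclass_counts = {"merger": 73, "disturbed": 411, "irregular": 848}
total = sum(subclass_counts.values())
print(total)  # 1332

# Item 3: the example galaxy falls into both W13 subsamples.
example = {"p_smooth": 0.47, "p_features": 0.53}
print(in_both_w13(example))  # True
```

Running `in_both_w13` over the full catalogue would give the actual size of the overlap, which would tell us whether it matters in practice.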

Overall, the plots shouldn't change much based on the different selection criteria. I think that we define too many subsamples (which in some cases are very similar) and then only use one of the subsamples for the science. I think that we should just use one way of selecting the sample of "featured" galaxies and use it for the whole paper.

We should also be careful about whether we're using the W13 or the Hart2016 debiasing, so that the numbers and the plots are consistent.

karenlmasters commented 7 years ago

Hi @kruksandor - thanks for this. Looks like it could be a > vs. >= difference, although honestly it's so long since I did it that I have no idea.
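
A quick way to test the > vs. >= idea would be to count the selection both ways and compare. The sketch below uses made-up toy values and hypothetical field names (`p_odd_yes`, `n_odd_yes`), and leaves out the merger/disturbed/irregular sum for brevity; it only shows how galaxies sitting exactly on a threshold change the count.

```python
import operator


def count_odd(galaxies, compare, p_min=0.42, n_min=20):
    """Count galaxies passing the p_odd and N_odd cuts with a given comparison."""
    return sum(
        1
        for g in galaxies
        if compare(g["p_odd_yes"], p_min) and compare(g["n_odd_yes"], n_min)
    )


# Toy data (not real catalogue values).
toy = [
    {"p_odd_yes": 0.42, "n_odd_yes": 25},  # exactly on the p threshold
    {"p_odd_yes": 0.60, "n_odd_yes": 20},  # exactly on the vote-count threshold
    {"p_odd_yes": 0.75, "n_odd_yes": 31},  # passes either way
    {"p_odd_yes": 0.10, "n_odd_yes": 40},  # fails either way
]

strict = count_odd(toy, operator.gt)     # uses >
inclusive = count_odd(toy, operator.ge)  # uses >=
print(strict, inclusive)  # 1 3
```

Since the vote count is an integer, galaxies with exactly 20 votes are the most likely place for the two versions to differ.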

I agree too many subsamples - we should simplify and stick to a simple story for a nice short paper. :)

I would be in favour of making use of the Hart2016 debiasing.

karenlmasters commented 7 years ago

OK now I'm really confused - there is no Table 2, and I can't find a table with 22118 in it (I see it in the text). When I say I removed odd galaxies, I meant these 8 "We remove eight of these galaxies which have more than 50% of their classification votes for “star or artefact”."

karenlmasters commented 7 years ago

OK never mind - you mean Table 1, and I just don't remember what I did in 2014 at all!

karenlmasters commented 7 years ago

So I looked back at my code, and I used the majority featured sample, with the mergers/odd etc removed. I then removed anything which wasn't "oblique" (face-on and nearly face-on) as described in Section 3.1.

If you have Ross's debiased sample to hand and would run these for the updated version that'd be great.

sandorkruk commented 7 years ago

I have Ross's debiased sample at hand and I could rerun the sample selection to update the table and numbers in the paper. Just to check, which subsample are we sticking to for the rest of the paper: debiased p_features > debiased p_smooth? I'm currently in Munich giving talks, so realistically I can get these done by Friday. Hope that's ok.
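
For the rerun, something along these lines would record the count after every cut, so the numbers in the table can be checked step by step. It is a rough sketch only: the field names (`p_features_debiased`, `p_smooth_debiased`, `is_odd`) are placeholders, the toy rows are invented, and the "majority featured" cut is assumed here to mean debiased p_features > debiased p_smooth.

```python
def apply_cuts(galaxies, cuts):
    """Apply named cuts in order; return the survivors and the count after each cut."""
    counts = [("all", len(galaxies))]
    kept = list(galaxies)
    for name, keep in cuts:
        kept = [g for g in kept if keep(g)]
        counts.append((name, len(kept)))
    return kept, counts


# Assumed definitions of the cuts; placeholder field names.
cuts = [
    ("majority featured",
     lambda g: g["p_features_debiased"] > g["p_smooth_debiased"]),
    ("not odd", lambda g: not g["is_odd"]),
]

# Toy rows (not real catalogue values).
toy = [
    {"p_features_debiased": 0.7, "p_smooth_debiased": 0.2, "is_odd": False},
    {"p_features_debiased": 0.6, "p_smooth_debiased": 0.3, "is_odd": True},
    {"p_features_debiased": 0.3, "p_smooth_debiased": 0.6, "is_odd": False},
]

sample, counts = apply_cuts(toy, cuts)
for name, n in counts:
    print(f"{name}: {n}")
```

The "oblique" cut from Section 3.1 could be appended to `cuts` as a third entry once its criteria are pinned down.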

karenlmasters commented 7 years ago

Great. I have travel Thur/Fri also (and Mon/Tue) so actually by Wednesday next week would be fine if that helps.

My vote is to use the majority sample so we can include all galaxies.