prob-ml / bliss

Bayesian Light Source Separator
MIT License

generate multiple sources per tile #823

Closed by jeff-regier 1 year ago

jeff-regier commented 1 year ago

In our synthetic training data, each tile currently has at most one light source. In real data, however, the number of light sources per tile is better modeled as following a Poisson distribution. To make our simulated data more realistic, let's (1) set simulator.prior.max_sources to 2, (2) decrease simulator.prior.star_flux_min and simulator.prior.galaxy_flux_min so that undetectable light sources are occasionally generated, and (3) use the new catalog filtering routines to create a target_catalog for training that has at most one light source per tile and contains only detectable light sources (i.e., those with flux greater than some threshold).
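A minimal sketch of the proposal above, for illustration only. It uses the Python standard library rather than the BLISS code. The function names (sample_num_sources, sample_tile, filter_tile), the detection_threshold parameter, the plain Pareto flux prior, and the rule of keeping the brightest detectable source in each tile are all assumptions, not the actual BLISS API or the behavior of its catalog filtering routines.

```python
import math
import random


def sample_num_sources(rng, mean_sources, max_sources):
    """Draw a Poisson(mean_sources) count by CDF inversion, clipped at max_sources."""
    u = rng.random()
    k = 0
    p = math.exp(-mean_sources)  # P(N = 0)
    cdf = p
    while u > cdf and k < max_sources:
        k += 1
        p *= mean_sources / k  # P(N = k) from P(N = k - 1)
        cdf += p
    return k


def sample_tile(rng, mean_sources, max_sources, flux_min, alpha):
    """Return the fluxes of the sources generated in one tile (possibly empty)."""
    n = sample_num_sources(rng, mean_sources, max_sources)
    # paretovariate(alpha) has minimum 1, so scaling by flux_min sets the lower bound.
    return [flux_min * rng.paretovariate(alpha) for _ in range(n)]


def filter_tile(fluxes, detection_threshold):
    """Keep only detectable sources, then at most one (the brightest) per tile."""
    detectable = [f for f in fluxes if f > detection_threshold]
    return [max(detectable)] if detectable else []


rng = random.Random(0)
# All numeric values below are hypothetical, chosen only to exercise the code.
full_catalog = [sample_tile(rng, 0.2, 2, 100.0, 0.5) for _ in range(1000)]
target_catalog = [filter_tile(tile, 300.0) for tile in full_catalog]
```

Here full_catalog plays the role of what the simulator renders into the image, while target_catalog is what the network would be trained to predict: undetectable sources still contribute light to the image but are absent from the training target.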

aakashdp6548 commented 1 year ago

Should we also increase simulator.prior.star/galaxy_alpha to generate a greater proportion of dim sources?

jeff-regier commented 1 year ago

Maybe? I wouldn't expect a major change. Just decreasing simulator.prior.star_flux_min and simulator.prior.galaxy_flux_min already means we generate more dim sources. alpha is more about the shape of the distribution; see: https://upload.wikimedia.org/wikipedia/commons/thumb/1/11/Probability_density_function_of_Pareto_distribution.svg/1024px-Probability_density_function_of_Pareto_distribution.svg.png
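To make the roles of the two parameters concrete, here is a small sketch that assumes fluxes follow a plain (untruncated) Pareto distribution with scale flux_min and shape alpha. The helper name fraction_below and the threshold argument are hypothetical, and the actual BLISS prior may differ (for example, by truncating the distribution).

```python
def fraction_below(threshold, flux_min, alpha):
    """P(flux < threshold) for a Pareto distribution with scale flux_min and shape alpha."""
    if threshold <= flux_min:
        return 0.0  # every sample is at least flux_min
    # Pareto survival function: P(flux > t) = (flux_min / t) ** alpha for t >= flux_min
    return 1.0 - (flux_min / threshold) ** alpha
```

Under this assumption, lowering flux_min moves the whole distribution toward dimmer fluxes, while alpha controls how quickly the tail decays above flux_min.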

aakashdp6548 commented 1 year ago

Okay, sounds good. I'm planning to look into what we can do about #825 this afternoon, so I'll kick off generation this evening or over the weekend. That way we won't have that PSF boundary issue in the new images.

aakashdp6548 commented 1 year ago

@jeff-regier Should we be using mean_sources=0.2 or 0.02? I still had it set to 0.02 since that's what I've been using for the PSF experiments, but with that we end up getting very few images that actually have multiple sources in any tile, much less in multiple tiles.
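As a rough check on the two candidate values, assuming mean_sources is the Poisson rate per tile (the helper name prob_at_least_two is hypothetical):

```python
import math


def prob_at_least_two(mean_sources):
    """P(N >= 2) for N ~ Poisson(mean_sources)."""
    # P(N = 0) + P(N = 1) = exp(-rate) * (1 + rate)
    return 1.0 - math.exp(-mean_sources) * (1.0 + mean_sources)


for rate in (0.02, 0.2):
    print(f"mean_sources={rate}: P(>=2 sources in a tile) = {prob_at_least_two(rate):.6f}")
```

Under that assumption, about 0.02% of tiles get two or more sources at a rate of 0.02, versus about 1.75% at a rate of 0.2.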

jeff-regier commented 1 year ago

0.2 seems good