GSoC2024: DL Starter Problem

xoubish commented 8 months ago

GSOC2024 ML/DL Starter Problem

This is a simple toy problem meant as a pre-application exercise for the GSOC2024 project Astronomical data enhancement with DL.

Estimated completion time: a few hours

Please complete all required tasks and whichever optional task(s) allows you to convey your thought process most easily. The goal is to see what your starting points are, not what you could do with several days of research and optimization.

Overview

Here we have a simplified version of the actual project, with no time information. We have measurements of galaxies at different wavelengths (i.e., in broadband filters) in five different fields, and the task is to bring them all onto the same wavelength footing. A simple notebook that reads the galaxy data in the initial filters is in this repository (gsoc-ML-exercise/ReadCandels.md).
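As a minimal illustration of what "the same wavelength footing" can mean for a single galaxy, here is a sketch that linearly interpolates its fluxes from the observed filter central wavelengths onto a common wavelength grid. Interpolating in log-wavelength/log-flux space, the helper name `to_common_grid`, and any target wavelengths you pass in are assumptions for illustration, not part of the exercise specification.

```python
import numpy as np

def to_common_grid(centerwave, fluxes, target_waves):
    """Interpolate one galaxy's fluxes onto target_waves (hypothetical helper).

    centerwave: dict mapping flux column name -> central wavelength (Angstrom)
    fluxes: dict mapping the same column names -> measured flux (microJansky)
    target_waves: 1-D sequence of requested wavelengths (Angstrom)
    Returns NaN wherever a target lies outside the observed wavelength range.
    """
    # Keep only positive fluxes (non-positive values are treated as missing),
    # sorted by wavelength because np.interp needs increasing x values.
    pairs = sorted((centerwave[k], fluxes[k]) for k in centerwave
                   if k in fluxes and fluxes[k] > 0)
    if len(pairs) < 2:
        return np.full(len(target_waves), np.nan)
    wave, flux = np.array(pairs, dtype=float).T
    # Assumption: interpolate log10(flux) against log10(wavelength);
    # no extrapolation beyond the bluest/reddest observed filter.
    logf = np.interp(np.log10(target_waves), np.log10(wave), np.log10(flux),
                     left=np.nan, right=np.nan)
    return 10 ** logf
```

For example, `to_common_grid({'A': 4000, 'B': 8000}, {'A': 1.0, 'B': 4.0}, [4000, 8000, 10000])` returns approximately `[1.0, 4.0, nan]`: the last target is redder than any observed filter, so it is left as NaN rather than extrapolated.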

Instructions

Clone this repo and check out the branch gsco-ML-exercise.
Write code.
- Required: A simple way to combine all five fields in optical and NIR filters and output one file in the requested wavelengths.
- Optional: Use ML or DL to do this combination.
- Optional: Add plots to show what you did makes sense.
- Optional: Use prior information in ML by grouping galaxies at similar redshifts, stellar masses, etc., which are columns in the initial catalogs.
Required: Write text. 300 words max, included as a '.md' file.
- Explain what you did and how you approached the problem.
- For any code that you did not write but would if you had more time, write down what you would do.
Required: Open a PR with your code and writeup to merge to the main branch. Have GSoC2024 in the title of your PR.
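For the required task, one simple baseline is to interpolate every galaxy in every field onto the requested wavelengths and write all rows to a single table. The sketch below assumes each field has been loaded as a mapping from flux column names to arrays plus a dict of central wavelengths (like `centerwave_gs` in the notebook); the helper names, the CSV output, and the `FLUX_<wavelength>` column naming are hypothetical choices, not the requested format.

```python
import csv
import io
import numpy as np

def interp_row(wave, flux, target_waves):
    # Linear interpolation in log-log space, skipping non-positive (missing) fluxes.
    good = flux > 0
    if good.sum() < 2:
        return np.full(len(target_waves), np.nan)
    return 10 ** np.interp(np.log10(target_waves), np.log10(wave[good]),
                           np.log10(flux[good]), left=np.nan, right=np.nan)

def combine_fields(fields, target_waves, out):
    """Write one CSV row per galaxy with fluxes at target_waves (hypothetical helper).

    fields: {field_name: (centerwave, columns)}, where centerwave maps each flux
        column name -> central wavelength (Angstrom) and columns maps the same
        names -> 1-D flux arrays (one entry per galaxy).
    out: a writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(['field', 'row'] + [f'FLUX_{int(w)}' for w in target_waves])
    for name, (centerwave, columns) in fields.items():
        names = sorted(centerwave, key=centerwave.get)  # filters in wavelength order
        wave = np.array([centerwave[n] for n in names], dtype=float)
        # Flux matrix of shape (n_galaxies, n_filters) for this field.
        flux = np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
        for i, row in enumerate(flux):
            values = interp_row(wave, row, target_waves)
            writer.writerow([name, i] + [float(v) for v in values])
```

Usage with toy data: `buf = io.StringIO(); combine_fields({'fieldA': ({'F1': 4000, 'F2': 8000}, {'F1': np.array([1.0, -99.0]), 'F2': np.array([4.0, 2.0])})}, [4000, 8000], buf)` writes a header plus one row per galaxy; the second galaxy has only one valid flux, so its row is NaN. Passing an open file instead of `io.StringIO` gives the single output file.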

If you have a question, please ask in a comment on this issue.

Shashankss1205 commented 8 months ago

Hello @xoubish, I was unable to understand what you meant by writing: "A simple way to combine all five fields in optical and NIR filters and output one file in the requested wavelengths."

Are we required to find flux values corresponding to each wavelength or something else?

Thank you! Shashank Shekhar Singh

xoubish commented 8 months ago

@Shashankss1205 Sorry if it was not clear enough. If you open the notebook, you will see how I read in five different datasets, which correspond to five different parts of the sky. These fields do not necessarily have observations in exactly the same filters (i.e., the same wavelengths). The toy problem is to bring them all onto the same footing, which is also described in the notebook. Hope this helps.

Shashankss1205 commented 8 months ago

Okay, now I have a clear idea of the problem statement. Thank you @xoubish for your response. I will send my implementation as a PR in an hour.

rawann31 commented 8 months ago

Dear @xoubish,

I am a little confused about which data files I should work on.

Thanks so much

xoubish commented 8 months ago

@rawann-elframawy if you open the .md notebook file, you will see the data and how it is read.

rawann31 commented 7 months ago

Dear @xoubish,

I want to make sure: is the zbest column in the FITS files the redshift?

Thanks so much

xoubish commented 7 months ago

Yes it is.
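Since zbest is the redshift, the optional task of grouping galaxies at similar redshifts could start from simple bins. A minimal sketch, assuming zbest has been read into an array; the helper name and the bin edges are hypothetical choices.

```python
import numpy as np

def group_by_redshift(zbest, edges):
    """Return {bin_index: array of row indices} for galaxies grouped by zbest.

    edges: increasing bin edges (hypothetical choice, e.g. [0, 0.5, 1, 2, 4]).
    Galaxies outside [edges[0], edges[-1]) or with non-finite zbest are dropped.
    """
    z = np.asarray(zbest, dtype=float)
    # np.digitize returns i such that edges[i-1] <= z < edges[i];
    # 0 means below edges[0] and len(edges) means at or above edges[-1].
    bins = np.digitize(z, edges)
    groups = {}
    for b in range(1, len(edges)):
        idx = np.flatnonzero((bins == b) & np.isfinite(z))
        if idx.size:
            groups[b - 1] = idx
    return groups
```

Within each group, the interpolation (or a learned model) could then be applied separately, so that galaxies are only compared with others at similar redshift. The same pattern works for stellar mass or any other catalog column.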

rawann31 commented 7 months ago

I have another question; please correct me if I am wrong. For each of the five fields, the wavelengths range from roughly 0 to 35000 Å. For each wavelength, most values of the physical parameter (the flux) are given. In the plot, you pick a random id (my understanding is that the id represents a random wavelength) and get the flux at that random id. But the x axis represents the central wavelength of each flux column, which is different from the id. I want to make sure: what does this random id represent?

import numpy as np
import matplotlib.pyplot as plt

# gs: the GOODS-S catalog, read in as in the ReadCandels.md notebook
# Central wavelength (Angstrom) of each flux column in GOODS-S
centerwave_gs = {'VIMOS_U_FLUX': 3734, 'ACS_F435W_FLUX': 4317, 'ACS_F606W_FLUX': 5918,
                 'ACS_F775W_FLUX': 7617, 'ACS_F814W_FLUX': 8047, 'ACS_F850LP_FLUX': 9055,
                 'WFC3_F098M_FLUX': 10215, 'WFC3_F125W_FLUX': 12536, 'WFC3_F160W_FLUX': 15370,
                 'ISAAC_KS_FLUX': 21600, 'IRAC_CH1_FLUX': 36000}

randomid = np.random.randint(len(gs))  # random row index into the catalog
plt.figure(figsize=(6, 4))
plt.title('SED of a galaxy in the GOODS-S field')

x_centerwave_gs = []
y_gs = []
for w in centerwave_gs:
    if gs[w][randomid] > 0:  # Only plot positive flux values
        x_centerwave_gs.append(centerwave_gs[w])  # X axis: filter central wavelength
        y_gs.append(gs[w][randomid])  # Y axis: flux in that filter

plt.plot(x_centerwave_gs, y_gs, '*', markersize=10, color='red')
plt.yscale('log')
plt.xlabel('Wavelength (Angstrom)')
plt.ylabel('Flux (microJansky)')
plt.show()

I hope you understand my confusion.

xoubish commented 7 months ago

No worries. I pick random objects (i.e., galaxies) that have observations at different wavelengths. The idea of the question is: if they are observed at different wavelengths, how do we bring them all onto the same footing (the same wavelengths)? It can be approached very simply with linear interpolation, or more elaborately by finding objects that are similar and learning from them.
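The "learn from similar objects" idea could be prototyped, for example, with k-nearest neighbours. This is a swapped-in technique chosen for illustration, not necessarily what the project intends. For galaxies in a field that lacks a given filter, find the most similar galaxies in a field that has it and average their fluxes in that filter. The feature choice (log fluxes in shared filters plus zbest), the helper name, and `k` are assumptions.

```python
import numpy as np

def knn_predict_flux(ref_features, ref_target, query_features, k=5):
    """Predict a missing band for query galaxies from their k nearest reference galaxies.

    ref_features: (n_ref, d) array, e.g. log10 fluxes in filters shared by both
        fields plus zbest (the choice of features is an assumption).
    ref_target: (n_ref,) log10 flux in the band the query field lacks.
    query_features: (n_query, d) array built the same way.
    Returns the mean ref_target of the k nearest neighbours (Euclidean distance
    after standardizing each feature with the reference mean and std).
    """
    ref = np.asarray(ref_features, dtype=float)
    qry = np.asarray(query_features, dtype=float)
    # Standardize each feature so fluxes and redshift contribute comparably.
    mu, sigma = ref.mean(axis=0), ref.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    ref_s, qry_s = (ref - mu) / sigma, (qry - mu) / sigma
    # Pairwise distances, shape (n_query, n_ref).
    d = np.linalg.norm(qry_s[:, None, :] - ref_s[None, :, :], axis=2)
    k = min(k, len(ref))
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.asarray(ref_target, dtype=float)[nearest].mean(axis=1)
```

The brute-force distance matrix is fine for a toy problem. For full catalogs, a tree-based neighbour search or a trained regressor within redshift bins would scale better.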

rawann31 commented 7 months ago

Yes, I understand now. Thanks, @xoubish!

nasa-fornax / fornax-demo-notebooks