Open 1313e opened 4 years ago
Hey @1313e thanks for your thoughts and insights.
I think there are two things that should be discussed here.
For number 2, I am still processing your comments and deciding which aspects I agree or disagree with. I will make a more in-depth reply or code modifications in a future update. It should be noted that the majority of the package was written in 2019 during my thesis write-up, with major rewrites in December last year and again around April this year, and I barely used the package in between. As a result, it's not surprising that there is a lot of disconnect and conflicting ideologies, as my coding style and priorities changed.
For number 1, I have just merged a very big update to the API format into master. In particular, there is now a simpler access flow. Your scenario, where you want to compute the stellar mass function, is covered in the documentation. Please let me know whether it works for what you want.
Fantastic @jacobseiler. And thanks to you @1313e for being the first "external" user to try out sage_analysis. The updates that are feasible and in-scope will make the package much more user-friendly.
@jacobseiler Separately, now that we can run sage from Python - what do you think about making the sage_analysis package a sub-package within a sage-model Python package?
Someone's got to be the first one, right...?
Alright, I can finally get back to this. Below are some comments while I am working through this part of the documentation.
- [...] GalaxyAnalysis);
- [...] GalaxyAnalysis, why is analyze_galaxies not executed automatically? Can the GalaxyAnalysis object be useful in any way before that method is executed?
- [...] Model.properties a dict containing the keys snapshot_xx with xx being a snapshot number? If it solely contains snapshots, it is much more logical if I can index this dict using the snapshot number.
- [...] analyze_galaxies is executed automatically upon initializing the class, the program knows what snapshots are available and thus making all these empty containers is not necessary.
- [...] plot_toggles argument of GalaxyAnalysis is misleading. If I toggle something off, then that should only toggle the plotting of that property, not the analysis. Currently, there is no way to toggle off the plotting of a property, but still read it in. This is made even more confusing by the presence of the galaxy_properties_to_analyze argument, which suggests that it toggles the properties to analyze.
- [...] plot_toggles argument a dict of bool? Just a list of str would be much easier.
- [...] galaxy_properties_to_analyze argument works, as that one definitely requires some explanations. I guess this kind of does the job, but no link to it is given.
- [...] None instead, as that holds no meaning at all (e.g., a zero for the SMF is perfectly possible, a None is not. The latter would inform the user immediately that the data is not available.)

@1313e Looking through your comments, I see that most of them pertain to software design and usability. Other than the number-density comment, is there anything that is preventing you from using the sage_analysis package to read in sage output files (for use with PRISM)?
No, there is not.
Does that mean you could successfully run sage with PRISM?
No, as the number density comment has not been dealt with, right?
@1313e Ahh - yes, of course!
@jacobseiler What's the recommended way to compute the co-moving number-density within sage_analysis?
To compute the number density of galaxies, divide the number of galaxies by the simulation volume times the bin widths.
In particular, given a specific model output,
bin_widths = model.bins["stellar_mass_bins"][1::] - model.bins["stellar_mass_bins"][0:-1]
normalization_factor = model._volume / pow(model.hubble_h, 3) * bin_widths  # Assumes Simulation was specified in Mpc/h
for snapshot in snapshots:
    norm_SMF = model.properties[f"snapshot_{snapshot}"]["SMF"] / normalization_factor
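As a self-contained illustration of the normalization described above, here is a minimal sketch in plain NumPy. The function name `number_density`, the box size, the counts and the value of `hubble_h` are all made up for the example and are not part of sage_analysis; it assumes the volume is given in (Mpc/h)^3 and the bin edges in log10 Msun.

```python
import numpy as np


def number_density(counts, bin_edges, volume, hubble_h):
    """Convert raw galaxy counts per bin into Mpc^-3 dex^-1.

    Assumes `volume` is in (Mpc/h)^3 and `bin_edges` are in log10(Msun).
    """
    counts = np.asarray(counts, dtype=float)
    bin_widths = np.diff(bin_edges)
    # Dividing the volume by h^3 converts (Mpc/h)^3 to Mpc^3.
    normalization = volume / hubble_h**3 * bin_widths
    return counts / normalization


# Made-up example: three bins of width 0.1 dex in a (100 Mpc/h)^3 box.
edges = np.array([8.0, 8.1, 8.2, 8.3])
phi = number_density([500, 400, 300], edges, volume=100.0**3, hubble_h=0.73)
```

The result `phi` has one entry per bin, so it can be compared directly against observational values tabulated at the same bin centres.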
Now that OzSTAR is usable again, I can keep going with this. However, most of the comments above are usability comments, and if they are taken into account, I will have to rewrite my code again as well.
Oh, btw, can I turn off the printing of all the information to screen somewhere? I don't think a user wants to see that a few 1000 times.
Is there any comparison SMF data that I can use to constrain SAGE with? Of course, the data must have matching logM bins as the output of SAGE.
SAGE compares against the Baldry 2008 paper.
Baldry = np.array([
[7.05, 1.3531e-01, 6.0741e-02],
[7.15, 1.3474e-01, 6.0109e-02],
[7.25, 2.0971e-01, 7.7965e-02],
[7.35, 1.7161e-01, 3.1841e-02],
[7.45, 2.1648e-01, 5.7832e-02],
[7.55, 2.1645e-01, 3.9988e-02],
[7.65, 2.0837e-01, 4.8713e-02],
[7.75, 2.0402e-01, 7.0061e-02],
[7.85, 1.5536e-01, 3.9182e-02],
[7.95, 1.5232e-01, 2.6824e-02],
[8.05, 1.5067e-01, 4.8824e-02],
[8.15, 1.3032e-01, 2.1892e-02],
[8.25, 1.2545e-01, 3.5526e-02],
[8.35, 9.8472e-02, 2.7181e-02],
[8.45, 8.7194e-02, 2.8345e-02],
[8.55, 7.0758e-02, 2.0808e-02],
[8.65, 5.8190e-02, 1.3359e-02],
[8.75, 5.6057e-02, 1.3512e-02],
[8.85, 5.1380e-02, 1.2815e-02],
[8.95, 4.4206e-02, 9.6866e-03],
[9.05, 4.1149e-02, 1.0169e-02],
[9.15, 3.4959e-02, 6.7898e-03],
[9.25, 3.3111e-02, 8.3704e-03],
[9.35, 3.0138e-02, 4.7741e-03],
[9.45, 2.6692e-02, 5.5029e-03],
[9.55, 2.4656e-02, 4.4359e-03],
[9.65, 2.2885e-02, 3.7915e-03],
[9.75, 2.1849e-02, 3.9812e-03],
[9.85, 2.0383e-02, 3.2930e-03],
[9.95, 1.9929e-02, 2.9370e-03],
[10.05, 1.8865e-02, 2.4624e-03],
[10.15, 1.8136e-02, 2.5208e-03],
[10.25, 1.7657e-02, 2.4217e-03],
[10.35, 1.6616e-02, 2.2784e-03],
[10.45, 1.6114e-02, 2.1783e-03],
[10.55, 1.4366e-02, 1.8819e-03],
[10.65, 1.2588e-02, 1.8249e-03],
[10.75, 1.1372e-02, 1.4436e-03],
[10.85, 9.1213e-03, 1.5816e-03],
[10.95, 6.1125e-03, 9.6735e-04],
[11.05, 4.3923e-03, 9.6254e-04],
[11.15, 2.5463e-03, 5.0038e-04],
[11.25, 1.4298e-03, 4.2816e-04],
[11.35, 6.4867e-04, 1.6439e-04],
[11.45, 2.8294e-04, 9.9799e-05],
[11.55, 1.0617e-04, 4.9085e-05],
[11.65, 3.2702e-05, 2.4546e-05],
[11.75, 1.2571e-05, 1.2571e-05],
[11.85, 8.4589e-06, 8.4589e-06],
[11.95, 7.4764e-06, 7.4764e-06],
], dtype=np.float32)
Baldry_xval = np.log10(10 ** Baldry[:, 0] / hubble_h / hubble_h)
if imf == "Chabrier":
    # Convert the Baldry data to Chabrier.
    Baldry_xval = Baldry_xval - 0.26
Baldry_yvalU = (Baldry[:, 1]+Baldry[:, 2]) * pow(hubble_h, 3)
Baldry_yvalL = (Baldry[:, 1]-Baldry[:, 2]) * pow(hubble_h, 3)
Baldry_xval is the log10 Msun value and yvalU/yvalL are the upper/lower bounds in units of Mpc^-3 dex^-1.
Those values are incompatible with SAGE, as SAGE does not calculate the SMF at these logM values.
Just to clarify ... The SMF needs to be calculated from the data that SAGE generates. This is true for all models (and astronomy data in general). The SMF is not a property of the galaxies.
Yeah, I know. But SAGE does not calculate the SMF values at these logM values, therefore making it incompatible for comparison with the given data.
I don't understand what you mean. Stellar mass is a galaxy property in SAGE with units 10^10 h^-1 Msun. That can be converted to the same logM as the Baldry data (i.e. multiply by 10^10, divide by little h, take the log10). Baldry is given that way because it's how the SMF is usually plotted. Does that answer your question?
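The conversion described above (multiply by 10^10, divide by little h, take the log10) can be sketched as follows. The helper name `sage_mass_to_log_msun` and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def sage_mass_to_log_msun(stellar_mass, hubble_h):
    """Convert masses in units of 10^10 Msun/h to log10(Msun)."""
    stellar_mass = np.asarray(stellar_mass, dtype=float)
    # Multiply by 10^10, divide by little h, then take the log10.
    return np.log10(stellar_mass * 1.0e10 / hubble_h)


# Made-up masses chosen so that the result is easy to check by hand.
log_m = sage_mass_to_log_msun([0.073, 7.3], hubble_h=0.73)
```

Galaxies with zero stellar mass would yield `-inf` here, so they would need to be masked out before taking the logarithm.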
I think the confusion may be that we're using two definitions of SAGE interchangeably:
- SAGE: the semi-analytic model that does the galaxy evolution
- SAGE-analysis: the code that reads the output and does the computation of the SMF.

Currently, SAGE-analysis is computing the SMF in log10 Msun bins from [8.0, 12.0] in widths of 0.1, whereas Baldry is from [7.05, 11.95] in widths of 0.1. I believe Ellert is saying that because these bins are different, it is not valid to compare the SMF produced by default SAGE-analysis and that of Baldry 2008.
Uhm, yes, that is what I am saying. I could not find anywhere where I can set what bins to calculate the SMF for.
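To make the binning question concrete, here is a minimal NumPy-only sketch of counting galaxies in bins whose centres line up with the 50 tabulated Baldry values (7.05 to 11.95 in steps of 0.1). It does not use sage_analysis at all, and the random masses are made up as a stand-in for model output.

```python
import numpy as np

# Bin centres as tabulated in the Baldry data: 7.05, 7.15, ..., 11.95.
centres = 7.05 + 0.1 * np.arange(50)

# Edges that bracket those centres: 7.0, 7.1, ..., 12.0.
edges = np.linspace(7.0, 12.0, 51)

# Made-up log10 stellar masses standing in for model output.
rng = np.random.default_rng(42)
log_mass = rng.uniform(7.0, 12.0, size=10_000)

# One count per bin; each bin is centred on a tabulated value.
counts, _ = np.histogram(log_mass, bins=edges)
```

If the observational masses are shifted before the comparison (as in the h and IMF conversion applied to Baldry_xval), the edges would need the same shift for the centres to keep matching.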
Try this. Change the bin_low, bin_high, bin_width values and point to the correct parameter file. It should compute only the stellar mass function in the galaxy_analysis.models[0].properties["SMF"] attribute.
(Yes I am very aware that this is hacky, ill-thought out, badly constructed etc etc)
from sage_analysis.galaxy_analysis import GalaxyAnalysis
from sage_analysis.default_analysis_arguments import default_galaxy_properties_to_analyze

par_fnames = ["/home/Desktop/sage-model/input/millennium.ini"]
properties_to_analyze = default_galaxy_properties_to_analyze.copy()
# No trailing comma after the closing brace, otherwise a tuple is assigned.
properties_to_analyze["stellar_mass_bins"] = {
    "type": "binned",
    "bin_low": 8.0,
    "bin_high": 12.0,
    "bin_width": 0.1,
    "property_names": ["SMF"],
}
galaxy_analysis = GalaxyAnalysis(par_fnames, properties_to_analyze=properties_to_analyze, plot_toggles={"SMF": True})
galaxy_analysis.analyze_galaxies()
properties_to_analyze is not a valid keyword argument of GalaxyAnalysis.
I think you mean galaxy_properties_to_analyze.
Which could be shortened to just analyze_props.
Is there a way to disable all printing output from both SAGE and sage_analysis?
@1313e Are you using the cffi branch of sage?
Yes
I have an easy but hacky way. After you pull in the latest commit on the cffi branch, add the following line
freopen("/dev/null", "w", stdout);
at the top of the function init_sage and then add this line
freopen("/dev/tty", "w", stdout);
at the end of finalize_sage. That should do the trick by redirecting all standard output into the black hole of /dev/null and then restoring it after sage is done.
(Correctly solving what you are requesting is a bit more involved)
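If sage is called in-process from Python, a similar effect can be sketched on the Python side without editing the C source, by temporarily pointing file descriptor 1 at /dev/null. This is only an illustration using the standard library: `redirect_fd_stdout` is a hypothetical helper, not part of sage or sage_analysis, and it assumes a POSIX-like system.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fd_stdout(target=os.devnull):
    """Temporarily point file descriptor 1 at `target`.

    Unlike contextlib.redirect_stdout, this also affects output written
    by C code (e.g. printf) running in the same process.
    """
    sys.stdout.flush()            # flush Python-level buffers first
    saved_fd = os.dup(1)          # keep a copy of the real stdout
    target_fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(target_fd, 1)     # fd 1 now writes to `target`
        yield
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, 1)      # restore the original stdout
        os.close(saved_fd)
        os.close(target_fd)


with redirect_fd_stdout():
    os.write(1, b"this line is discarded\n")
```

One caveat: output still sitting in the C library's own stdio buffer when the block exits may appear after stdout is restored, so the C side may need to flush before returning.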
In order to get SAGE analyzed using PRISM, I have started writing a SAGELink wrapper class that PRISM can use for that. As I therefore require to be able to process the output from SAGE, in this case calculating the SMF, I would like to use sage-analysis to do exactly that. However, I find myself getting incredibly frustrated with using it, mostly due to the different classes in the package being very disconnected from each other. I would therefore like to share my thoughts on how this can be improved (note that they apply to both the master and the dev branch).

Before I do so, I would like to share what exactly the situation is:
According to the documentation, I require two things: An instance of the Model class (which stores all model information); and an instance of a subclass of the DataClass class (which handles the model data), in my case the SageHdf5Data class. So, I first attempt to initialize the Model class. Surprisingly enough, on the master branch, the Model class takes no mandatory arguments, even though it cannot be used if it is not connected to output of SAGE. After all, it cannot run SAGE itself, so it must be connected to output. On the dev branch on the other hand, the Model class takes many arguments that are not required in any way, as they are saved in the SAGE output already. Therefore, the Model class should only take 1 argument: The path to the master HDF5-file with the SAGE output. Everything else is already known at that point once that file is read.

This brings me to the second thing I noticed: No data is read. If I attempt to initialize the SageHdf5Data class, using an instance of Model (this should actually be done automatically upon me providing the path to the output file), nothing actually happens. The SageHdf5Data does not read in any data or attributes stored in the HDF5-file at all. This is strange, as the Model class requires these attributes. The only thing that happens is that the open HDF5-file is stored in Model.hdf5_file, but nothing is read from it.

Additionally, it is strange that the Model instance that I provide to SageHdf5Data does not get stored as an attribute in the instance (and vice-versa). This means that any time that the latter instance required the former, I have to provide it. Even though it knows what instance should be used, as it was provided during initialization.

This becomes a problem when I try to use calc_SMF (https://github.com/sage-home/sage-analysis/blob/master/sage_analysis/example_calcs.py#L24-L60). First of all, this function should be a method of Model. Not only because this function requires an instance of it, but also because it modifies attributes in the Model instance. The latter is only allowed by methods of that instance. Never by a stand-alone function. In case of a stand-alone function, the results should simply be returned instead of saved in the instance.

Secondly, the function requires the gals argument, which can be obtained from DataClass.read_gals, which again requires the Model instance. Why does the calc_SMF function not simply ask for the snapshot the SMF should be calculated for? If Model contains a pointer to the instance of DataClass (and vice-versa), then all required information is there.

However, even if I provide all this information, calc_SMF still cannot be executed. After all, the Model instance does not have the hubble_h property set, as no attributes of the SAGE model were read in automatically anywhere. This is also weird, as instance properties should ALWAYS be set. Either to a default value or to a value that was determined somewhere. Getting an instance property should never raise an error, in any situation.

Unless I have made major errors in my process described above, this is not acceptable. A user would require too much knowledge and effort to perform simple tasks, that the programmer can do for them already. In case of sage-analysis, the biggest problem is that the Model and DataClass classes are not connected to each other in any way, even though they require each other in order to function. Besides that, a user would expect that when a class is initialized that serves to handle the data stored in a specific file, that it then also reads that data whenever necessary. It should not be necessary to read that data separately, and also assign it separately, as that completely negates the purpose of said class.

What I recommend doing is going over all the definitions in the entire package, and check everywhere whether it makes sense that user input is required for something. A user should ideally never have to provide any information more than once, directly or indirectly. Making sure that it works that way will significantly improve the user experience.
Let me know if there are any questions that I need to answer, or if more information is required.
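The coupling proposed in this post could look roughly like the sketch below. It is purely illustrative and not the sage-analysis API: `InMemoryReader`, `read_attrs`, `read_log_mass` and `calc_smf` are hypothetical names, and the in-memory reader stands in for a class that would actually read the HDF5 output.

```python
import numpy as np


class InMemoryReader:
    """Hypothetical stand-in for a reader of SAGE HDF5 output."""

    def __init__(self, attrs, log_mass_by_snapshot):
        self._attrs = attrs
        self._log_mass = log_mass_by_snapshot

    def read_attrs(self):
        return dict(self._attrs)

    def read_log_mass(self, snapshot):
        return np.asarray(self._log_mass[snapshot], dtype=float)


class Model:
    """Hypothetical model that keeps a reference to its reader."""

    def __init__(self, reader):
        self.reader = reader
        # Read run attributes once, at construction, so they are always set.
        attrs = reader.read_attrs()
        self.hubble_h = attrs["hubble_h"]
        self.volume = attrs["volume"]

    def calc_smf(self, snapshot, bin_edges):
        """Return galaxy counts per mass bin without mutating the model."""
        log_mass = self.reader.read_log_mass(snapshot)
        counts, _ = np.histogram(log_mass, bins=bin_edges)
        return counts


# Made-up attributes and masses for a single snapshot.
reader = InMemoryReader(
    attrs={"hubble_h": 0.73, "volume": 62.5**3},
    log_mass_by_snapshot={63: [8.05, 8.07, 9.5]},
)
model = Model(reader)
counts = model.calc_smf(snapshot=63, bin_edges=[8.0, 8.1, 9.0, 10.0])
```

With this layout the user supplies the data source once, the attributes are available as soon as the model exists, and the calculation only needs a snapshot number and the bins.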