Closed stevenlujpl closed 3 years ago
@stevenlujpl Thanks for adding the copyright language to the source code! In addition to the all-caps paragraph, each file needs to have this part before it to indicate who the copyright holder is:
Copyright (c) 2021 California Institute of Technology (“Caltech”). U.S. Government sponsorship acknowledged. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of Caltech nor its operating division, the Jet Propulsion Laboratory, nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
@stevenlujpl I am looking into updating the DEMUD code to remove the cosmic_demud dependency. Forgive this basic question, but how do I run the code in its new configuration? Inside the src/ directory and the cif-venv virtual environment, I get:
$ python simulator.py
Traceback (most recent call last):
  File "simulator.py", line 14, in <module>
    from src.sim_config import SimulatorConfig
ImportError: No module named src.sim_config
The src/ directory is not configured as a Python module, so python -m src (from the enclosing directory) does not work. You must have some other way of running it :)
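For readers hitting the same ImportError, here is a minimal sketch of why an import of the form `from src.sim_config import ...` depends on which directory is on `sys.path`. It builds a throwaway package in a temporary directory and is not the DORA layout itself: the package name `src_demo` is a hypothetical stand-in for `src` (chosen to avoid clashing with a real `src/`), and the sketch assumes Python 3. Under Python 2, the package directory additionally needs an `__init__.py` file.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "src_demo"  # hypothetical stand-in for the real src/ directory


def make_layout(root):
    """Create root/src_demo/{__init__.py, sim_config.py} and return the package dir."""
    pkg_dir = os.path.join(root, PACKAGE)
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "sim_config.py"), "w") as f:
        f.write("class SimulatorConfig(object):\n    pass\n")
    return pkg_dir


def can_import(module_name, search_dir):
    """Return True if module_name imports with search_dir first on sys.path."""
    saved_path = list(sys.path)
    # Forget earlier attempts so each call performs a fresh lookup.
    for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path.insert(0, search_dir)
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        sys.path[:] = saved_path


def compare_import_locations():
    """Try the import as if run from inside the package, then from its parent."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = make_layout(root)
        # Running a script inside the package puts the package dir on sys.path.
        inside = can_import(PACKAGE + ".sim_config", pkg_dir)
        # Running from the enclosing directory puts the parent on sys.path.
        outside = can_import(PACKAGE + ".sim_config", root)
    return inside, outside
```

In this sketch the lookup fails when only the package directory itself is on the path and succeeds when its parent is. That suggests the scripts might need to be launched from the enclosing directory, or with that directory on `PYTHONPATH`. Whether that matches the intended way of running DORA is not confirmed in this thread.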
@wkiri I haven't checked in all my code yet, so I don't think we can run DEMUD with the DORA framework at this point. However, each outlier detection algorithm should have its own command-line interface. Can you use that for now?
@stevenlujpl No, I get the same kind of error:
$ python demud_ranking.py
Traceback (most recent call last):
  File "demud_ranking.py", line 16, in <module>
    from src.ranking import Ranking
ImportError: No module named src.ranking
@stevenlujpl I also tried Python 3, but the CIF virtualenv doesn't have the necessary packages installed to support it. Or maybe you plan for a DORA virtualenv that would have the Python 3 packages. Feel free to point me to how you are currently running it and I will use that method.
As another suggestion, I recommend making the out_dir argument required instead of optional, for all scripts. Currently it has a default of the current directory ("."), which will always give this error:
> python demud_ranking.py
Traceback (most recent call last):
  File "demud_ranking.py", line 176, in <module>
    main()
  File "demud_ranking.py", line 172, in main
    start(**vars(args))
  File "demud_ranking.py", line 139, in start
    **demud_params)
  File "/home/wkiri/Research/DORA/git/src/ranking.py", line 87, in run
    enable_explanation=False)
  File "/home/wkiri/Research/DORA/git/src/util.py", line 121, in save_results
    os.mkdir(out_dir)
OSError: [Errno 17] File exists: '.'
Making out_dir required would avoid this runtime error when a script is run with its default arguments.
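As a rough illustration of the suggestion, the sketch below declares a required output directory with `argparse`. The flag spelling (`-o/--out_dir`), the parser description, and the helper `ensure_out_dir` are assumptions made for this example, not the actual DORA command-line interface. The existence check in `ensure_out_dir` is an additional safeguard that was not proposed in the thread.

```python
import argparse
import os


def build_parser():
    """Build a hypothetical CLI parser in which out_dir has no default."""
    parser = argparse.ArgumentParser(description="Hypothetical ranking script")
    # required=True makes argparse exit with a usage message when the
    # option is omitted, instead of silently falling back to ".".
    parser.add_argument("-o", "--out_dir", required=True,
                        help="directory in which to save results")
    return parser


def ensure_out_dir(out_dir):
    """Create out_dir only if it does not already exist (hypothetical helper)."""
    # os.mkdir raises OSError (errno 17) for an existing directory such
    # as ".", so check before creating.
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    return out_dir
```

With this parser, `build_parser().parse_args([])` exits with a usage error rather than reaching the save step, and `ensure_out_dir(".")` returns without raising.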
The CIF implementation for DORA shuffled the indices and returned the shuffled indices with a score of 0.0 for all samples. Since all DORA algorithms currently return only scores, I returned the shuffled indices as the scores so that sorting by score reproduces the random order.
This can be updated to return shuffled indices and 0.0 scores if the algorithms are updated to return both sel_ind and scores. See discussion on Slack:
How about just returning sel_ind and scores associated with those selections as in the CIF code? The CIF algs (except DEMUD) were I believe sorting by score internally (because we were ranking), but this step could be skipped and they could return [0,1,2,3...] as sel_ind with [score_0,score_1,....] as scores and then the Results Org could decide whether to do the sorting. That way the cost of sorting only happens if the user wants it.
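The proposal quoted above can be sketched roughly as follows. The function names (`random_rank`, `score_only_rank`, `organize`) are hypothetical and are not part of DORA or CIF, and the sketch assumes that a higher score means a more interesting sample.

```python
import random


def random_rank(n_samples, seed=None):
    """Random baseline: shuffled indices, with a score of 0.0 for every sample."""
    rng = random.Random(seed)
    sel_ind = list(range(n_samples))
    rng.shuffle(sel_ind)
    scores = [0.0] * n_samples
    return sel_ind, scores


def score_only_rank(raw_scores):
    """Scoring algorithm that skips sorting: indices stay in input order."""
    sel_ind = list(range(len(raw_scores)))
    return sel_ind, list(raw_scores)


def organize(sel_ind, scores, sort=False):
    """Pair indices with scores, sorting by score only when requested."""
    pairs = list(zip(sel_ind, scores))
    if sort:
        # Assumption: higher score ranks first. Python's sort is stable,
        # so ties (such as all-zero scores) keep their incoming order.
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs
```

For example, `organize(*score_only_rank([0.2, 0.9, 0.5]), sort=True)` yields the pairs in descending score order, while the all-zero scores from `random_rank` leave the shuffled order intact even when sorting is requested.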
Update: we now have a working loader for the astronomy use case, called the CatalogLoader. However, it assumes the input is .h5, not .csv, so I didn't check the box @stevenlujpl had marked for .csv data.
Thanks @urebbapr! Is there any reason that this wouldn't support other feature vector datasets as well, not just astronomy catalogs? If not, do you think it makes sense to just have a FeatureVectorLoader that is used for the catalog but also supports other datasets?
You're absolutely right @hannah-rae. I can rename it to FeatureVectorLoader.