greenelab / generic-expression-patterns

Distinguishing between generic and experiment-specific gene expression signals.
BSD 3-Clause "New" or "Revised" License

Configuration files in another format? #26

Closed dongbohu closed 3 years ago

dongbohu commented 4 years ago

Right now all config files in the configs/ directory are tab-delimited files. They are not very friendly to work with. For example, it's very easy to confuse the tab delimiter with space characters.
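To illustrate the tab-versus-space problem, here is a minimal sketch. It assumes each config line has the form name, tab, value; that layout and the helper `parse_tsv_config` are illustrative assumptions, not the repository's actual loader.

```python
import csv
import io


def parse_tsv_config(text):
    """Parse 'name<TAB>value' lines into a dict (hypothetical helper)."""
    params = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            # A line typed with spaces instead of a tab ends up as one field.
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated fields, got {len(row)}"
            )
        params[row[0]] = row[1]
    return params


good = "dataset_name\ttests\n"     # real tab between name and value
bad = "dataset_name    tests\n"    # four spaces; looks the same in many editors

print(parse_tsv_config(good))
try:
    parse_tsv_config(bad)
except ValueError as err:
    print("rejected:", err)
```

The two inputs look identical in many editors, but only the first one parses.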

Two alternatives that I can think of: (1) Change all config files into Python modules. For example, config_test.tsv can be changed into:

local_dir = "Generic_expression_patterns_test/"
dataset_name = "tests"
template_data_file = "data/input/recount2_template_data.tsv"
compendium_data_file = "data/input/recount2_compendium_data.tsv"
... ...
num_recount2_experiments_to_download = 3
compare_genes = 1

The notebooks that use them can then simply do from configs.config_test import local_dir, .... This seems much more maintainable and flexible, because one parameter can be computed from another, such as:

local_dir = "/home/foo/data/"
template_data_file = os.path.join(local_dir, "template.tsv")
...

The only issue with this solution arises when another software package needs these config parameters, such as:

train_vae_modules.train_vae(config_filename, normalized_compendium_filename)

(at the end of human_analysis/1_process_recount2_data.ipynb)

In this scenario, we probably would have to build a dict with all config parameters needed by train_vae_modules.train_vae, and pass it in as the first argument.
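A minimal sketch of building such a dict from a config module follows. The helper `config_to_dict` is hypothetical, and the stand-in module built with `types.ModuleType` replaces a real `import configs.config_test` only so that the example is self-contained. As shown above, train_vae_modules.train_vae currently takes a config filename, so it would have to be changed to accept a dict before this could be passed in.

```python
import os
import types


def config_to_dict(config_module):
    """Collect the public, non-module attributes of a config module (hypothetical helper)."""
    return {
        name: value
        for name, value in vars(config_module).items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }


# Stand-in for `import configs.config_test as config_test`.
config_test = types.ModuleType("config_test")
config_test.os = os  # an imported module, which must not leak into the dict
config_test.local_dir = "Generic_expression_patterns_test/"
config_test.dataset_name = "tests"
config_test.num_recount2_experiments_to_download = 3

params = config_to_dict(config_test)
print(params)
```

Filtering out underscore-prefixed names and imported modules keeps only the actual parameters in the dict.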

(2) Another alternative is to use another format such as ini or yaml. This is similar to the current format but more flexible.
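For comparison, a minimal sketch of the ini variant using Python's standard configparser module is shown below. The section names `paths` and `params` are assumptions made for illustration, and the text is read from a string rather than a file to keep the example self-contained.

```python
import configparser

# Hypothetical ini version of config_test.tsv; section names are illustrative.
INI_TEXT = """
[paths]
local_dir = Generic_expression_patterns_test/
template_data_file = data/input/recount2_template_data.tsv

[params]
dataset_name = tests
num_recount2_experiments_to_download = 3
compare_genes = 1
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Values are stored as strings; typed getters convert them on request.
local_dir = parser["paths"]["local_dir"]
num_experiments = parser.getint("params", "num_recount2_experiments_to_download")
compare_genes = parser.getint("params", "compare_genes")

print(local_dir, num_experiments, compare_genes)
```

Unlike a Python module, an ini file stores every value as a string, so each caller has to pick the right getter for numeric parameters.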

I prefer the first alternative.

ajlee21 commented 3 years ago

I agree. I think that using something like pathlib, which sounds similar to what you're suggesting, would be preferable to the current format of the config file. Since this project is coming to a close, I will leave the config format as is, but I plan to adapt one of the alternative approaches in my future work.
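As a rough sketch of how pathlib could fit into the Python-module approach, the os.path.join example from the first comment might be written as follows. The directory and file names are the placeholder values from that example, not real project paths.

```python
from pathlib import Path

# Placeholder base directory, taken from the example in the first comment.
local_dir = Path("/home/foo/data")

# The `/` operator joins path components, replacing os.path.join.
template_data_file = local_dir / "template.tsv"

print(template_data_file)
```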