kfuku52 / amalgkit

RNA-seq data amalgamation for a large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

amalgkit config #104

Closed kfuku52 closed 1 year ago

kfuku52 commented 1 year ago

amalgkit metadata should be started more easily, and there should be a single command that generates all necessary config files (near empty to be neutral enough for any purposes/organisms). A new subcommand should look like csubst dataset and may be named amalgkit config. @Hego-CCTB Could you implement it?

Hego-CCTB commented 1 year ago

On it!

Hego-CCTB commented 1 year ago

amalgkit config is done: https://github.com/kfuku52/amalgkit/commit/dd20107aa74597a11d324a767c804cb12184ec47

Had to hard-code the file creation for all .config files, since I wanted to keep the headers explaining how to use the config files.

Currently there are three parameters:

--out_dir (default: ./): the directory that amalgkit config uses as its working_directory. A folder named config is created inside --out_dir.

--config_dir (default: empty_config): the name of the folder in which the config files are created.

--overwrite (default: no): if --config_dir already exists, amalgkit config stops immediately when --overwrite is no. If yes, the existing config files are overwritten instead.

Resulting directory infrastructure:

out_dir <-- can be changed
│
└───config <-- cannot be changed
    │
    └───config_dir <-- can be changed
        │ control_term.config
        │ exclude_id.config
        │   ...

kfuku52 commented 1 year ago

Thanks. Can we stick to the way used in csubst dataset (i.e., just copy files to a specified directory)? The config files should be near-empty, but not completely empty because there are commonly applied conditions, such as "Filter" "type rnaseq" in search_term_other.config, and it is difficult to modify if hardcoded in the python script. I see no problem with headers, so could you explain it more?

Hego-CCTB commented 1 year ago

Ah, now the csubst code makes more sense to me. There is a nearly empty set of files somewhere in the installation folder (i.e. previously manually created), and the command just copies them somewhere else? Whereas I created the files from scratch.

kfuku52 commented 1 year ago

Yes, the files for csubst dataset are placed here: https://github.com/kfuku52/csubst/tree/master/csubst/dataset

Hego-CCTB commented 1 year ago

alright, took me a while to figure out how this works, but here's the new version: https://github.com/kfuku52/amalgkit/commit/0fe5bb77ae764ced3724428bb94a900a307b2261

At first I wanted to use pkg_resources as in csubst, but upon further investigation it is apparently better to use importlib.resources to retrieve files from within the package, since pkg_resources is deprecated now.
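
For readers unfamiliar with importlib.resources, here is a minimal sketch of the "copy packaged files to a destination" approach. It is not the code from the commit: the function name copy_packaged_configs and its arguments are made up for illustration, and it assumes Python 3.9 or later, where importlib.resources.files() is available.

```python
import importlib.resources as ir
from pathlib import Path


def copy_packaged_configs(package, parts, dest):
    """Copy every *.config file found in <package>/<parts...> into dest.

    `package` must be an importable package name; `parts` is a list of
    sub-directory names inside that package (hypothetical layout).
    """
    source = ir.files(package)
    for part in parts:
        # Walk down one directory level at a time.
        source = source.joinpath(part)

    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for entry in source.iterdir():
        if entry.is_file() and entry.name.endswith(".config"):
            target = dest / entry.name
            # read_bytes() works for any Traversable, not only real paths.
            target.write_bytes(entry.read_bytes())
            copied.append(target)
    return sorted(copied)
```

Only the first argument is imported as a package; the sub-directories are reached with joinpath(), so they do not need to be importable themselves.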

Anyway, amalgkit config now works as follows:

--overwrite: if set to no, the process terminates when the target directory already exists.

--config_dir: the name of the config directory to be created, i.e. the destination (defaults to the value of --config).

--config: the name of the config directory stored in the amalgkit package, i.e. the source. Currently there are four options: 'test', 'plantae', 'vertebrate' and 'base'. 'base' is the nearly empty dataset, where I left just the header and 1-3 lines as an example within each of the config files.

--out_dir: the working directory.

Example:

amalgkit config --config base --config_dir my_config --out_dir  ./

will create ./config/my_config/ and copy all .config files from 'base' in there.

amalgkit config --config vertebrate --out_dir  ./

will create ./config/vertebrate/ and copy all .config files from vertebrate in there.

amalgkit config

will create ./config/base and copy all .config files from 'base' in there.
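
The path logic in the three examples above can be summarized with a small sketch. This is an illustration of the described behavior, not the actual implementation: the function name resolve_config_destination is hypothetical, and the yes/no value of --overwrite is modeled as a boolean.

```python
from pathlib import Path


def resolve_config_destination(out_dir="./", config="base",
                               config_dir=None, overwrite=False):
    """Return <out_dir>/config/<config_dir>, as in the examples above.

    config_dir falls back to the value of config when it is not given.
    Raises FileExistsError if the destination exists and overwrite is False.
    """
    folder_name = config_dir or config
    destination = Path(out_dir) / "config" / folder_name
    if destination.exists() and not overwrite:
        raise FileExistsError(
            f"{destination} already exists; set overwrite to replace it"
        )
    return destination
```

With no arguments this resolves to config/base relative to the current directory, matching the bare amalgkit config call.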

Note: I had to make a copy of the config folder inside amalgkit and give it a different name for this to work. The config folder is now redundant, so we can delete it.

kfuku52 commented 1 year ago

Did your test go well with a fresh install with pip? I got an error:

(base) wbo1129:~ kef74yk$ amalgkit --version
amalgkit version 0.6.7.2
(base) wbo1129:~ kef74yk$ amalgkit config --config base --config_dir my_config --out_dir  ./
amalgkit config: start
Checking config directory ...
Traceback (most recent call last):
  File "/Users/kef74yk/opt/miniconda3/bin/amalgkit", line 404, in <module>
    args.handler(args)
  File "/Users/kef74yk/opt/miniconda3/bin/amalgkit", line 105, in command_config
    config_main(args)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/amalgkit/config.py", line 85, in config_main
    create_config_from_package(args)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/site-packages/amalgkit/config.py", line 70, in create_config_from_package
    config_files = ir.files(config_base).rglob('*.config')
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/importlib/resources.py", line 147, in files
    return _common.from_package(_get_package(package))
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/importlib/resources.py", line 49, in _get_package
    module = _resolve(package)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/importlib/resources.py", line 40, in _resolve
    return import_module(name)
  File "/Users/kef74yk/opt/miniconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'config_dir'

Hego-CCTB commented 1 year ago

I pushed another update, which should fix this issue. Can you try it out on your end? https://github.com/kfuku52/amalgkit/commit/eb22ce5cc382472079b45cbfa8669063c869a2b6
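
For context on the traceback above: importlib.resources.files() imports whatever name it is given, so the name has to be importable from sys.path. The nested import frames suggest a dotted name whose first component was config_dir, although the thread does not show the exact value. The sketch below only reproduces that failure mode with a deliberately missing package; the helper describe_anchor and the package name used in the usage note are hypothetical, and this is not a description of what the linked commit changed.

```python
import importlib.resources as ir


def describe_anchor(anchor):
    """Report whether `anchor` can be used with importlib.resources.files()."""
    try:
        root = ir.files(anchor)
    except ModuleNotFoundError as exc:
        # Raised when the name (or its parent package) is not importable.
        return f"cannot import {anchor!r}: {exc}"
    return f"{anchor!r} resolved to {root}"
```

Calling describe_anchor("hypothetical_missing_pkg.base") reports a missing module for the parent name, while an installed package such as the standard library's json resolves normally. Anchoring on the installed top-level package and reaching sub-folders with joinpath() avoids the import of a name that only exists as a folder.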

kfuku52 commented 1 year ago

The latest version worked well. Thank you!

Hego-CCTB commented 1 year ago

Great! I will close this for now, then.