huangnengCSU / compleasm

A genome completeness evaluation tool based on miniprot
Apache License 2.0

Add ability to set library path via COMPLEASM_LIBRARY_PATH if --library_path unspecified #32

Open douglasgscofield opened 5 months ago

douglasgscofield commented 5 months ago

Adds the ability to specify the library path (the value normally set with --library_path) through the environment variable COMPLEASM_LIBRARY_PATH. The logic is:

  1. change the default of every --library_path option, wherever it appears, to None
  2. at the end of argument processing, if args.library_path is None, check whether the environment variable COMPLEASM_LIBRARY_PATH is set
  3. if it is set, use its value for args.library_path
  4. if it is not set, fall back to the existing default, mb_downloads, which is relative to the current working directory

This also applies the same logic in the __init__ method of Downloader.
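A minimal Python sketch of the resolution order above, assuming a single --library_path option on one argparse parser (the PR changes the option everywhere it appears). The helper resolve_library_path, its environ parameter (added so the lookup can be tested without touching the real environment), and the Downloader stand-in are hypothetical illustrations, not compleasm's actual code. It also assumes an empty COMPLEASM_LIBRARY_PATH should be treated the same as an unset one.

```python
import argparse
import os

ENV_VAR = "COMPLEASM_LIBRARY_PATH"
# Existing fallback, relative to the current working directory.
DEFAULT_LIBRARY_PATH = "mb_downloads"


def resolve_library_path(cli_value=None, environ=None):
    """Hypothetical helper: CLI value, else environment variable, else default."""
    if environ is None:
        environ = os.environ
    if cli_value is not None:
        return cli_value  # an explicit --library_path always wins
    env_value = environ.get(ENV_VAR)
    if env_value:  # assumption: empty string is treated as unset
        return env_value
    return DEFAULT_LIBRARY_PATH


def parse_args(argv=None, environ=None):
    parser = argparse.ArgumentParser(prog="compleasm-sketch")
    # Step 1: default None so "not given" is distinguishable from a value.
    parser.add_argument("--library_path", default=None)
    args = parser.parse_args(argv)
    # Steps 2-4: fill in the path once argument processing is done.
    args.library_path = resolve_library_path(args.library_path, environ)
    return args


class Downloader:
    """Hypothetical stand-in: only the library-path handling is shown."""

    def __init__(self, download_dir=None, environ=None):
        self.download_dir = resolve_library_path(download_dir, environ)


if __name__ == "__main__":
    print(parse_args(["--library_path", "/shared/busco"]).library_path)  # /shared/busco
    print(parse_args([], environ={ENV_VAR: "/sw/lineages"}).library_path)  # /sw/lineages
    print(parse_args([], environ={}).library_path)  # mb_downloads
```

Keeping the argparse default at None, rather than mb_downloads, is what lets the code tell an explicit `--library_path mb_downloads` apart from no option at all, so a value given on the command line always takes precedence over the environment variable.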

This change makes it possible to keep lineage sets in one central location, which streamlines project-wide storage. It is also useful on HPC clusters such as ours, where the lineage sets have already been downloaded to a single system-wide location shared by BUSCO and compleasm. Because lineage sets change infrequently, using a common location for them is not only feasible but recommended.

douglasgscofield commented 5 months ago

I should add that to use existing BUSCO v5 lineage sets with compleasm, each lineage directory (e.g., methanomicrobia_odb10) needs a corresponding marker file (e.g., methanomicrobia_odb10.done) at the same level. These markers can be created with touch for each lineage directory:

cd <BUSCO v5 lineage sets base directory>/lineages
for D in *_odb10; do
    test -d "$D" && touch "$D.done"
done

Also, if you uncompress refseq_db.faa.gz in any lineage directory, keep the gzipped copy in place as well, since compleasm uses it.
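For sites that prefer a script over the shell loop, here is a minimal Python sketch that performs the same marker creation and also reports lineages whose refseq_db.faa.gz is missing. The function name prepare_lineages is hypothetical, and the sketch checks only the two requirements mentioned in this comment; it does not verify any other files compleasm may expect in a lineage directory.

```python
from pathlib import Path


def prepare_lineages(lineages_dir):
    """Hypothetical helper: create <lineage>.done markers next to each
    *_odb10 directory and return the lineages lacking refseq_db.faa.gz."""
    base = Path(lineages_dir)
    missing_gz = []
    for lineage in sorted(base.glob("*_odb10")):
        if not lineage.is_dir():
            continue  # skip stray files that happen to match the pattern
        # Same effect as `touch "$D.done"`; harmless if the marker exists.
        (base / f"{lineage.name}.done").touch(exist_ok=True)
        # The gzipped protein file should stay in place for compleasm.
        if not (lineage / "refseq_db.faa.gz").is_file():
            missing_gz.append(lineage.name)
    return missing_gz
```

Calling `prepare_lineages("<BUSCO v5 lineage sets base directory>/lineages")` (with the placeholder filled in) returns an empty list when every lineage directory still has its gzipped file; any names it returns point to directories where the .gz copy was removed after uncompressing.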