zjshi / gt-pro

MIT License

`assert os.path.isdir(in_dir)` => no helpful error message #54

Open nick-youngblut opened 1 year ago

nick-youngblut commented 1 year ago

The assert statement in the code below (and similar assert statements throughout the codebase) does not provide a helpful error message.

def locate_genomes(in_dir):
    assert os.path.isdir(in_dir) 
    fpaths = []
    for f in os.listdir(in_dir):
        fpath = in_dir.rstrip('/')+'/'+f
        if os.path.isfile(fpath) and fpath[-4:] == ".fna" or fpath[-3:] == ".fa":
            fpaths.append(fpath)
            sys.stderr.write("\tgenome path found: {}\n".format(fpath))

    sys.stderr.write("{} genomes sequences will be used for database building\n".format(len(fpaths)))

    return fpaths

While https://www.sciencedirect.com/science/article/pii/S2666166722008449 is nice to have, I am still trying to figure out the correct input directory structure and file naming for GT_Pro build (e.g., according to the locate_genomes() function, reference genome fasta files must end in .fna or .fa). Given the strong possibility of user error in formatting the input for GT_Pro build, it would be helpful to include more instructive error messages that tell the user what is expected.
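As a rough illustration of what more instructive errors could look like, here is a minimal sketch of locate_genomes() with the bare assert replaced by explicit exceptions. It is not the maintainers' code: the GENOME_EXTENSIONS constant, the choice of exception types, the sorted listing, and the extra "no genomes found" check are all assumptions of mine. It also uses a single combined check for the extension test, because in the quoted version `A and B or C` parses as `(A and B) or C`, so the isfile() check is skipped for names ending in .fa.

```python
import os
import sys

# Extensions accepted by the locate_genomes() snippet quoted in this issue.
# The constant name is hypothetical, not taken from the gt-pro codebase.
GENOME_EXTENSIONS = (".fna", ".fa")


def locate_genomes(in_dir):
    """Return genome FASTA paths found in in_dir, failing with a clear message."""
    if not os.path.isdir(in_dir):
        raise NotADirectoryError(
            "Input genome directory does not exist or is not a directory: "
            "'{}'".format(in_dir)
        )

    fpaths = []
    # sorted() only makes the order deterministic; the original does not sort.
    for f in sorted(os.listdir(in_dir)):
        fpath = os.path.join(in_dir, f)
        # str.endswith() accepts a tuple, so isfile() applies to both extensions.
        if os.path.isfile(fpath) and fpath.endswith(GENOME_EXTENSIONS):
            fpaths.append(fpath)
            sys.stderr.write("\tgenome path found: {}\n".format(fpath))

    # Assumption: an empty result is a user error worth reporting explicitly.
    if not fpaths:
        raise FileNotFoundError(
            "No genome files ending in {} were found in '{}'".format(
                " or ".join(GENOME_EXTENSIONS), in_dir
            )
        )

    sys.stderr.write(
        "{} genomes sequences will be used for database building\n".format(len(fpaths))
    )
    return fpaths
```

A side benefit of raising explicit exceptions is that the checks still run when Python is started with -O, which strips assert statements.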

nick-youngblut commented 1 year ago

Regarding input formatting, it would be helpful to state clearly in the instructions that compressed genome fasta files are not accepted as input by either maast or GT_Pro build... or to allow at least gzip'ed input fasta files.
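For the gzip suggestion, something along these lines might work on the Python side. This is only a sketch: open_fasta() is a hypothetical helper, not an existing GT_Pro or maast function, and it says nothing about whether the rest of the build pipeline could consume the resulting stream. It detects gzip by the two leading magic bytes rather than by file extension.

```python
import gzip


def open_fasta(path):
    """Open a FASTA file as text, transparently handling gzip compression.

    Hypothetical helper for illustration; not part of the gt-pro codebase.
    """
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")
```

Callers would use it like the built-in open(), e.g. `with open_fasta(fpath) as fh:`, regardless of whether the file is compressed.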

nick-youngblut commented 1 year ago

Another example where the assertion error is not very helpful:

Traceback (most recent call last):
  File "/opt/gt-pro/scripts/build_db.py", line 402, in <module>
    main()
  File "/opt/gt-pro/scripts/build_db.py", line 388, in main
    path_objs = validate_input_paths(input_array)
  File "/opt/gt-pro/scripts/build_db.py", line 112, in validate_input_paths
    assert os.path.isfile(path_map['species_dir'] + vtarget)
AssertionError

It would be helpful to include something like `Could not find 'msa.fna'`. Also, why must the tag_msa.fa generated by MAAST be renamed to msa.fna? Both the basename and the extension have to be changed.
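A small helper could replace asserts like the one in the traceback. The sketch below is hypothetical: require_file() and its hint parameter are names I made up, and I only know the `path_map['species_dir'] + vtarget` expression from the traceback, not what else validate_input_paths() does.

```python
import os


def require_file(path, hint=""):
    """Return path if it is an existing file, else raise with a readable message.

    Hypothetical helper for illustration; not part of the gt-pro codebase.
    """
    if not os.path.isfile(path):
        msg = "Could not find required input file: '{}'".format(path)
        if hint:
            msg += " ({})".format(hint)
        raise FileNotFoundError(msg)
    return path


# Possible use in place of the assert shown in the traceback (names from the traceback):
#   require_file(path_map['species_dir'] + vtarget)
```

With this, a missing msa.fna would surface as a FileNotFoundError naming the full path that was checked, instead of a bare AssertionError.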