xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Results in additional subdirectory #30

Closed: oschwengers closed this issue 3 years ago

oschwengers commented 3 years ago

Hi and thanks a lot for this excellent tool! I've been looking for something like this for a long time.

I have a couple of questions/proposals regarding the CLI.

When I provide a genome via a relative path, e.g. isescan.py ../../test-data/GCF_000008865.2.fna prot hmm, the result files of interest are stored in the nested subdirectories prediction/test-data/.

However, in an automated high-throughput setup, it is somewhat inconvenient to have to parse part of the input file path (here, test-data) to locate the results, and it should not be necessary. Is it always the parent directory of the input file? Would it be possible, for example, to provide an output parameter --output so that all result files are stored within that directory? Example: isescan.py --output results --prefix GCF_000008865 GCF_000008865.2.fna would put all files with the prefix GCF_000008865 under results.

This would also make the extra positional arguments for the protein and hmm subdirectories unnecessary, since by default they could be placed within the output directory.

It's just a proposal to make it easier to integrate ISEScan into larger pipelines and I hope this makes sense. Best regards

xiezhq commented 3 years ago

oschwengers,

Thank you for the suggestion. The answer to your question 'Is it always the parent directory of the input file?' is yes. The reason is that I once used ISEScan to process thousands of genomes downloaded from NCBI, and the parent directory of each genome sequence file (fastq files) was the name of the species. To conveniently organize all results into the corresponding species directories, ISEScan treats the parent directory of the input file as the species name and keeps it when it creates the output files. Carrying the input file's parent directory over into the output path is done only for legacy reasons. The next version of ISEScan may remove it and adjust the command line options as well, based on your suggestions.

Xie
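For anyone scripting against the legacy layout in the meantime, the rule described above can be sketched in Python. This is a minimal sketch, assuming the results land in prediction/<name of the input file's parent directory>/ exactly as reported in this thread; the helper name expected_result_dir is hypothetical, and the behaviour has not been checked against the ISEScan source.

```python
from pathlib import Path


def expected_result_dir(seqfile: str, output_root: str = "prediction") -> Path:
    """Guess where the legacy layout puts results for a given input file.

    Assumption (taken from this thread, not from the ISEScan source):
    results are written to <output_root>/<parent directory name>/.
    """
    # resolve() normalises "../.." segments and turns a bare file name
    # into an absolute path, so the parent directory always has a name.
    parent_name = Path(seqfile).resolve().parent.name
    return Path(output_root) / parent_name


if __name__ == "__main__":
    print(expected_result_dir("../../test-data/GCF_000008865.2.fna"))
```

The input file does not need to exist for this to work, since only the path is inspected.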

oschwengers commented 3 years ago

Hello @xiezhq, thanks for the rapid, informative and positive reply!

As I am integrating ISEScan into an in-house pipeline right now, could you give a rough estimate (days, weeks, or months) of when you'll be able to release this next version with the amended CLI?

Depending on that, it might be worth waiting for the next release.

xiezhq commented 3 years ago

I don't know when I can finish upgrading ISEScan with new features as I can only maintain and upgrade ISEScan in my spare time at home.

xiezhq commented 3 years ago

Update: ISEScan v1.7.2.2 and later provide the command-line options --seqfile and --output to specify the input file and the output directory.
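For pipeline integration, a call using these two options might look like the sketch below. It assumes isescan.py is on the PATH and uses only the --seqfile and --output options named above; the helper names build_isescan_command and run_isescan are hypothetical, and any other options the tool accepts are deliberately left out.

```python
import subprocess
from pathlib import Path


def build_isescan_command(seqfile, output_dir, executable="isescan.py"):
    """Build the argument list for an ISEScan run (v1.7.2.2 or later).

    Only --seqfile and --output are used, as named in this thread.
    """
    return [executable, "--seqfile", str(seqfile), "--output", str(output_dir)]


def run_isescan(seqfile, output_dir):
    """Run ISEScan; raises CalledProcessError on a non-zero exit code."""
    cmd = build_isescan_command(Path(seqfile), Path(output_dir))
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where ISEScan is not installed.
    print(" ".join(build_isescan_command("GCF_000008865.2.fna", "results")))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.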