wutron / dlcpar

Modeling gene duplication, loss, and coalescence (through parsimony)
GNU General Public License v3.0

Output path problem #2

davidemms closed this issue 7 years ago

davidemms commented 7 years ago

Hi

I intend to use the release version on your website (https://www.cs.hmc.edu/~yjw/software/dlcpar/) as you suggested. However, I was just trying the GitHub version and I encountered a problem where dlcpar seems to be trying to write to a file path that doesn't exist.

I've attached a test case. If I run the following command inside the attached directory:

dlcpar_search -s Trees_ids/SpeciesTree_ids_0_rooted.txt -S Trees_ids/GeneMap.smap -D 1 -C 0.125 Trees_ids/OG0000000_tree_id.txt -O dlcpar/OG0000000_tree_id

Then I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/dlcpar_search", line 209, in <module>
    sys.exit(main())
  File "/usr/local/bin/dlcpar_search", line 201, in main
    phyloDLC.write_dlcoal_recon(out, coal_tree, maxrecon)
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/compbio/phyloDLC.py", line 350, in write_dlcoal_recon
    recon.write(filename, coal_tree, exts, filenames, filestreams)
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/compbio/phyloDLC.py", line 178, in write
    rootData=True)
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/rasmus/treelib.py", line 602, in write_newick
    write_newick(self, util.open_stream(out, "w"),
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/rasmus/util.py", line 1171, in open_stream
    stream = open(filename, mode)
IOError: [Errno 2] No such file or directory: 'Trees_ids/OG0000000_tree_id.txtdlcpar/OG0000000_tree_id.coal.tree'

I.e. it seems to be concatenating the paths incorrectly. This ran fine for me with version 0.9.1.

Thanks for your help,
David

Attachment: dlcpar_output_error.tar.gz

davidemms commented 7 years ago

In fact, I also get this problem with the release version on your website...

wutron commented 7 years ago

Hi David,

Sorry, the flags for dlcpar might be non-intuitive. dlcpar assumes that the gene tree filename has the format <path>/<basename><inputext> and will output to filename <path>/<basename><outputext>{.tree,.recon,.order}. dlcpar_search is the same except the output extensions are in three-tree format (.coal.tree, .coal.recon, .locus.tree, .locus.recon, .daughters). By default inputext = "" and outputext = ".dlcpar". So e.g. paper.txt would write to paper.dlcpar{.tree,.recon,.order} or the analogous for dlcpar_search.
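To make the naming rule above concrete, here is a minimal Python sketch of it. It is an illustration only, not dlcpar's actual code: the function name `output_paths` and its parameter names are hypothetical, and it simply applies the rule as stated (strip the input extension, append the output extension, then append each file suffix). Feeding it the arguments from the original command reproduces the path in the error message.

```python
def output_paths(gene_tree, inputext="", outputext=".dlcpar",
                 suffixes=(".coal.tree", ".coal.recon",
                           ".locus.tree", ".locus.recon", ".daughters")):
    """Hypothetical helper: apply the naming rule described in the thread.

    gene_tree is expected to look like <path>/<basename><inputext>.
    The defaults for inputext and outputext are the ones quoted above;
    the default suffixes are the dlcpar_search ones.
    """
    if not gene_tree.endswith(inputext):
        raise ValueError("gene tree filename does not end with the input extension")
    # Strip the input extension (an empty extension strips nothing).
    stem = gene_tree[:len(gene_tree) - len(inputext)]
    # The output extension is appended as a plain string, not joined as a path.
    return [stem + outputext + suffix for suffix in suffixes]


# The original command passed -O dlcpar/OG0000000_tree_id and no -I:
paths = output_paths("Trees_ids/OG0000000_tree_id.txt",
                     outputext="dlcpar/OG0000000_tree_id")
print(paths[0])
# -> Trees_ids/OG0000000_tree_id.txtdlcpar/OG0000000_tree_id.coal.tree
```

Because the output extension is treated as a suffix of the input filename rather than as a separate path, a value containing a directory ends up concatenated onto the input path, which is what the traceback shows.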

Sorry that there is no way to write to a different path. For your example, I would suggest first copying the file to your dlcpar directory, then running

dlcpar_search -s Trees_ids/SpeciesTree_ids_0_rooted.txt -S Trees_ids/GeneMap.smap -D 1 -C 0.125 dlcpar/OG0000000_tree_id.txt -I .txt

which would output to dlcpar/OG0000000_tree_id{,.coal.tree,.coal.recon,.locus.tree,.locus.recon,.daughters}. You can then analyze *.locus.tree for events.

Also, if your species tree is "small" (say <= 20 species), I would recommend that you use dlcpar over dlcpar_search where possible. dlcpar will fail if the gene family is too large, but it seemed to work in the majority of cases in our analysis. dlcpar implements the algorithm of our paper. dlcpar_search also looks for the most parsimonious (MP) solution but uses an iterative search process rather than searching over the entire space.

Let me know if that works.

Thanks,

davidemms commented 7 years ago

Thanks, that explains what was happening! I will use your suggestion. I assume that these command-line arguments will continue to work in any future versions?

I'm finding dlcpar very useful and am using it in some software called OrthoFinder to identify orthologues within gene families and I also find it useful for identifying gene duplications. This is new functionality in OrthoFinder so we will cite DLCpar in the next OrthoFinder paper.

I will look at your suggestion of using 'dlcpar' instead of 'dlcpar_search' and investigate in what cases it is successful. The only problem is that I need the analysis to be completely automatic so it would be a problem if it failed unexpectedly. I'll see if there's a way I can deal with this.

All the best,
David

wutron commented 7 years ago

Hi David,

Yes, I am not sure how your original command worked in v0.9.1 actually. Sorry for the trouble.

Glad to hear that you are finding dlcpar useful for OrthoFinder. If you do use dlcpar_search, you might consider increasing the number of search iterations (e.g. -i 1000 --nprescreen 100); the default (-i 10 --nprescreen 20) will probably not search enough of the space for all but the smallest families.

Yes, I see the problem with using dlcpar then. My scripts run dlcpar for a fixed time, and if it fails (the script always creates a file, so I check for an empty file), they then run dlcpar_search, which is guaranteed to return a solution. In any case, even if you stick with dlcpar_search, it should still do better than LCA by virtue of handling ILS.
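The fallback workflow described above could be sketched in Python roughly as follows. This is a hedged illustration, not the scripts mentioned in the comment: the function `run_with_fallback` and its parameters are hypothetical, the commands are passed in as argument lists so that no dlcpar flags are assumed, and treating a non-zero exit code as a failure is an added assumption on top of the empty-file check.

```python
import os
import subprocess


def run_with_fallback(primary_cmd, fallback_cmd, primary_output, timeout_s):
    """Run primary_cmd for at most timeout_s seconds, else run fallback_cmd.

    primary_output is a file that primary_cmd is expected to write.
    Returns "primary" if the primary run succeeded, otherwise "fallback".
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(primary_cmd, timeout=timeout_s)
        finished = result.returncode == 0  # assumption: non-zero exit means failure
    except subprocess.TimeoutExpired:
        finished = False

    # A missing or empty output file also counts as a failed primary run.
    has_output = (os.path.isfile(primary_output)
                  and os.path.getsize(primary_output) > 0)
    if finished and has_output:
        return "primary"

    # Fall back to the second command; raise if that one fails too.
    subprocess.run(fallback_cmd, check=True)
    return "fallback"
```

In this setting, `primary_cmd` would be the dlcpar invocation, `fallback_cmd` the dlcpar_search invocation, and `primary_output` one of the files dlcpar writes (for example the one ending in `.tree`), with the time limit chosen to suit the size of the gene families.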

Thanks,