Open reedacartwright opened 6 years ago
Why do we think the +1 was there? There must have been some reason...
@reedacartwright Please check if the annotation function works. The gtcall file should have nodes numbered as a list of descendant leaves (based on the VCF file); the tree should be numbered with leaves based on the VCF file and nodes numbered based on children, so those should match now.
I imagine they wanted to avoid zeros for some reason.
Things I spotted.
What if utils::init_tree was modified to do the sample names->id conversion? It could then create a node.samples variable to keep track of names alongside ids. That may make it easier to write the genotyping and annotation functions.
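A minimal sketch of that suggestion, assuming sample ids are simply 0-based positions in the VCF header. The `Node` class, `build_sample_index`, and `annotate` are hypothetical stand-ins for illustration; they are not treecall's actual `init_tree` or its tree class.

```python
class Node:
    """Hypothetical stand-in for a tree node; not the class treecall uses."""

    def __init__(self, children=(), name=None):
        self.children = list(children)
        self.name = name
        self.sid = []      # integer sample ids of descendant leaves
        self.samples = []  # matching sample names, kept alongside the ids


def build_sample_index(vcf_samples):
    """Map each sample name to its 0-based position in the VCF header."""
    return {name: i for i, name in enumerate(vcf_samples)}


def annotate(node, index):
    """Fill node.sid and node.samples bottom-up (post-order traversal)."""
    if not node.children:
        node.sid = [index[node.name]]
        node.samples = [node.name]
        return node
    for child in node.children:
        annotate(child, index)
        node.sid.extend(child.sid)
        node.samples.extend(child.samples)
    return node


# Assumed sample order, as it might appear in a VCF header.
vcf_samples = ["M1", "M2", "M3"]
root = Node([Node([Node(name="M1"), Node(name="M3")]), Node(name="M2")])
annotate(root, build_sample_index(vcf_samples))
print(root.sid, root.samples)
```

Keeping both lists on the node means downstream code can pick ids for probability calculations and names for output without converting back and forth.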
If this works, I'd rather not bother with gtcall using names. That's what I did initially, but then it requires converting back. I'm not sure what you're asking with the command position arguments. init_tree is used elsewhere without having real sample names attached (i.e. for random starting trees), so while it would be nice not to copy-paste the conversion and go back and forth, that should really be a separate function. I don't feel like bothering right now; mostly I just want it to work.
As a user of treecall, I'm now spotting things that are barriers for users. I'd fix some of these myself, but my Python-fu is weak.
-t FILE, which one would expect based on the other two commands? Why is the output file not at the end like it is in the other two?
Traversed tree incorrectly - should have tree output with sample names now.
Added a second gtcall file with leaf names - lame, but easier to copy-paste than convert.
Does the usage look better?
The usage looks better. Any reason why the 'output' for nbjoin does not have angle brackets?
I also noticed (via R) that the header for the gtcall is shorter than the body, which I think means a column label is missing.
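One way to confirm the mismatch outside of R is to compare the number of header fields with the number of fields in each data row. The sketch below assumes the gtcall file is tab-delimited with a single header line; the column names in the example are made up for illustration and are not the real gtcall columns.

```python
import csv
import io


def check_header_width(handle, delimiter="\t"):
    """Return the header width and a list of (line number, width) mismatches."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    mismatches = []
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            mismatches.append((lineno, len(row)))
    return len(header), mismatches


# Made-up example: three header labels but four fields per data row.
example = io.StringIO("chrom\tpos\tref\n1\t100\tA\tG\n1\t200\tC\tT\n")
width, bad = check_header_width(example)
print(width, bad)
```

If every data row is reported as one field wider than the header, a missing column label is the likely cause.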
Creating a second output file just adds to the confusion, because then the user doesn't know which one to submit to the annotation call. We need to fix annotation to use sample names instead of sample ids.
The way to do that is to put something like this after the init_tree function to convert the sid vectors back into sample name vectors:
https://github.com/rachelss/treecall/blob/c99aa6f892d90d254eacafea3ab54061955abc1b/geno.py#L182
But it might be simpler not to use init_tree at all, since it forces the usage of sample ids, which are only needed when calculating probabilities.
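As a rough illustration of the id-to-name conversion (not a copy of the linked lines in geno.py), the sketch below maps each sid vector back to sample names. The function name `sids_to_names` and the `offset` parameter are hypothetical; `offset=1` covers the case where ids were shifted by the +1 discussed earlier in this thread.

```python
def sids_to_names(sid_vectors, vcf_samples, offset=0):
    """Convert lists of sample ids into lists of sample names.

    offset is subtracted from each id before lookup, so offset=1
    handles ids that were stored 1-based.
    """
    return [[vcf_samples[i - offset] for i in sids] for sids in sid_vectors]


vcf_samples = ["M1", "M2", "M3"]          # assumed VCF header order
print(sids_to_names([[0, 2], [1]], vcf_samples))
print(sids_to_names([[1, 3], [2]], vcf_samples, offset=1))
```

Both calls give the same names, which is a quick way to check which numbering a given gtcall file uses.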
Does it work as-is? We can fix aesthetics, but I would like to know what the results are.
Yes.
I'm now trying to figure out whether the v2 trees are an improvement on the Mouse data compared to the v1 trees.
Below are some places to change to improve the usability of Treecall.
names part so that the file extension is just .tree or .tre if you prefer that. https://github.com/rachelss/treecall/blob/38778cc1001454a37b5087f05efa1e2ebfc3468a/tree_est.py#L105
format=9 when converting to Newick. That way users don't think that the lengths matter. https://github.com/rachelss/treecall/blob/38778cc1001454a37b5087f05efa1e2ebfc3468a/tree_est.py#L105
.best.tree file. https://github.com/rachelss/treecall/blob/38778cc1001454a37b5087f05efa1e2ebfc3468a/tree_est.py#L110
https://github.com/rachelss/treecall/blob/38778cc1001454a37b5087f05efa1e2ebfc3468a/treecall.py#L188-L198
+1 https://github.com/rachelss/treecall/blob/38778cc1001454a37b5087f05efa1e2ebfc3468a/treecall.py#L197
Additional change
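To show what a topology-only Newick string looks like (the effect the format=9 suggestion is after), here is a library-free sketch that strips branch lengths with a regular expression. It is not how treecall or its tree library writes trees, and it assumes leaf names contain no colons. The helper name `topology_only` is hypothetical.

```python
import re

# A branch length is a colon followed by a number, e.g. ":0.05" or ":1e-3".
_BRANCH_LENGTH = re.compile(r":[0-9.eE+-]+")


def topology_only(newick):
    """Return the Newick string with all branch lengths removed."""
    return _BRANCH_LENGTH.sub("", newick)


print(topology_only("((M1:0.1,M3:0.2):0.05,M2:0.3);"))
# prints ((M1,M3),M2);
```

With the lengths gone, the output makes it obvious that only the branching order is meaningful.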