legumeinfo / jira-issues

placeholder repo for issues migrating from JIRA system, to be moved to their appropriate places later

reload gene trees including arachis #133

Closed adf-ncgr closed 10 years ago

adf-ncgr commented 10 years ago

this is actually well underway. we now have some examples of what the tree viewer will do with trees of < 4 nodes. For example, see:
3 nodes at http://lis-adf/node/227614
2 nodes at http://lis-adf/node/227623
1 node at http://lis-adf/node/227617

I think I can live with the somewhat silly aspect of the last, and the others seem to behave reasonably intelligently for what they are. Unless there are other opinions, we'll move ahead.

We may need to recompute the AHRD descriptors, or at least compute them for the families previously unrepresented.

[LEGUME-165] created by adf_ncgr

adf-ncgr commented 10 years ago

I'll trust your opinion. I don't think I can see those nodes on our servers, e.g.
http://lis-stage.agron.iastate.edu/node/227614 or http://lis-dev.agron.iastate.edu/node/227614

by scannon

adf-ncgr commented 10 years ago

Also - will the gene family entry page be at /chado/phylotree ?

by scannon

adf-ncgr commented 10 years ago

gene trees now available from lis-dev, but still working through some issues, including:
1) syncing of the arachis features (has been running for hours, with no obvious sign of progress)
2) syncing of the arachis organisms (hopefully will be quick, once the feature-syncing is out of the queue)
3) descriptors for trees that are new relative to the last round (only about 700 of these, I have the blast/interpro analysis going in prep for AHRD)
4) updating the db behind the django website to enable context views for the arachis species (waiting on this until the descriptors are computed)

But, it should be possible to preview the trees now. Here are examples of:
a one-leaved tree:
http://lis-dev/chado_phylotree/phytozome_10.54728358
a two-leaved tree:
http://lis-dev/chado_phylotree/phytozome_10.54585131
a three-leaved tree:
http://lis-dev/chado_phylotree/phytozome_10.54749235

seems like some operations may be a bit sluggish, hopefully only due to the concurrent syncing. I'm hoping to see how the gene models look here before hitting the -stage servers. I'll let you know when the syncing is done (hopefully today, but at the rate it is going, I have my doubts!)

by adf_ncgr

adf-ncgr commented 10 years ago

AHRD output for the new consensus sequences is under ~adf/trees_public2/new_consensus_need_descriptors_ahrd_out.csv

Steven, if you could run your protocol for cleaning these up similarly to the last set, I would then incorporate them into the tree descriptors in the database.

peanut gene annotation feature still syncing!?

by adf_ncgr

adf-ncgr commented 10 years ago

I've generated that file. It's on lis-stage.agron.iastate.edu at
/legumeinfo/genefamilies/trees_public2/new_consensus_need_descriptors_ahrd_out.slim

The commands are in the shell script ~/bin/clean_AHRD.sh at lis-stage.

by scannon

adf-ncgr commented 10 years ago

Thanks Steven-
I'll run with this for now, but note that with the earlier batch of consensus sequences, I had formulated the tree descriptors off a somewhat rawer file "gene_families_20140902.AHRD.clean.csv" from which I could grab some extra details. One could certainly argue that these details are not needed, but it is probably best to be consistent (someday). When the dust settles, let's try to figure out which we prefer and go with that.

Also, note there are two files in the location you sent that are exactly identical except for naming:
ls -l /legumeinfo/genefamilies/trees_public2/new_consensus_need_descriptors_ahrd_out.slim*
-rw-rw-r-- 1 scannon staff 109265 Oct 28 19:58 /legumeinfo/genefamilies/trees_public2/new_consensus_need_descriptors_ahrd_out.slim
-rw-rw-r-- 1 scannon staff 109265 Oct 28 20:03 /legumeinfo/genefamilies/trees_public2/new_consensus_need_descriptors_ahrd_out.slimlt

I've deleted the latter, assuming it to be a mistake of some kind. Let me know if this action is unwarranted!
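
For reference, a byte-for-byte comparison along these lines would confirm that two such files really are identical before one is removed. This is only a minimal sketch: it uses stand-in files in a temporary directory (the names `out.slim` and `out.slimlt` and their contents are hypothetical), not the actual files listed above.

```shell
# Sketch only: stand-in files in a temp dir, not the real .slim/.slimlt files.
tmpdir=$(mktemp -d)
printf 'family1\tdescriptor\n' > "$tmpdir/out.slim"
cp "$tmpdir/out.slim" "$tmpdir/out.slimlt"

# cmp -s prints nothing and exits 0 only when the files are byte-for-byte identical.
if cmp -s "$tmpdir/out.slim" "$tmpdir/out.slimlt"; then
    rm "$tmpdir/out.slimlt"   # duplicate confirmed, safe to drop
    result="identical, duplicate removed"
else
    result="files differ, keeping both"
fi
echo "$result"
```

Matching sizes in `ls -l` are a good hint, but `cmp` checks the contents themselves.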

by adf_ncgr

adf-ncgr commented 10 years ago

Andrew - I had forgotten that I also made a "lightly-cleaned" version (cleaning out the Medicago genomic positions and a few other things). The command was: perl -pe 's/consn.//; s/\t| (.) | [HL]C | .\w+:\d+\d+ | \d+/\t$1/; s/PREDICTED: //'

In case you want to work from that lightly-cleaned version rather than the heavily scrubbed "slim" version, it is at
/legumeinfo/genefamilies/trees_public2/new_consensus_need_descriptors_ahrd_out.clean.csv

(And thanks for removing the redundant .slimlt file; I am sure that is the result of pasting the cleanup command, followed without a space by my "lt" alias [alias for ls -ltr].)
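
A rough illustration of how that kind of paste slip can produce a second, identical file. This is a sketch under assumptions: `cat` stands in for the real cleanup command, and the file names are hypothetical placeholders in a temporary directory.

```shell
# Sketch only: hypothetical stand-in names, not the real cleanup command.
tmpdir=$(mktemp -d)
printf 'family1\tdescriptor\n' > "$tmpdir/in.csv"

target="$tmpdir/out.slim"

# Intended: write the output, then run "lt" as a separate command.
cat "$tmpdir/in.csv" > "$target"

# Pasted with no space or newline in between, "lt" fuses onto the
# redirect target, so the same output lands in a second file instead.
cat "$tmpdir/in.csv" > "${target}lt"

ls "$tmpdir"
```

Because the shell treats everything up to the next whitespace as the redirect target, the alias name never runs as a command; it simply becomes part of the file name.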

by scannon

adf-ncgr commented 10 years ago

thanks! all should be consistent now...
hope to be able to get the arachis data displayed in the context view sometime soon (waiting on a permission issue to be resolved)

by adf_ncgr

adf-ncgr commented 10 years ago

I think we're ready with the trees on lis-stage and the context views feeding off a copy of this same database.

by adf_ncgr