Closed meren closed 5 years ago
I'm putting a note here that we didn't discuss storing states in the profile database.
We need to make sure that we can save states for --gene-view
, and that these states are not confused with the full mode view.
We worked on this today and committed all changes to gene-view
branch.
So far:
We created load_gene_view
, in which we:
gene_level_coverage_stats_dict
gene_level_coverage_stats_dict
gene_level_coverage_stats_dict
for the genes in the splits_of_interest
(according to the collection and bin that was provided by the user), and use this to populate the dict
in self.views
@meren, we hit a roadblock when trying to create the newick for ordering.
The way ordering is created for other interactive modes is by referring to a table in the database, and then clustering is computed , but for --gene-view
we don't have a table in the database, so we need to use the dictionary. So right now the solution seems a little ugly, we will use the dict
that we create in self.views[view]['dict']
, convert it into a pandas
dataframe, then we will create a class in clustering.py
that takes a dataframe and returns a newick tree. @meren, if you maybe see a different solution, please let us know.
There is already a solution for that. Please see the source code of anvi-matrix-to-newick
:)
Best,
Got it. @ozcan, I'm putting a not for us here, it seems that once we create a pandas dataframe (or simply a numpy matrix) then we run the following lines (from clusteting.get_newick_tree_data
):
vectors = get_normalized_vectors(vectors, norm=norm, progress=progress)
tree = get_clustering_as_tree(vectors, linkage, distance, progress)
newick = get_tree_object_in_newick(tree, id_to_sample_dict)
It seems to me that we should actually change clusteting.get_newick_tree_data
so instead of accepting a path to a matrix, it would accept a matrix, and then have a wrapper for it for the case of anvi-matrix-to-newick
.
Something to consider: anvi-mcg-classifier
computes a table of the presence/absence of genes in samples, I think that this is something that people would definitely want to use in gene views. This means that we need to add it as an available view, and I also think we should add an ordination according to this information. My suggestion is to add a flag --skip-mcg-classification
, and that unless it is provoked then mcgclassifier
would run (without producing plots), and would add this gene presence/absence information in an ad-hoc manner. @ozcan, @meren, let me know what you think.
This is very doable. Generating additional views, and newick item orders based on views is something we do in both pangenomic and metagenomic workflows, so there are tools to do that in place :) We can take a look at this together if you guys like.
Update on the current state: Bottom line: we reached a road-block that probably requires Meren's help.
We have all the view data and the items orders created in the proper format (including the newick trees).
The problem we have is because we have a contigs database, then the Interactive
class is kind of treating our gene_mode
in a similar manner to full
and refine
modes. As an example to the source of the problem see these lines:
https://github.com/merenlab/anvio/blob/7826b132c682a906959759fda7e236098ec439af/anvio/interactive.py#L971
Because we initialize the contigs database, we can expect both self.genes_in_splits_summary_dict
to exist and also self.splits_taxonomy_dict
(if there is taxonomy), even though we don't want to show split summaries nor taxonomy in gene_mode
.
@meren, if you want to see the exact status, just go into the test-output
folder, and run:
anvi-interactive -c CONTIGS.db -p SAMPLES-MERGED/PROFILE.db --gene-view -C CONCOCT -b Bin_1
I want to look into this, but currently the gene-view
can't run the mini test due to this error:
Traceback (most recent call last):
File "/Users/meren/github/anvio/bin/anvi-merge", line 50, in <module>
merger.MultipleRuns(args).merge()
File "/Users/meren/github/anvio/anvio/merger.py", line 404, in merge
self.populate_layers_additional_data_and_layer_orders()
File "/Users/meren/github/anvio/anvio/merger.py", line 482, in populate_layers_additional_data_and_layer_orders
TableForLayerOrders(args).add(layer_orders_data_dict)
File "/Users/meren/github/anvio/anvio/tables/miscdata.py", line 339, in add
TableForLayerOrders.check_names(self, data_dict)
File "/Users/meren/github/anvio/anvio/tables/miscdata.py", line 629, in check_names
data_dict[data_key]['order_type'],
KeyError: 'order_type'
This error is due to changes in clustering.py
. with latest changes, it returns the clustering of items, rather than layers, even when the transpose parameter is sent.
(going back to master
, running mini test, and coming back to gene-view
branch works).
we don't want to show split summaries nor taxonomy in
gene_mode
.
The best way to achieve this is to reset those dictionaries (until we find a smart way to populate them with other information) in gene_mode
. see the workaround in 613e50f.
@ozcan, I played with the gene-view
today. And I realize that we need to address what will happen when the user hits "next in inspect context
. Right now it goes to the next gene, which means that it is very likely to be in the same split, and then the page is reloaded just to show up with the same information.
This is what I suggest: A simple solution could be that it would continue to go to the next gene, but if the next gene is in the same split then the page will not be reloaded, and the only thing that would happen is that the marker showing which gene is inspected would move to the next gene (we said that there would be a color showing which gene in the split is that one that was inspected, so simply the color would move to the next gene). @meren, @ozcan, let me know what you think about this solution.
In inspect gene
, the solution should be simple and the page would just reload with the information relevant to the next gene.
In both cases we want the "next gene" to be the one following according to the currently chosen "items order".
We forgot to talk about this:
Only after everything works completely we may consider adding all views that are available through the atomic data table for single profiles (such as variability view, max-normalized-ratio view, etc).
(this is copied from the views section of the top comment in this page).
Specifically I think it would be nice to allow to visualize variability of genes, and from what I remember I think @ekiefl built the infrastructure that would allow that.
@ozcan , things that need to be done:
(anvio-python-3) ➜ INFANT-GUT-TUTORIAL anvi-interactive -p PROFILE.db -c CONTIGS.db --gene-view
Contigs DB ...................................: Initialized: CONTIGS.db (v. 10)
Interactive mode .............................: gene
Auxiliary Data ...............................: Found: AUXILIARY-DATA.db (v. 2)
Profile Super ................................: Initialized with all 4784 splits: PROFILE.db (v. 23)
Config Error: There is no "None" I know of. Probably something is spelled wrong somewhere? In
case you are a programmer and accessing to the collections from your program,
here is a reminder for you: are you sure `populate_collections_dict` was called
for whatever database you are trying to get collections from? If you are a user,
you can always try to use the `--list-collections` flag and hope for the best.
giving no parameters gives this error after spending a lot of time recovering gene level coverage dict
Ozcan, gene mode should only work when given a collection and a bin. I guess we forgot to put a basic sanity check to make sure that a collection and bin were specified.
Yes, also whatever function is looking for that "None" should complain much earlier in its context about the fact that it is None :)
@ozcan I added this sanity check: https://github.com/merenlab/anvio/commit/f0cd5d6512f5fcd3273dadc59a8e9f1e3cd20842
I'm now using gene view for my real data (oral metagenomes) and I see it is quite annoying to wait for the gene level coverage stats to be computed. I think we should provide a way for this data to be written to the disk so that if a user is repeatedly using gene view, they don't have to wait every time from this to be computed. @meren, @ozcan, what do you think about this? And if you agree what would be the best way to do this?
I think the best way to do it is to implement a --gene-mode
flag into anvi-split
, so you generate a self-contained profile for genes.
That makes perfect sense to Meren. So this will wait to the next version I guess.
I think it should. Unless we are ahead of our schedule and have a full day to work on this.
The design is quite streamlined in anvi-split, and it may be easy to extend it with genes without breaking things. But we need to look at the code to see whether this is the optimist in me.
I wanted to close this issue, but then I saw that the first comment on this pages includes the following things:
Future "gene-mode" for anvi-split We envision that there may be a “gene-mode” also for the program anvi-split in the future. This would create a gene profile database for a given bin. The gene profile database would be familiar with the parent contigs and profile databases (through their hash). This would allow us to use all anvi'o functionality (including creating collections and anvi-summarize tasks) on gene profile databases. When the parent databases are provided, it would then be possible to inspect a gene in the context of its parent contig.
searching for functions in genes Allow to search for functions in "gene view".
Gene caller Genes would be computed using the results from the default gene caller, but anvi-interactive will require a --gene-caller flag if the user wishes to use a different gene caller for gene view. This will wait for Meren to finish redesigning that is occurring in this branch: https://github.com/merenlab/anvio/tree/gene_caller_option
The first thing has been completed. @ozcan, are you interested in completing the second thing (function search)? @meren, what about the gene-caller remark? It refers to a branch that hasn't been touched in 10 months (https://github.com/merenlab/anvio/tree/gene_caller_option)
The function search already works I think. Is it not? In which case there is nothing to do here for @ozcan at this time :)
We can close it. Gene caller option will require some design decisions I will make with @ozcan at some point within the next 3-6 months.
This is the summary of our meeting today (@ozcan, @ShaiberAlon, @meren) during which we concluded that anvi'o needs a native 'gene view' mode, where every split is a gene with all perks of anvi-interactive (except binning; see below). The following is mostly Alon's notes from the meeting, which contain some of the implementation details and plans for the future.
Target
This mode will require a collection name and a bin name.
Views
Only views that will be available in the interface are going to be those in
gene_coverge_stats_dictionary
(which is recovered throughanvio/dbops
). Only after everything works completely we may consider adding all views that are available through the atomic data table for single profiles (such as variability view, max-normalized-ratio view, etc).Ordering/clustering
Gene view would have the clusterings calculated on-the-fly (similarly to the strategy implemented in
refine_mode
inanvio/interactive
). The available item orders would be:Interactive right-click options
Sequence
When right-clicking the same options that are available for contigs, will be available for the gene.
Inspection
In gene view mode there will be to inspect options in the right-click menu:
• “inspect global” - opens the split (to which the gene belongs) with zoom-in on the gene of interest. • “inspect local” - splices out the gene start/stop as the only context to display coverages (+ 100 nucleotides in each direction). This would allow much faster loading of the inspection page when there are many samples.
Collections
No selections would be available, and no collections would be available at this time.
Future
"gene-mode" for
anvi-split
We envision that there may be a “gene-mode” also for the program
anvi-split
in the future. This would create a gene profile database for a given bin. The gene profile database would be familiar with the parent contigs and profile databases (through their hash). This would allow us to use all anvi'o functionality (including creating collections andanvi-summarize
tasks) on gene profile databases. When the parent databases are provided, it would then be possible to inspect a gene in the context of its parent contig.searching for functions in genes
Allow to search for functions in "gene view".
Gene caller
Genes would be computed using the results from the default gene caller, but
anvi-interactive
will require a--gene-caller
flag if the user wishes to use a different gene caller for gene view. This will wait for Meren to finish redesigning that is occurring in this branch: https://github.com/merenlab/anvio/tree/gene_caller_option