Closed zachary-foster closed 7 years ago
Okay, let me know if I can help
Ok, thanks!
Hi @sckott. So this is finally done. Sorry it took so long. Everything is working as far as I can tell and all the new functions have examples that you can test out. There are no tests yet and I have not perfected the documentation since I wanted your input before investing more time in the details, but there is hopefully enough there for you to get what I am going for.
I know there is a lot of changes in this branch. Normally I would make change more incrementally, but most of the added functions are part of a theme or depend on each other. Most of this seems intuitive to me, but I have been thinking about it for a while so its hard for me to gauge how easily others will understand it. Feel free to suggest or make changes, even ones that fundamentally alter or remove things.
The taxmap.R
is huge now and needs to be split up, but im not sure how best to do that.
This is the last part of metacoder I plan on migrating to taxa, so taxa can be developed in a more regular incremental way now since most of the building blocks are in place. I have lots of ideas for minor improvements that I would like your input on, but too many to discuss in one issue. If you like the overall structure of what I did so far, I will make new issues for each idea independently.
No worries about the time.
Great news. Will have a look through the pull request, and see if anything we should change.
The
taxmap.R
is huge now and needs to be split up, but im not sure how best to do that.
taking a look at it now, first thing to do is to split out helper functions in that file into other files as makes sense.
I suppose you're also or primarily referring to the Taxmap
class - which yeah is pretty big. A few thoughts on that:
Taxmap
class that could be pulled out and put elsewhere since they aren't using self
referencescould use inheritance in R6 ...
Perhaps. A few of the functions that only deal with taxon relationships could be reinstalled in the taxonomy
class, but some would have to be partially overwritten by taxmap
since they have options that affect the data
variable in taxmap
, even if that is not their main effect (e.g. filter_taxa
). Im not sure if there is much room conceptually to put a class in between taxonomy
and taxmap
, but I am open to ideas.
i see a number of helper functions inside the Taxmap class that could be pulled out and put elsewhere ...
Yea, I think this should be done. There are also some similar chunks of code shared by multiple functions that could be made into helper functions I think.
for R6 inheritance, i've no idea if that even makes sense in this case, its just the only thing i can think of to allow breaking up Taxmap
into smaller parts
Yea, I think this should be done.
cool, always a fan of DRYing out code
Seems like this issue is done, right? can I close?
If you like the overall structure of what I did so far, I will make new issues for each idea independently.
sounds good to open new issues
Yea, it can be closed
thanks
This will be an upgrade of
taxmap
, but withtaxonomy
already doing a lot of the heavy lifting, I think it will actually reduce the complexity oftaxmap
some. The basic idea is to maketaxmap
inherittaxonomy
, but add a list of user-defined tables. When these tables have ataxon_id
column, that column will be used to map rows to the taxa and the edgelist, so modifications of the edgelist/taxa can affect the content of the tables. For example, removing a taxon might remove all of rows corresponding to that taxon as well as (optionally) all of its subtaxa. These kinds of operations will be done with a set of functions modeled after thedplyr
functions, similar tofilter_taxa
andfilter_obs
now (see?filter_taxa
for details).filter_taxa
will work pretty much the same, butfilter_obs
will need to be reworked to use multiple user-defined tables instead of just one "observation" table.The metacoder
taxmap
also had a list of user-definable functions that could be added that simulate columns of data, but were calculated every time they were referenced. I find these very useful, but im not sure yet how to adapt them to multiple tables.taxmap
classsubtaxa
,roots
, etc work withtaxmap
taxmap
print methodobs
(see?obs
) that can be used with multiple tables.filter_taxa
. Thetaxonless
andreassign_obs
options will need to be adapted to multiple tables. I am thinking of allowing for a named logical vector as well as justTRUE
/FALSE
. For examplereassign_obs
could be justTRUE
to reassign observations in all user-defined tables or something likec(abundance = TRUE, stats = FALSE)
to reassign observations in the table called "abundance", but not in the table called "stats". Another thing to consider is that the oldtaxmap
had a table dedicated to taxon statistics (one row per taxon), so those columns were most commonly used as filtering conditions. Without a dedicated table that is guaranteed to line up with the edgelist 1 to 1, only user-defined tables with one row per taxon will work as filtering conditions. The reworkedobs
could be used to be used to consolidate data from tables with any number of rows per taxon to be used for filtering.filter_obs
,select_obs
,arrange_obs
,mutate_obs
, andtansmute_obs
and possibly rename. This will require adding a new argument specifying which table to manipulate, but not too much work.mutate_taxa
,select_taxa
, andtransmute_taxa
since there will no longer be a dedicated taxon statistics table. This functionality wil be replaced by the*_obs
functions above.arrange_taxa
to only effect the order of the edgelist.sample_n_obs
,sample_n_taxa
. These changes will be similar to others above.