We had this comment from a reviewer:

> There is no information about the scalability or time complexity of the methods used to manipulate the data. These become important as the data scales up, so they should be discussed.
It would be nice to have some demonstrations of this. In my experience, parsing data is currently the weakest link. It's slow, but I have been able to parse whole reference database taxonomies for single kingdoms (RDP, Greengenes, and SILVA at least). I can think of the following demonstrations:
- [ ] Time how long it takes to parse the whole taxonomy for a large database. @sckott, do you know if we can easily get a text-based dump of the NCBI taxonomy? That would be pretty big and have some name recognition.
- [ ] Look at the RAM usage of the resulting object and compare it to the raw string size when the same data is read in as text.
- [ ] Time how long `filter_taxa` and `filter_obs` take to filter this object.
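The package itself is R, but the measurement idea behind the first two checklist items can be sketched language-agnostically. Below is a minimal, self-contained Python illustration (the dump format and all names are made up for the example, not taken from any real database): time the parse of a text-based taxonomy into a child index, then compare the in-memory size of the parsed object against the raw string.

```python
import sys
import time

# Fake miniature "dump": one "child | parent" record per line.
# (Hypothetical format chosen only to illustrate the measurement.)
raw = "\n".join(f"node{i}\t|\tnode{i // 10}" for i in range(10000))

# Demonstration 1: time the parse.
start = time.perf_counter()
children = {}
for line in raw.splitlines():
    node, parent = (field.strip() for field in line.split("|"))
    children.setdefault(parent, []).append(node)
parse_seconds = time.perf_counter() - start

# Demonstration 2: rough RAM estimate of the parsed object vs. the raw text.
# sys.getsizeof is shallow, so sum over the container and its entries.
obj_bytes = sys.getsizeof(children) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in children.items()
)
print(f"parsed {len(children)} parent nodes in {parse_seconds:.3f}s")
print(f"raw text: {sys.getsizeof(raw)} bytes, parsed index: {obj_bytes} bytes")
```

In R the same comparison could be made with `system.time()` and `object.size()` on the parsed taxonomy versus `readLines()` output; the point is just to report both numbers side by side as the input grows.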