Open gvanzin opened 7 years ago
Thanks, Gary, this is a great feature request.
Description is a useful attribute to add. There are various ways you might be able to do this using generic R methods. However, I don't think a plain text description would solve the harder problem you've posed.
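As one illustration of the "generic R methods" mentioned above, base R's `comment()` attaches a free-text note to almost any object, and the note survives `saveRDS()`/`readRDS()`. This is only a sketch: `ps_stand_in` is a plain list standing in for a phyloseq object, and whether phyloseq's own functions preserve the attribute through later processing steps has not been checked here.

```r
# A plain list used as a stand-in; NOT a real phyloseq object.
ps_stand_in <- list(otu = matrix(1:6, nrow = 2))

# Attach a free-text note with base R's comment().
comment(ps_stand_in) <- "This is what I did to get here."

# The note is stored as an attribute, so it round-trips through an RDS file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(ps_stand_in, rds_path)
reloaded <- readRDS(rds_path)

comment(reloaded)
```

Note that `print()` does not display the comment, so it has to be requested explicitly with `comment()`.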
Some "good practice" helps a lot, especially reproducible Rmd, and the "pipes" semantic in the magrittr package can help with avoiding a lot of intermediate-but-unnecessary objects that add to confusion.
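A small sketch of how piping removes intermediate objects. To stay dependency-free it uses base R's native pipe `|>` (R 4.1 or later) as a stand-in for magrittr's `%>%`; the data frame and the helper functions `drop_empty()` and `to_relative()` are made up for illustration and are not phyloseq functions.

```r
# Toy count table; stand-in data, not a phyloseq object.
tab <- data.frame(taxon = c("A", "B", "C"), count = c(10, 0, 5))

# Hypothetical processing steps.
drop_empty <- function(d) d[d$count > 0, , drop = FALSE]
to_relative <- function(d) {
  d$count <- d$count / sum(d$count)
  d
}

# Without pipes: every step leaves a named intermediate in the workspace.
tab_nonzero <- drop_empty(tab)
tab_rel <- to_relative(tab_nonzero)

# With a pipe: the same steps, with no intermediates to keep track of.
tab_rel_piped <- tab |> drop_empty() |> to_relative()

identical(tab_rel, tab_rel_piped)
```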
I have some ideas for more formally defining workflows using object-oriented classes, and I'm not the only one. Storing all intermediate versions of specific tables could be useful, for instance. There are ways to do this with active bindings.
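A minimal sketch of the active-binding idea, assuming nothing about how phyloseq would actually implement it. `make_versioned()` is a hypothetical helper built on base R's `makeActiveBinding()`: every assignment to the bound name is appended to an archive instead of overwriting the previous value.

```r
# Hypothetical helper: create a variable whose every assignment is archived.
make_versioned <- function(name, env = parent.frame()) {
  versions <- list()
  makeActiveBinding(name, function(value) {
    if (missing(value)) {
      # Read access: return the most recent version (NULL if none yet).
      if (length(versions) == 0) NULL else versions[[length(versions)]]
    } else {
      # Write access: append the new value rather than overwrite the old one.
      versions[[length(versions) + 1]] <<- value
      invisible(value)
    }
  }, env)
  # Return an accessor that exposes the full list of stored versions.
  function() versions
}

versions_of_tab <- make_versioned("tab")

# Toy table; stand-in data, not a phyloseq object.
tab <- data.frame(taxon = c("A", "B", "C"), count = c(10, 0, 5))
tab <- tab[tab$count > 0, ]

length(versions_of_tab())   # number of versions stored so far
nrow(versions_of_tab()[[1]])  # the first version is still available
```

The trade-off is memory: every version is kept, which may be costly for large tables.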
If the goal is provenance tracking and cleaner code, then reproducible (e.g. Rmd) good practice might already get you there, and perhaps more clearly than, say, some complicated R object with embedded history.
Thoughts?
Thanks for the quick reply. My main thought is that my personal struggles with keeping a good lab notebook have spilled over into my bioinformatic analyses! I have phyloseq RDS files from 6 months ago on my computer whose genesis takes significant time to untangle. I don't trust my R Notebook workflows enough to take them at face value, because I may have added or skipped a step. I hear your 'good practice' advice and will work on that. In addition, I do a lot of "experimenting" with microbiome analyses, testing out different aligners, reference databases, normalization techniques, etc. It's wicked cool that I can use the SINA aligner through a Bash code chunk seamlessly integrated into RStudio. So for one project I might have 20 phyloseq objects. If I had an optimized workflow I would use pipes, but for now I do a detailed inspection of each intermediate step, trying to see if what I did was smart. I actually look at my alignments! Maybe I'm naive in wanting the ability to type:
describe(ps)<-"This is what i did to get here."
I wish the RStudio environment pane had a user-defined 'description' column... that would help a lot. Sorry if I'm just complicating things!
Gary
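The `describe(ps) <- "..."` syntax requested above can be sketched with an ordinary R replacement function. Both `describe()` and `` `describe<-` `` are hypothetical names that phyloseq does not provide, and `ps` below is a plain list standing in for a phyloseq object; behaviour with real phyloseq (S4) objects is untested here.

```r
# Hypothetical setter: store the text in a "description" attribute.
`describe<-` <- function(x, value) {
  attr(x, "description") <- as.character(value)
  x
}

# Hypothetical getter: read the attribute back (NULL if never set).
describe <- function(x) attr(x, "description", exact = TRUE)

# A plain list used as a stand-in; NOT a real phyloseq object.
ps <- list(otu = matrix(1:6, nrow = 2))

describe(ps) <- "This is what I did to get here."
describe(ps)
```

Functions that rebuild the object from its parts may drop custom attributes, so the description might have to be re-applied after such steps.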
Would it be possible to define the phyloseq assignment operator to update the list of all previous assignments used on the object? If there are other functions that mutate the phyloseq object in place, they would have to do this too.
Basically, assignment to a phyloseq object would store the command string in a $history list: it would take the $history list from the phyloseq object (if there is one) on the "right side" of the assignment and append the current command line to it. The goal is that in the end you would have the list of assignment/mutation commands that formed the object. Not fully reproducible by any means, but a sort of useful "lab notebook" record.
(I'm already seeing a complication from merging multiple phyloseq objects)
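A rough sketch of the history idea. As far as I know, a class cannot hook into plain `<-` in R, so this uses an explicit wrapper instead of the assignment operator. `record_step()` is a hypothetical helper, the history lives in an attribute rather than a `$history` slot, and the data frame is a stand-in for a phyloseq object.

```r
# Hypothetical helper: apply `f` to `x` and log the call text on the result.
record_step <- function(x, f, ...) {
  # Capture the call as it was written, e.g. "record_step(tab, drop_empty)".
  call_text <- paste(deparse(sys.call()), collapse = " ")
  result <- f(x, ...)
  # Carry over the input's history (if any) and append the current command.
  attr(result, "history") <- c(attr(x, "history"), call_text)
  result
}

# Toy table and processing steps; stand-ins, not phyloseq functions.
tab <- data.frame(taxon = c("A", "B", "C"), count = c(10, 0, 5))
drop_empty <- function(d) d[d$count > 0, , drop = FALSE]
to_relative <- function(d) {
  d$count <- d$count / sum(d$count)
  d
}

tab2 <- record_step(tab, drop_empty)
tab3 <- record_step(tab2, to_relative)

attr(tab3, "history")
```

Merging two objects, the complication noted above, would need a rule for combining two histories, which this sketch does not attempt.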
The first step, I am afraid, is to always save your history when you quit R. phyloseq objects are not the only things you want to keep track of, and the history files have saved me many times. Even if you don't use Rmd, you at least have a complete record of what you did.
-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/
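One way to make saving the history automatic is the `.Last` idiom from the `savehistory()` help page, typically placed in `.Rprofile`. This is a sketch: `savehistory()` only works in front ends that keep a command history, and the file path is just an example.

```r
# Run automatically when an R session ends normally.
.Last <- function() {
  # Only interactive sessions have a command history to save;
  # try() keeps a failure here from interfering with quitting R.
  if (interactive()) try(savehistory("~/.Rhistory"))
}
```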
Thanks for creating such wonderful, user-friendly software! My issue is keeping track of how I iteratively change a phyloseq object. If you have a great way to do this I'm all ears, but one solution would be to have a short "description" field that pops up when you call the phyloseq object, just like otu_table(), tax_table(), etc. What I do now is create a standalone data frame to track what I've done; I attached an example. There must be a better way! Thanks again,
Gary
phyloseq_description_example.Rmd.zip