Open mlbendall opened 7 years ago
The Santiago team agrees with this 100%!
I'd like to add that while pathoID writes a sorted .tsv file, it's sorted by Final Guess and sometimes you want it sorted by Final Read Numbers. If we read the full pathoID output without any cutoffs, then in phyloseq you can easily get the top X by issuing something like:
top10 <- names(sort(taxa_sums(physeq), decreasing = TRUE)[1:10])
Someone may want to define the top X by proportions instead of counts, in which case a transformation is needed:
physeq <- transform_sample_counts(physeq, function(x) x / sum(x) )
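To illustrate why the choice between counts and proportions matters, here is a minimal base-R sketch. It does not use phyloseq: the toy matrix, the taxon and sample names, and the variables `counts`, `props`, `top_by_counts` and `top_by_props` are all made up for illustration. `rowSums` stands in for `taxa_sums`, and the `sweep` call stands in for the per-sample `x / sum(x)` transformation.

```r
# Toy count matrix (made-up numbers): taxa in rows, samples in columns.
# The "deep" sample has 10x more reads than the "shallow" one.
counts <- matrix(
  c(900, 10,
     90, 40,
     10, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxA", "taxB", "taxC"), c("deep", "shallow"))
)

# Rank taxa by total raw counts across samples.
top_by_counts <- names(sort(rowSums(counts), decreasing = TRUE))

# Convert each sample (column) to proportions, then rank by summed proportions.
props <- sweep(counts, 2, colSums(counts), "/")
top_by_props <- names(sort(rowSums(props), decreasing = TRUE))

print(top_by_counts)
print(top_by_props)
```

With raw counts the deeply sequenced sample dominates the ranking, so taxB and taxC swap places once each sample is scaled to sum to 1.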
Regarding point 3, I think users should be warned to upload unfiltered results only, and let PathoStat decide when it's appropriate to filter.
BTW, @mlosada323 mentioned rarefaction for 16S data. That's also a one-liner in phyloseq:
physeq_rare <- rarefy_even_depth(physeq, sample.size = 1000, replace = FALSE, rngseed = 1)
physeq_rare
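For anyone wondering what rarefying to an even depth amounts to, here is a rough base-R sketch of the idea: subsampling reads without replacement down to a fixed depth. It is not phyloseq's implementation. The helper `rarefy_sample`, the toy matrix `depth_counts` and the sample names are hypothetical.

```r
# Hypothetical helper: subsample one sample's counts to `depth` reads
# without replacement. Assumes the sample has at least `depth` reads.
rarefy_sample <- function(x, depth) {
  # Expand counts into one entry per read, labelled by taxon index.
  reads <- rep(seq_along(x), times = x)
  kept <- sample(reads, size = depth, replace = FALSE)
  # Count how many kept reads belong to each taxon.
  out <- tabulate(kept, nbins = length(x))
  names(out) <- names(x)
  out
}

set.seed(1)  # fixed seed so the subsample is reproducible

# Toy data (made-up numbers): s1 has 1000 reads, s2 has 2000.
depth_counts <- matrix(
  c(900, 90, 10, 1500, 400, 100),
  nrow = 3,
  dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2"))
)

rare <- apply(depth_counts, 2, rarefy_sample, depth = 1000)
print(colSums(rare))
```

After this step every sample has the same total, and a sample that was already at the target depth (s1 here) keeps all of its reads.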
Cheers,
Eduardo
PS: The alluvial plot is almost done! @Sanrrone
Wow looks nice @Sanrrone!
Can you make a remote branch and push up what you have currently? I'd like to look at how you are getting the sample condition.
I'm confused about how remote branches work. I made a pull request; is that the same thing? In any case, you can look at the changes in my fork: https://github.com/Sanrrone/PathoStat
Oh, didn't know you were working on a fork.
A remote branch lives in the same repository, while a fork creates a new repository. There is ongoing debate about when to branch and when to fork, but it boils down to how closely you are involved with the original project and whether your changes will eventually be incorporated into it.
Just make sure to keep your fork in sync with master, and (ideally) merge the upstream master and test your code before making a pull request. Same goes for branches.
For those not at BU, we had a conversation today about how different analyses have different filtering requirements for the data. For example, you should not filter low-abundance OTUs for alpha diversity calculations, but there are other situations where you might want to filter for analysis or visualization. So we concluded:
There are other details that need to be sorted out, such as how to track if users upload pre-filtered data, etc.
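As a small illustration of why alpha diversity should be computed on unfiltered data, here is a base-R sketch with made-up counts. The `shannon` helper, the `sample_counts` vector and the 5% cutoff `min_prop` are hypothetical and only there to show the effect; they are not proposed defaults.

```r
# Shannon diversity of a single sample from raw counts.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Toy sample (made-up numbers): three abundant taxa and three rare ones.
sample_counts <- c(taxA = 500, taxB = 300, taxC = 150,
                   taxD = 30, taxE = 15, taxF = 5)

# Hypothetical low-abundance filter: keep taxa at >= 5% of the reads.
min_prop <- 0.05
filtered <- sample_counts[sample_counts / sum(sample_counts) >= min_prop]

# Richness and Shannon diversity before and after filtering.
richness_raw <- sum(sample_counts > 0)
richness_filtered <- length(filtered)
print(c(raw = richness_raw, filtered = richness_filtered))
print(c(raw = shannon(sample_counts), filtered = shannon(filtered)))
```

Both richness and Shannon diversity drop once the rare taxa are removed. That is fine for a cleaner plot but misleading as an alpha diversity estimate, which is one reason to keep the unfiltered table around and filter per analysis.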