mani2012 / PathoStat

The purpose of this package is to perform Statistical Analysis on the PathoScope generated reports files.
8 stars 9 forks source link

alluvial plot hard code #21

Closed Sanrrone closed 7 years ago

Sanrrone commented 7 years ago

due to metada file, I wrote inside the server the code to make a phyloseq object (this must have a "Time" column). and I added the files in a new folder "dataT" in example folder.

PD: alluvial plot require a lot of dependences

ecastron commented 7 years ago

@mani2012 We needed to add a phyloseq object in order to make the prototype alluvial plot to work.

Merging this shouldn't break anything as it only adds some code and a new folder in /inst/example

After this we'll be working from a branch instead of a fork

mani2012 commented 7 years ago

@ecastron @Sanrrone Eduardo, Can you please let me handle all the pull and merge requests. The problem in the code is that it is not using the phyloseq object that is already generated by PathoStat through the findPhyseqData() function. We also needed the alluvial plot to be generically working with the data that the user has passed and not fixed to some demo example with fixed time variable selection. The time variable has to be user selected.

ecastron commented 7 years ago

@mani2012 Sounds good. The issue we had is that the phyloseq object that is already generated by pathoStats doesn't have a time factor. We can certainly make the time factor user selected but we'll need you to add an extra column with a time factor to the phyloseq object generated by findPhyseqData()

mani2012 commented 7 years ago

@ecastron @Sanrrone Yes, that is one of the updates that I am currently making. Currently the sample_data contains only condition and batch. But you should think of this sample_data as containing multiple factors and one of the factors can be time. For the time being, may be the batch factor can be considered as time. But also, we may also have to think of how to let the user associate time for each subject etc,. But, whatever simplicity assumption we make, I want this analysis to be done with the user provided data with may be no plot shown initially and the alluvial plot displayed after the user selects the time variable. BTW, this alluvial plot is looking really great. It is just that I don't want this tab to be on a particular example but based on the user provided data and inputs. Could you make these updates and send me a new pull request?

mani2012 commented 7 years ago

@ecastron @Sanrrone @mlbendall On the topic of fork vs branch, I think fork is fine, as I don't want to be dealing with multiple branches at this time. But as Matthew mentioned, you need to make sure that your fork is in sync with the master.

ecastron commented 7 years ago

Hi @mani2012

Absolutely, we usually work with datasets with 30+ columns of metadata! I understand what you say about making the tab general and not with some fixed example. We did it that way as an example/placeholder while you guys updated the sample_data() from the physeq object.

We can try using the batch condition as time but my guess is that the alluvial package is going to complain since it likes numeric only levels.

I think the association of subject and time is fairly straightforward because the basic unit in the phyloseq object is "sample". If a subject has been sampled over time, then it'll have as many samples.

Regarding users selecting time factors, we are adding a dropdown menu option in the sidebar.

Cheers,

Eduardo

mani2012 commented 7 years ago

I guess, currently batch can be input as numeric only levels also by the user. I guess the the association of subject and time is fairly straightforward, but still we need to develop a nice ui for the user to be able to do so, right? Yes, the drop down for time factor selection is good.

ecastron commented 7 years ago

Agreed, a nice UI attracts users