Closed julian-ramos closed 8 years ago
Actually, before the second step, organize the data in the next manner (this is just for graphical purposes): put all the morning, then all the afternoon, and then all the evening. This should not affect the clustering part at all.
Once you are done with all this, there is another task that will be interesting to do. Can you go back to the session data and compute a transition matrix for each category? This means that you count every time you observe a transition from one app to another. We do this for every device_id. In the end we obtain a transition matrix; notice that it may not be symmetric. Once we have that, the challenge again is to find out whether there is a pattern in the transition probabilities. For all of this we should be using the merged categorization instead of the full one.
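A minimal sketch of the counting described above, in plain Python. The record layout `(device_id, timestamp, category)`, the function names, and the toy rows are assumptions for illustration, not the project's actual schema or data. The sketch also treats each device's whole timeline as one sequence, so it ignores session boundaries.

```python
from collections import defaultdict


def transition_counts(sessions):
    """Count category-to-category transitions separately for each device.

    `sessions` is an iterable of (device_id, timestamp, category) tuples.
    This layout is an assumption, not the real session table.
    """
    by_device = defaultdict(list)
    for device_id, timestamp, category in sessions:
        by_device[device_id].append((timestamp, category))

    counts = {}
    for device_id, events in by_device.items():
        events.sort()  # order each device's events by timestamp
        matrix = defaultdict(lambda: defaultdict(int))
        # Walk consecutive pairs: (previous category, next category).
        for (_, src), (_, dst) in zip(events, events[1:]):
            matrix[src][dst] += 1
        counts[device_id] = matrix
    return counts


def to_probabilities(matrix):
    """Row-normalize counts so that each row of the matrix sums to 1."""
    probs = {}
    for src, row in matrix.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Toy rows with made-up category names, only to exercise the functions.
sessions = [
    ("dev1", 1, "games"), ("dev1", 2, "social"), ("dev1", 3, "games"),
    ("dev1", 4, "social"), ("dev1", 5, "tools"),
    ("dev2", 1, "tools"), ("dev2", 2, "tools"),
]
counts = transition_counts(sessions)
probs = to_probabilities(counts["dev1"])
```

In the toy rows, dev1 goes from games to social twice but from social to games only once, which is the kind of asymmetry mentioned above.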
@julian-ramos OK, I read all this information, and I think I know what I should do. But I couldn't make sense of 'next manner'; I will look it up first :)
Hi Roy,
I just added you on Skype. Do you want to talk?
I'm online now. If it is convenient for you, let's have a quick talk :)
Best Regards — Xue Lijun Roy http://www.royxue.me/
@julian-ramos Hi Julian, the tables folder has the new result tables, and the trans matrix folder has the results of the app category transition matrix.
For the clustering part, I think I can finish it in 2 days.
Hi Roy,
Great, I will go through all of them later in the week.
Thanks,
Julian.
Actually, one question: How did you merge the similar categories?
I was going through the code of trans_matrix.py and I see that you didn't use all of the categories there, just a subset of them.
Julian.
Hi Roy,
I was going through your merge_category folder and checking your plan for merging categories, and I mostly agree with all of it. However, can you include a short description of why you are merging the way you are, for each of the merges proposed there?
For example, I would like to know why you want to put together finance and business.
Thanks,
@julian-ramos Description: In order to merge the sparse categories, there are 3 ways to do this.
If this part is OK, can you help check the results in the cluster folder? I wrote the R script and drew the graphs. This is actually my first time doing this, so I may have made some mistakes.
Hi Roy,
What you wrote is a bit difficult to understand. For example, do you mean that you used all 3 strategies, or did you use just one of them?
I think the plan you lay out in the merge_category folder is good, but I need to know your reasoning behind it.
About the clustering results: I will go through them later.
About the clustering part: I think it is OK. However, also add a plot of the standard deviation of the silhouette score for each value of k.
Also, we need to plot the gap statistic. You could use this https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/clusGap.html
Likewise, plot both the mean and the standard deviation.
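One possible reading of "mean and standard deviation" is to repeat the scoring several times for each k and summarise the repeats. The sketch below assumes that reading. The helper name `summarize` and the numbers in `toy_scores` are placeholders, not results from this project.

```python
import statistics


def summarize(scores_by_k):
    """Return {k: (mean, sample standard deviation)} of the repeated scores."""
    return {
        k: (statistics.mean(values), statistics.stdev(values))
        for k, values in scores_by_k.items()
    }


# Placeholder scores: three hypothetical repeats for each k.
toy_scores = {2: [0.5, 0.75, 1.0], 3: [0.25, 0.25, 0.25]}
summary = summarize(toy_scores)

for k in sorted(summary):
    mean, std = summary[k]
    print(f"k={k}: mean={mean:.3f} std={std:.3f}")
```

The `(mean, std)` pairs are the values an error-bar plot over k would need.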
@julian-ramos Yes, you are right, I used all 3 of those strategies.
I think using different strategies will get the data into the form we want.
Ok, but I still want to see your reasoning!
Here is what I mean:
I merged the Business apps with the Shopping apps because business apps are mainly related to buying or selling goods, while shopping apps are about buying goods from retailers.
Julian.
@julian-ramos I just uploaded some more description. You can check it. I think there are a few things that need to be changed (but I think it doesn't matter much). What's your advice?
Hi Roy,
I won't be able to look at all this until Saturday.
Hi Lijun,
I already went through the merge that you did and everything seems fine except maybe merging education and books. You could read books for an educational purpose or for an entertainment purpose.
So to move forward, can you put the final sets of categories in your file, maybe something like this:
-Name of the category set 1 categoryA, ...categoryN
-Name of the category set 2 categoryA, ...categoryN
.....
-Name of the category set n categoryA, ...categoryN
I want to discuss with a colleague whether he agrees with this categorization.
Thanks,
Julian.
Also, something else: can you give denzilferreira access to this repository? He is the colleague I was talking about, and he collected all this information using a framework he created called aware.
Actually, there is no need to put the final set of categories in the file; I already found it in the code.
Thanks
@julian-ramos I added denzilferreira to this repo.
For future reference:
First, since what we got is so sparse, the first step is simply to merge categories. For instance, we do not need every single game category; maybe we can just merge all the games into one category. So let's do it this way: each of us is going to group categories, with the constraint of grouping only sparse categories. This means a sparse category can join a non-sparse one, but a non-sparse category should not itself be considered for merging.
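A rough sketch of how that constraint could be checked in code. Everything here is hypothetical: the category names, the `MERGE_MAP` contents, and the `min_count` threshold for "sparse" are made up for illustration and are not the merges from the merge_category folder.

```python
from collections import Counter

# Hypothetical mapping from a sparse category to the category that absorbs it.
MERGE_MAP = {
    "games_puzzle": "games",
    "games_racing": "games",
    "games_arcade": "games",
}


def merge_counts(counts, merge_map, min_count):
    """Fold sparse categories into their targets.

    A category counts as sparse when its count is below `min_count`
    (an assumed definition). Trying to merge away a non-sparse
    category raises ValueError, which enforces the constraint.
    """
    merged = Counter()
    for category, n in counts.items():
        target = merge_map.get(category, category)
        if target != category and n >= min_count:
            raise ValueError(f"{category!r} is not sparse; do not merge it away")
        merged[target] += n
    return merged


# Made-up usage counts per category.
counts = {"games": 500, "games_puzzle": 12, "games_racing": 7, "finance": 40}
merged = merge_counts(counts, MERGE_MAP, min_count=50)
```

Here the two sparse game categories join the non-sparse `games` category, and `finance` is left alone because the mapping does not mention it.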
Second, we want to run a clustering algorithm on this data set. For that we need to find the right number of clusters, so we need to do the following: run k-means for different numbers of clusters, from 2 to, say, a maximum of 20, and get the silhouette score for every clustering. This gives us a trajectory, so we can see when we can expect to reach the right number of clusters. We also want to compute the gap statistic. Use the implementation in R, which already does everything for you, meaning you don't have to write the code for k-means.
Third, check the number of clusters we obtain. If it is reasonable, we can proceed to actually look at the centroids we obtain for that number of clusters, and maybe even run different clustering algorithms.
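The sweep over k in the second step can be sketched in plain Python, purely to show the idea. This is not the R implementation recommended above. It assumes toy 2-D points built from three well-separated groups, a deterministic farthest-first initialization (a choice made here for reproducibility), and a small k range instead of 2 to 20.

```python
import math


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0.
    Assumes at least two clusters and no two points at the same location."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: a 3x3 grid of points around each of three centres.
offsets = (-0.5, 0.0, 0.5)
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for dx in offsets
    for dy in offsets
]

scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```

On this toy data the score peaks at the number of groups that were built in, which is the trajectory behaviour the second step is looking for. Real usage data will rarely be this clean, so the peak may be flat or ambiguous.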