Open IG16 opened 5 years ago
Hi,
Thank you for your question.
Yes, it can happen that some users in a cluster do not exhibit any of that cluster's prominent features. These users are drawn in because they are similar to existing members of the cluster on some other features.
To illustrate how this may happen, consider the following scenario: a cluster's prominent feature is spamming, but some of the spamming accounts also have other features in common (e.g. registration pattern). When a new account exhibits the same registration pattern but hasn't started spamming yet, it can still be drawn into the spamming cluster because of the similar registration pattern.
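Here is a toy sketch of that scenario. It is not the engine's actual code: the feature names, the accounts, and the use of Jaccard similarity are all assumptions made purely for illustration.

```python
# Toy illustration only. Feature names, accounts, and the choice of
# Jaccard similarity are hypothetical, not the engine's real metric.

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Existing members of the (hypothetical) spamming cluster.
spam_cluster = {
    "acct1": {"spam_post", "reg_pattern_A", "burst_signup"},
    "acct2": {"spam_post", "reg_pattern_A"},
}

# New account: same registration pattern, but no spamming yet.
new_account = {"reg_pattern_A", "burst_signup"}

# The cluster's prominent feature.
prominent = {"spam_post"}

similarities = {
    name: jaccard(features, new_account)
    for name, features in spam_cluster.items()
}
has_prominent = bool(new_account & prominent)

print(similarities)   # non-zero similarity to both members
print(has_prominent)  # False: the prominent feature is absent
```

The new account shares no prominent feature with the cluster, yet it is still similar to existing members through the registration-related features, which is enough for a similarity-based method to group it with them.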
Please keep in mind that the prominent features are just the top-ranked features for each cluster and not the full set of features. We use the L-method to make sure that these features are significantly stronger than the rest of the features, but this by no means indicates that the other features have no effect on the cluster formation.
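For readers unfamiliar with the L-method, below is a simplified sketch of the general idea: fit two straight lines to a sorted score curve and pick the split point that minimizes the combined fit error. The function names, the point-count weighting, and the example scores are assumptions for illustration; the engine's own implementation may differ.

```python
import math

def _line_rmse(xs, ys):
    """RMSE of the least-squares line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq_err / n)

def l_method_knee(scores):
    """Return how many top-ranked scores lie before the knee.

    `scores` must be sorted in descending order and contain at
    least 4 values, so that each side of the split has 2+ points.
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    xs = list(range(1, n + 1))
    best_c, best_err = None, float("inf")
    for c in range(2, n - 1):  # left = first c points, right = the rest
        left = _line_rmse(xs[:c], scores[:c])
        right = _line_rmse(xs[c:], scores[c:])
        # Weight each side's error by its share of the points
        # (a simplification of the original weighting).
        err = (c / n) * left + ((n - c) / n) * right
        if err < best_err:
            best_c, best_err = c, err
    return best_c

# Hypothetical feature scores: three strong features, then a long tail.
feature_scores = [100, 90, 80, 5, 4, 3, 2, 1]
print(l_method_knee(feature_scores))  # 3
```

Note that the features after the knee still have non-zero scores: they are weaker, not absent, which is why they can still influence which cluster a user ends up in.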
Hope this answers your questions.
Hey, I was using your awesome clickstream algorithm engine when I noticed something interesting.
Here is what I did: I am trying to verify results of the algorithm, so the check I do is the following:
{"exclusionsScore": [1285.0, 336.0208333333333, 0.0, 0.0], "exclusions": ["S2319", "S674", "S3690", "S3361"]}],
Here are the results: when I verify the results, I find that about 20% of users do not have any of the cluster sequences in the input file, meaning they did not perform any of the sequences of actions of the cluster they belong to.
Here is what I expected: Does this result make sense? Shouldn’t users perform at least 1 sequence that appears in the cluster they belong to? Thank you very much
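A minimal sketch of the kind of check described above, for anyone who wants to reproduce it. The data shapes (a dict of per-user action lists, a user-to-cluster mapping, and a list of sequences per cluster) and all names are assumptions, not the engine's actual input or output format, and a "sequence" is assumed to mean a contiguous run of actions.

```python
def contains_sequence(actions, pattern):
    """True if `pattern` occurs as a contiguous run inside `actions`."""
    k = len(pattern)
    if k == 0:
        return False
    return any(actions[i:i + k] == pattern
               for i in range(len(actions) - k + 1))

def users_without_cluster_sequence(user_actions, user_cluster, cluster_sequences):
    """List users who performed none of their own cluster's sequences."""
    missing = []
    for user, actions in user_actions.items():
        patterns = cluster_sequences.get(user_cluster[user], [])
        if not any(contains_sequence(actions, p) for p in patterns):
            missing.append(user)
    return missing

# Hypothetical data: both users are assigned to cluster 0.
user_actions = {"u1": ["A", "B", "C"], "u2": ["C", "A"]}
user_cluster = {"u1": 0, "u2": 0}
cluster_sequences = {0: [["A", "B"]]}

missing = users_without_cluster_sequence(
    user_actions, user_cluster, cluster_sequences
)
print(missing)                      # ['u2']
print(len(missing) / len(user_actions))  # share of users with no match
```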