jaidevd / dataviz-group-3

Project requirements: Q&A #4

Open jaidevd opened 1 year ago

jaidevd commented 1 year ago

Just leaving some thoughts here:

  1. Most popular face masks (and their features): Popularity can only be measured in three ways: number of reviews, average rating, and total sales (adjusted by the price per single mask). Every other measure of popularity is some function of these three, so we should find a visualization that captures all three variables. Given this dataset, features of individual products (material, size, etc.) can be very vague. They are accurately captured only in very few instances, so we cannot get features directly from the data with enough confidence. Perhaps we could group the reviews by product ID and infer feature information from them.
  2. What customers like about masks - Looking at wordclouds and scattertext plots - consumers seem to value comfort over other things. Primarily, they seem to like it when masks feel good on their ears and noses, and when the fabric is soft. We still need to find out what they hate.
  3. Customer segmentation - We don't have individual customer IDs beyond a minority of users. That actually may be a good thing. Use customer-specific features like language, hour-of-day, day-of-week information to cluster reviewers and see if any clear clusters can be formed. Can we use text of the reviews too?
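
A rough sketch of how the three popularity measures from item 1 could be put into one table per product. Everything here is an assumption for illustration: the column names (`product_id`, `rating`, `revenue`, `pack_price`, `masks_per_pack`) and the toy numbers are made up, and "adjust sales by price per single mask" is read as "revenue divided by per-mask price", which is only one possible interpretation.

```python
import pandas as pd

# Hypothetical review-level data; column names are assumptions, not the real schema.
reviews = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "C"],
    "rating": [50, 40, 30, 50, 50, 20],
})

# Hypothetical product-level data: total revenue, price of one pack, masks in a pack.
products = pd.DataFrame({
    "product_id": ["A", "B", "C"],
    "revenue": [200.0, 300.0, 250.0],
    "pack_price": [2.0, 10.0, 25.0],
    "masks_per_pack": [1, 10, 50],
}).set_index("product_id")


def popularity_table(reviews, products):
    """Return one row per product with the three popularity measures."""
    # Measures 1 and 2: number of reviews and average rating.
    per_product = reviews.groupby("product_id")["rating"].agg(
        n_reviews="count", avg_rating="mean"
    )
    # Measure 3 (assumed reading): revenue expressed as single masks sold.
    price_per_mask = products["pack_price"] / products["masks_per_pack"]
    masks_sold = (products["revenue"] / price_per_mask).rename("masks_sold")
    return per_product.join(masks_sold)


popularity = popularity_table(reviews, products)
print(popularity)
```

With a table like this, the three columns map directly onto a bubble chart (for example x = number of reviews, y = average rating, size = masks sold), which would be one way to show all three variables at once.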
jaidevd commented 1 year ago

Filter the scattertext output by brands and products, and make brand-specific recommendations.

jaidevd commented 1 year ago

Study the affinity of clusters with certain behaviours or actions.

For example, clusters 7 and 8 never give a bad rating: all their ratings are >= 40. These are the "early morning" reviewers. If they give good ratings, what do the late-night people do?
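
One way to check this kind of affinity for every cluster at once is a small per-cluster summary. This is only a sketch: the `cluster` and `rating` column names and the toy rows are assumptions, and the threshold of 40 is simply the value quoted above.

```python
import pandas as pd

# Hypothetical data: one row per review, with a cluster label already assigned
# by some earlier clustering step. Ratings use the same scale as the thread.
df = pd.DataFrame({
    "cluster": [7, 7, 8, 8, 5, 5, 5, 1],
    "rating": [50, 40, 50, 40, 10, 50, 30, 20],
})

GOOD_RATING = 40  # threshold taken from the comment above


def rating_affinity(df, threshold=GOOD_RATING):
    """Summarise how often each cluster gives a rating at or above the threshold."""
    flagged = df.assign(good=df["rating"] >= threshold)
    summary = flagged.groupby("cluster").agg(
        n_reviews=("rating", "size"),
        min_rating=("rating", "min"),
        share_good=("good", "mean"),
    )
    # Clusters that only ever give good ratings end up at the top.
    return summary.sort_values("share_good", ascending=False)


affinity = rating_affinity(df)
print(affinity)
```

Clusters with `share_good == 1.0` are the ones that "never give a bad rating"; the same table answers the late-night question by looking at the rows for the evening and night clusters.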

jaidevd commented 1 year ago

Cluster 5 represents users who:

  1. write reviews from Sunday to Wednesday
  2. are active from late evening to night
  3. are mostly Russian and English speakers

Cluster 1:

  1. Wednesday to Saturday
  2. Late afternoon
  3. Higher fraction of Russian speakers

Cluster 4:

  1. Very early week (rarely on Wednesday)
  2. Morning to early afternoon
  3. High Russian representation
  4. (**) High review counts

Cluster 9:

  1. Exclusively Russian speakers and one Arabic speaker
  2. Very high answer and review counts

Cluster 8:

  1. Late week (Thursday, Friday, Saturday)
  2. Early morning people (<= 0600 hours)

Cluster 7:

  1. Early week
  2. Early morning (5 am to 8 am)
  3. More Russian speakers than English speakers

Cluster 0:

Cluster 10: Noise
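
Profiles like the ones above could be tabulated rather than read off by eye. The sketch below is an illustration under assumptions: the columns `cluster`, `weekday`, `hour` and `language`, the toy rows, and the four part-of-day buckets are all hypothetical and not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical review-level data with a cluster label per review.
reviews = pd.DataFrame({
    "cluster": [5, 5, 5, 8, 8, 9, 9, 9],
    "weekday": ["Sun", "Mon", "Wed", "Thu", "Fri", "Tue", "Sat", "Mon"],
    "hour": [21, 23, 20, 5, 6, 12, 15, 9],
    "language": ["ru", "en", "ru", "en", "ru", "ru", "ru", "ar"],
})


def cluster_profiles(reviews):
    """Return per-cluster shares of language, weekday and part of day."""
    # Assumed buckets: [0, 6) night, [6, 12) morning, [12, 18) afternoon, [18, 24) evening.
    part_of_day = pd.cut(
        reviews["hour"],
        bins=[0, 6, 12, 18, 24],
        right=False,
        labels=["night", "morning", "afternoon", "evening"],
    ).astype(str)
    # normalize="index" turns counts into row-wise shares, so each cluster sums to 1.
    language_share = pd.crosstab(reviews["cluster"], reviews["language"], normalize="index")
    weekday_share = pd.crosstab(reviews["cluster"], reviews["weekday"], normalize="index")
    part_share = pd.crosstab(reviews["cluster"], part_of_day, normalize="index")
    return language_share, weekday_share, part_share


language_share, weekday_share, part_share = cluster_profiles(reviews)
print(language_share)
print(part_share)
```

Row-wise shares make statements such as "higher fraction of Russian speakers" comparable across clusters of different sizes; raw counts would still be needed for claims like "high review counts".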