gsbDBI / torch-choice

Choice modeling with PyTorch: logit model and nested logit model

MIT License

Review Torch Choice tutorial #5

Open charlespebereau opened 2 years ago

charlespebereau commented 2 years ago

Link to tutorial.

I will review and comment on all the pages accessible from the link above.

@TianyuDu and @kanodiaayush

charlespebereau commented 2 years ago

General remarks

I have found a few easy-to-miss typos, e.g. "Modelling" instead of "Modeling", and as in any project there are probably more that are hard to find. I recommend installing Grammarly in Google Chrome and opening the Jupyter notebook in the browser. Grammarly will flag these typos, misplaced commas, and sentences that should be rephrased. Below is a screenshot of what I see as I write these lines.
The first 4 pages (Home, About, Get started, and Tutorial for Data Management) often repeat what is said on the other pages. I would recommend specializing each page so that it provides all the relevant information about its topic, i.e. never let readers think they have found the page covering the topic they were looking for when it is not the appropriate page. Alternatively, when a topic is mentioned on another page, link to the main page that covers it.
- [x] eg. installation is mentioned both in Home and Get started but the latter provides more options for installation.
- [x] eg. the About page introduces some models and notation but does not explain how they relate to the package, which is covered in the Tutorial for Data management page.
I will be brief on the first 3 pages because they will likely change a lot and I'm not an expert at introducing a package. I will just share the thoughts that crossed my mind.
Home
- [x] I'm not sure we need to provide an example using the Canada example on the home page.
- [x] "1. The package includes a data management tool based on PyTorch's dataset called ChoiceDataset." It seems there is a typo or a word is missing in "PyTorch's dataset".
- [x] "4. For those without much experience in model PyTorch development ..." I think I am one of those, and I felt there were many words in this paragraph that I didn't know or understand (e.g. "optimizers", "training loops", "PyTorch lightning wrapper"). I can't tell whether you have added a feature to the package for people without experience, or coded some technical parts using existing packages. Also, would you expect those with experience to use this feature as well?
About
- [x] I think you can be more formal about the model, state the objective function that is optimized, and name the optimization technique.
- [x] I am not sure that the Components of the Consumer Choice Modelling Problem are necessary on this page. They will be described in detail in the Tutorial for Data Management page.
- [x] If you keep it, I feel that many terms and descriptions relate to consumers in a supermarket. You could rephrase everything with more generic terms and give examples using supermarkets or learning apps. Examples below:
- [x] Consumer <-> user
- [x] Purchase record <-> Record
- [x] purchase <-> choice
- [x] "Note: since we will be using PyTorch to train our model, we represent their identities with integer values instead of the raw human-readable names of items". I like the "raw human-readable names of items" :) I think this note is important and shouldn't be lost in the middle of the page but could be in a dedicated section. In particular, you could add tips on how to work with integer values instead of strings. Perhaps in the tutorial there is code for building a dictionary? How can one do raw human-readable descriptive stats on their data when the key variables are integer values?

charlespebereau commented 2 years ago

Tutorial for Data Management

[x] "Note: since this package was initially proposed for modeling consumer choices, attribute names of ChoiceDataset are borrowed from the consumer choice literature." To the extent it is possible, I would recommend keeping all the terms as generic as possible. Eg.
- Consumer <-> user
- Purchase record <-> Record
- Purchase <-> choice
- I don't know about "price" - it's a generic term, but how is it treated differently from any other item characteristic? Is it actually necessary in the package?
[x] Coming back to the "Components of the Consumer Choice Modelling Problem" section after reading the whole page: in terms of presentation, I think you shouldn't write about context the same way you write about parameters of the package. For instance, I was confused that I couldn't find any "Purchase record" in ChoiceDataset().
[x] General remark 1: this page has "2" sections - one gives a high-level overview and implements examples. Personally, when searching for functions online, I am used to seeing the content of the function first, then explanations, and then examples. So I would recommend doing the same here. You can start with step 2 of your toy example - see screenshot below. I prefer this because I can read the variable names and guess what they are from their names. Then I would scroll down to get more information - e.g. dimensions and type of "user_index", default values of "price_obs", etc.
[x] Related to the above comment, I couldn't follow the "Advanced Usage: Additional Observables" paragraph without knowing what variables are in the package. For me, this paragraph would be clearer later in the page
[ ] About the "What you can do with the ChoiceDataset?" section. Do you have recommendations for doing descriptive stats? Should the researcher compute them before passing the data through ChoiceDataset? In particular, given that everything is transformed into integers.
[x] "Overall, there are four types of additive component, except the error term, in the utility representation:"
- notation: specify the dimensions of X_i^item
- could we add a component with user observables X_u^user?

TianyuDu commented 2 years ago

Thank you for this helpful and constructive feedback, I truly appreciate these ideas! I am looking through all of them and identifying appropriate ways to address each of them.

TianyuDu commented 2 years ago

I updated your comment by replacing bullet points with check boxes, this makes tracking progress easier.

TianyuDu commented 2 years ago

I have updated the about page (now called introduction) here and the data-management page here.

charlespebereau commented 2 years ago

BEMB Tutorial - link

There is a lot of information on this page and I'm not sure what the best way to present it is - it depends on what the reader is looking for. I'll make some suggestions but will mostly highlight what information I didn't find or understand. Hopefully this will give you ideas and we can also discuss this at some point.

I'm not sure we need an example in the introductory paragraph. Also, the one provided is both more and less general than what the package does: the CDF F is more general than the Gumbel distribution, but theta*alpha is less general than the utility functions the package can accommodate.
Utility formula: you could be more specific about what utility representation the package allows for. I think (maybe I'm wrong) that
- you only have logistic models (the noise epsilon is constrained to follow a Gumbel distribution)
- the utility function is additively separable in the observables and allows for interactions between latents. It should be stressed that this is actually a very general form because (i) one can always build sophisticated observables, for instance by taking the log or a polynomial transformation of the original observables, and (ii) one can impose that the learnable coefficients depend on i, u, s or any combination of them.
Utility formula: we need to be more specific about the model and review the math notation (which is currently incorrect).
- for details, see page 376 of Athey et al (2021)
- the model assumes unit demand for each category, independent choices across categories, and an error term distributed according to the Gumbel distribution (logit)
- there needs to be a discussion of how the outside option is modelled. How does the model decide that the user buys nothing from a given category? Can we change the value of the outside option in each category, or is it normalized to 0 for each category?
- Regarding notation: (i) need to index the variables by {uis} and (ii) decompose the utility into a deterministic part and the error term: $\mathcal{U}_{uis} = U_{uis} + \varepsilon_{uis}$. Then $P(i|u,s)$ is a function of $U_{uis}$ instead of $\mathcal{U}_{uis}$
- I suggest we write the utility function that the package can accommodate in its most general form (ie. sum all the terms that can be included) and then discuss each term one by one
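
For reference, the decomposition suggested above can be written out as follows. This is only a sketch of the standard multinomial logit form, assuming i.i.d. standard Gumbel errors. The symbol $I_c$ (the set of items in the same category as item $i$) and the normalization of the outside option to zero are assumptions made for illustration; the thread does not confirm how the package treats the outside option.

$$
\begin{aligned}
\mathcal{U}_{uis} &= U_{uis} + \varepsilon_{uis}, \qquad \varepsilon_{uis} \overset{\text{iid}}{\sim} \text{Gumbel}(0, 1), \\
P(i \mid u, s) &= \frac{\exp(U_{uis})}{\sum_{i' \in I_c} \exp(U_{ui's})} \quad \text{(no outside option)}, \\
P(i \mid u, s) &= \frac{\exp(U_{uis})}{1 + \sum_{i' \in I_c} \exp(U_{ui's})} \quad \text{(outside option with utility normalized to } 0\text{)}.
\end{aligned}
$$

In both cases the probability depends only on the deterministic part $U_{uis}$, which is the point of separating it from $\mathcal{U}_{uis}$.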
Subsection "Specifying the Dimensions of Coefficients with the coef_dim_dict dictionary".
- I didn't understand what point 4. refers to. I am not sure it was specified in the "Utility formula" subsection that there can be matrix factorization coefficients for observables
There is a section "Specifying Variance of Coefficient Prior Distributions with prior_variance" but I think there is no section about setting the mean of the coeffs.
Regarding obs2prior:
- there should be a link to the dedicated tutorial in this subsection
- it is not clear whether with obs2prior we impose that the variance is the identity matrix or if we can change that.
- it is not clear whether we can impose some form to H or not
"If category_to_item is not provided (or None is provided), the probability of purchasing item i by user u in session s is ..."
- maybe we could rephrase this slightly to say that by default there is only one category, which contains all the items, but that the package can also impose subcategories. In any case, the model is unit demand per category: the user buys at most one item per category
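
To make the "one category by default, subcategories optional" reading concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the function name `choice_probabilities` and the dictionary-based inputs are hypothetical, and the single-category default is my reading of the quoted sentence. It also has no outside option, so in this sketch the user buys exactly one item per category.

```python
import math


def choice_probabilities(utilities, category_to_item=None):
    """Logit choice probabilities, normalized within each category.

    utilities: dict mapping item id -> deterministic utility U_uis
        for one (user, session) pair.
    category_to_item: optional dict mapping category id -> list of item ids.
        When None, all items are treated as one category (assumed default).
    """
    if category_to_item is None:
        category_to_item = {0: list(utilities)}
    probs = {}
    for items in category_to_item.values():
        # Subtract the max utility before exponentiating for numerical stability.
        u_max = max(utilities[i] for i in items)
        exp_u = {i: math.exp(utilities[i] - u_max) for i in items}
        denom = sum(exp_u.values())
        for i in items:
            probs[i] = exp_u[i] / denom
    return probs


utilities = {0: 1.0, 1: 0.5, 2: -0.2, 3: 0.3}
single = choice_probabilities(utilities)
by_category = choice_probabilities(utilities, {0: [0, 1], 1: [2, 3]})
```

With a single category the four probabilities sum to one; with two categories, each category's probabilities sum to one separately, which is what "unit demand per category" implies.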

charlespebereau commented 2 years ago

H_zero_mask Option in obs2prior

Link to notebook (use branch "mask-H-obs2prior")

This tutorial explains how to impose some structure on hierarchical priors. Specifically, it shows how to impose that some entries of the H and W matrices are fixed to 0 and shouldn't be learned. Here are some comments:

I think this is a great extension to the obs2prior tutorial. I will use it in the Learning Tools project and, when this is done, we could add a link to the Learning Tools notebook to show a practical example.
We could give some context regarding when imposing structure on hierarchical priors is relevant.
- if I understood correctly, this is mostly relevant when the researcher fears there is not enough data for a flexible hierarchical prior and the researcher has some intuition regarding which dim - to be confirmed
We could also give some hints regarding how to choose the structure of the hierarchy
Related to the two points above, we could adapt the numerical example to highlight the benefits of imposing a structure.
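
As a small illustration of what "fixing entries of H to 0" means, here is a sketch in plain Python. It assumes the prior mean of a coefficient is the linear map H x of the observables x, and that a mask entry of True means "this entry of H is fixed to 0". Both are assumptions for illustration, and `masked_prior_mean` is a hypothetical helper, not a function from the package.

```python
def masked_prior_mean(H, H_zero_mask, x):
    """Compute the prior mean H @ x with masked entries of H forced to zero.

    H: list of rows (latent_dim x num_obs) of learnable weights.
    H_zero_mask: same shape as H; True where the entry is fixed to 0.
    x: list of length num_obs, the observables of one user or item.
    """
    mean = []
    for h_row, mask_row in zip(H, H_zero_mask):
        total = 0.0
        for h, masked, x_k in zip(h_row, mask_row, x):
            if not masked:
                # Only unmasked entries contribute (and would be learned).
                total += h * x_k
        mean.append(total)
    return mean


H = [[0.5, -1.0, 2.0],
     [1.5, 0.3, -0.7]]
H_zero_mask = [[False, True, True],
               [True, False, False]]
x = [1.0, 2.0, 3.0]
prior_mean = masked_prior_mean(H, H_zero_mask, x)
```

Here the first latent dimension depends only on the first observable and the second latent dimension only on the last two, which is the kind of structure a researcher might want to impose when data are scarce.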

charlespebereau commented 2 years ago

Tutorial for BEMB with Simulated Data and the obs2prior Option

In this post I review the notebook on obs2prior. Note that I reviewed the H_zero_mask option, which is an extension of this notebook, in the post above. Here are some comments:

[ ] we could add some context regarding when using observables helps and when it doesn't
[ ] Relatedly, it would be useful to add a summary of what the tutorial will do before we start the simulations. What are we trying to predict, what do we know about the underlying preferences, what variables do we observe, why in this context we expect using obs2prior will help, etc.
[ ] "The observable of a particular user is a one-hot vector with width num_items and one on the position of item this user loves (as mentioned previously)."
- I thought that this was precisely what we don't observe and try to recover
- Tianyu's comment: it is a trivial example to show how to implement the model and show that the model understands that this variable is very important
[ ] For internal purposes: often, the user observables relate to demographics (age, gender, income, etc.). Why didn't we choose to simulate this type of situation?
- eg. linear relationship between age and which item the buyer loves
[ ] "Fitting the Model"
- it would be useful to say more about what the package does and what input is necessary. For instance, do we need to set a prior for H and W, does the package set them by default, or does the package try different ones and select the best one?
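
For readers trying to picture the simulated setup quoted above, here is a minimal sketch of how such one-hot user observables could be generated. It uses only the standard library; `simulate_users` and its arguments are hypothetical names, and the actual notebook may build its data differently.

```python
import random


def simulate_users(num_users, num_items, seed=0):
    """Draw a favorite item per user and build one-hot user observables."""
    rng = random.Random(seed)
    # Each user "loves" exactly one item, drawn uniformly at random.
    favorite = [rng.randrange(num_items) for _ in range(num_users)]
    # The observable is a vector of width num_items with a 1 at the favorite item.
    user_obs = [
        [1.0 if j == fav else 0.0 for j in range(num_items)]
        for fav in favorite
    ]
    return favorite, user_obs


favorite, user_obs = simulate_users(num_users=5, num_items=4)
```

In this construction the observable reveals the favorite item directly, which is why the example is described as trivial: it checks that the model picks up a variable known to be informative.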

TianyuDu commented 2 years ago

General remarks

I have found a few easy-to-miss typos, e.g. "Modelling" instead of "Modeling", and as in any project there are probably more that are hard to find. I recommend installing Grammarly in Google Chrome and opening the Jupyter notebook in the browser. Grammarly will flag these typos, misplaced commas, and sentences that should be rephrased. Below is a screenshot of what I see as I write these lines.

The first 4 pages (Home, About, Get started, and Tutorial for Data Management) often repeat what is said on the other pages. I would recommend specializing each page so that it provides all the relevant information about its topic, i.e. never let readers think they have found the page covering the topic they were looking for when it is not the appropriate page. Alternatively, when a topic is mentioned on another page, link to the main page that covers it.

[x] eg. installation is mentioned both in Home and Get started but the latter provides more options for installation.

[ ] eg. the About page introduces some models and notation but does not explain how they relate to the package, which is covered in the Tutorial for Data management page.

I will be brief on the first 3 pages because they will likely change a lot and I'm not an expert at introducing a package. I will just share the thoughts that crossed my mind.

Home

[x] I'm not sure we need to provide an example using the Canada example on the home page.

[x] "1. The package includes a data management tool based on PyTorch's dataset called ChoiceDataset." It seems there is a typo or a word is missing in "PyTorch's dataset".

[x] "4. For those without much experience in model PyTorch development ..." I think I am one of those, and I felt there were many words in this paragraph that I didn't know or understand (e.g. "optimizers", "training loops", "PyTorch lightning wrapper"). I can't tell whether you have added a feature to the package for people without experience, or coded some technical parts using existing packages. Also, would you expect those with experience to use this feature as well?

About

[ ] I think you can be more formal about the model, state the objective function that is optimized, and name the optimization technique.

[ ] I am not sure that the Components of the Consumer Choice Modelling Problem are necessary on this page. They will be described in detail in the Tutorial for Data Management page.

[ ] If you keep it, I feel that many terms and descriptions relate to consumers in a supermarket. You could rephrase everything with more generic terms and give examples using supermarkets or learning apps. Examples below:

[ ] Consumer <-> user

[ ] Purchase record <-> Record

[ ] purchase <-> choice

[x] "Note: since we will be using PyTorch to train our model, we represent their identities with integer values instead of the raw human-readable names of items". I like the "raw human-readable names of items" :) I think this note is important and shouldn't be lost in the middle of the page but could be in a dedicated section. In particular, you could add tips on how to work with integer values instead of strings. Perhaps in the tutorial there is code for building a dictionary? How can one do raw human-readable descriptive stats on their data when the key variables are integer values?

I have removed the "about page" and created a more integrated "introduction page" here.

I have also added an example of using LabelEncoder to encode raw item names to integers.
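
For readers of this thread, here is a dependency-free sketch of the idea. It is not the LabelEncoder example from the documentation: it mimics the same name-to-integer mapping with a plain dictionary, and the item names and variable names are made up for illustration. It also shows one way to report descriptive statistics with readable names after encoding.

```python
from collections import Counter

# Hypothetical raw choice records with human-readable item names.
raw_items = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# Map sorted unique names to consecutive integers (similar in spirit to a label encoder).
item_to_index = {name: idx for idx, name in enumerate(sorted(set(raw_items)))}
index_to_item = {idx: name for name, idx in item_to_index.items()}

# Integer-coded choices, in the form a model would consume.
item_index = [item_to_index[name] for name in raw_items]

# Descriptive statistics computed on the integer codes, reported with readable names.
counts = Counter(item_index)
choice_counts = {index_to_item[idx]: n for idx, n in counts.items()}
```

Keeping `index_to_item` alongside the encoded data is what makes it possible to go back to readable names for tables and plots.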

TianyuDu commented 2 years ago

Tutorial for Data Management

[X] "Note: since this package was initially proposed for modeling consumer choices, attribute names of ChoiceDataset are borrowed from the consumer choice literature." To the extent it is possible, I would recommend keeping all the terms as generic as possible. Eg.

Consumer <-> user

Purchase record <-> Record

Purchase <-> choice

I don't know about "price" - it's a generic term, but how is it treated differently from any other item characteristic? Is it actually necessary in the package?

The price variable does require some further elaboration: price depends on both item and session, so by "price variable" I mean any observable that depends on both item and session, just like the price. I agree this raises a lot of confusion. I have added some explanations in the new introduction page here. I am also looking for ideas for an alternative name for this kind of variable; perhaps we can just call them itemsession-observables, or (item, session)-observables? @kanodiaayush @charlespebereau
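
To illustrate what "depends on both item and session" means in practice, here is a small sketch in plain Python. It assumes such observables are stored as a table indexed by (session, item) and looked up per choice record. The names `price_obs` and `user_index` appear earlier in this thread; `session_index`, `item_index`, and the numbers are assumptions for illustration.

```python
# price_obs[s][i] holds the observable for item i in session s.
price_obs = [
    [1.0, 2.5, 3.0],  # session 0
    [1.2, 2.5, 2.8],  # session 1
]

# One entry per choice record: which session it happened in, which item was chosen.
session_index = [0, 0, 1, 1]
item_index = [2, 0, 1, 2]

# Look up the (item, session)-specific value attached to each record's chosen item.
chosen_price = [price_obs[s][i] for s, i in zip(session_index, item_index)]
```

Nothing here is specific to prices: any observable that varies across both sessions and items fits the same layout, which is the argument for a more generic name.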

[ ] Coming back to the "Components of the Consumer Choice Modelling Problem" section after reading the whole page: in terms of presentation, I think you shouldn't write about context the same way you write about parameters of the package. For instance, I was confused that I couldn't find any "Purchase record" in ChoiceDataset().

[ ] General remark 1: this page has "2" sections - one gives a high-level overview and implements examples. Personally, when searching for functions online, I am used to seeing the content of the function first, then explanations, and then examples. So I would recommend doing the same here. You can start with step 2 of your toy example - see screenshot below. I prefer this because I can read the variable names and guess what they are from their names. Then I would scroll down to get more information - e.g. dimensions and type of "user_index", default values of "price_obs", etc.

[ ] Related to the above comment, I couldn't follow the "Advanced Usage: Additional Observables" paragraph without knowing what variables are in the package. For me, this paragraph would be clearer later in the page

[ ] About the "What you can do with the ChoiceDataset?" section. Do you have recommendations for doing descriptive stats? Should the researcher compute them before passing the data through ChoiceDataset? In particular, given that everything is transformed into integers.

Currently, we want users of our package to conduct their own descriptive analysis of their dataset using their own favorite analysis tool. I agree that our ChoiceDataset could have a succinct helper function providing simple summary statistics of the dataset. I have initiated a new issue here for this.

[ ] "Overall, there are four types of additive component, except the error term, in the utility representation:"

notation: specify the dimensions of X_i^item

could we add a component with user observables X_u^user?

TianyuDu commented 2 years ago

For improvements on the BEMB documentation website, I have moved these comments to the bemb repository (please see links above).

TianyuDu commented 2 years ago

All feedback from Charles about the torch-choice package has been addressed in the latest documentation website. All comments on the bemb package documentation have been copied to issues in the BEMB repository.

Thanks for this really constructive feedback! @charlespebereau @kanodiaayush