gsbDBI / torch-choice

Choice modeling with PyTorch: logit model and nested logit model
MIT License

Review Torch Choice tutorial #5

Open: charlespebereau opened this issue 2 years ago

charlespebereau commented 2 years ago

Link to tutorial.

I will review and comment on all the pages accessible through the link above.

@TianyuDu and @kanodiaayush

charlespebereau commented 2 years ago

General remarks

charlespebereau commented 2 years ago

Tutorial for Data Management

TianyuDu commented 2 years ago

Thank you for this helpful and constructive feedback, I truly appreciate these ideas! I am looking through all of them and identifying appropriate ways to address each of them.

TianyuDu commented 2 years ago

I updated your comment by replacing bullet points with check boxes, this makes tracking progress easier.

TianyuDu commented 2 years ago

I have updated the about page (now called introduction) here and the data-management page here.

charlespebereau commented 2 years ago

BEMB Tutorial - link

There is a lot of information on this page and I'm not sure what the best way to present it is; it depends on what the reader is looking for. I'll make some suggestions but will mostly highlight information I didn't find or understand. Hopefully this will give you ideas, and we can also discuss this at some point.

charlespebereau commented 2 years ago

H_zero_mask Option in obs2prior

Link to notebook (use branch "mask-H-obs2prior")

This tutorial explains how to impose some structure on hierarchical priors. Specifically, it shows how to require that some entries of the H and W matrices are fixed to 0 and are not learned. Here are some comments:
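To picture what "fixed to 0 and not learned" means during training, here is a minimal NumPy sketch. It is not BEMB's implementation of H_zero_mask; the helper `masked_gradient_step`, the boolean `zero_mask`, and the random stand-in gradients are all hypothetical. The idea is that masked entries are zeroed up front and receive no update, so they stay at 0 while the other entries keep learning.

```python
import numpy as np

def masked_gradient_step(H, grad, zero_mask, lr=0.1):
    """One gradient-descent step that keeps masked entries of H pinned at 0.

    zero_mask has the same shape as H; True marks an entry that is fixed
    to 0 and must not be learned (hypothetical helper, for illustration).
    """
    H = np.where(zero_mask, 0.0, H)        # enforce the constraint up front
    grad = np.where(zero_mask, 0.0, grad)  # masked entries receive no update
    return H - lr * grad

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
zero_mask = np.zeros((3, 4), dtype=bool)
zero_mask[0, 1] = True
zero_mask[2, 3] = True

for _ in range(5):
    grad = rng.normal(size=H.shape)  # stand-in for a real gradient
    H = masked_gradient_step(H, grad, zero_mask)

print(H[zero_mask])  # → [0. 0.]
```

In an autodiff framework the same effect is commonly obtained by multiplying the learnable matrix by a fixed 0/1 mask inside the forward pass, which makes the gradient of the masked entries zero.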

charlespebereau commented 2 years ago

Tutorial for BEMB with Simulated Data and the obs2prior Option

In this post I review the notebook on obs2prior. Note that in the post above I reviewed the H_zero_mask option, which extends this notebook. Here are some comments:
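For readers new to obs2prior, a minimal simulation sketch may help. It assumes (check against the BEMB documentation) that with obs2prior the prior mean of each item's latent coefficient vector is a linear function H x_i of that item's observables x_i, with Gaussian noise around it. The names `X_item`, `prior_sd`, and the dimensions are hypothetical choices for illustration, not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(42)
num_items, num_obs, latent_dim = 5, 3, 2

# Item observables (one row per item) and a matrix H mapping observables
# to the prior mean of each item's latent vector (assumed obs2prior form).
X_item = rng.normal(size=(num_items, num_obs))
H = rng.normal(size=(latent_dim, num_obs))
prior_sd = 0.1  # hypothetical prior standard deviation

# Prior mean for item i is H @ x_i; stacking all items gives X_item @ H.T.
prior_mean = X_item @ H.T  # shape (num_items, latent_dim)

# Draw each item's latent coefficients around its observable-driven mean.
theta = prior_mean + prior_sd * rng.normal(size=prior_mean.shape)

print(prior_mean.shape, theta.shape)  # → (5, 2) (5, 2)
```

Under this assumption, items with similar observables get similar prior means, which is what lets the model borrow strength across items with few observations.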

TianyuDu commented 2 years ago

General remarks

  • I have found some of those invisible typos (e.g. "Modelling" instead of "Modeling"), and as in any project there may be many that are hard to find. I recommend installing Grammarly in Google Chrome and opening the Jupyter notebook in the browser. Grammarly will spot these typos, misplaced commas, and sentences that should be rephrased. Below is a screenshot of what I see while writing these lines. [Screenshot: Screen Shot 2022-07-25 at 7.11.48 PM]
  • The first 4 pages (Home, About, Get started, and Tutorial for Data Management) often repeat what is said on the other pages. I would recommend specializing each page and providing all the relevant information about the topic it covers, i.e. never let readers think they have found the page covering the topic they were looking for when it is not the appropriate page. Alternatively, whenever a page mentions a topic, link to the main page covering it.

    • [x] e.g. installation is mentioned in both Home and Get started, but the latter provides more installation options.
    • [ ] e.g. the About page introduces some models and notation but does not explain how they relate to the package, which is covered in the Tutorial for Data Management page.
  • I will be brief on the first 3 pages because they will likely change a lot and I'm not an expert in introducing a package. I will share the thoughts that crossed my mind.
  • Home

    • [x] I'm not sure we need to provide an example using the Canada example on the home page.
    • [x] "1. The package includes a data management tool based on PyTorch's dataset called ChoiceDataset." It seems there is a typo or a word is missing in "PyTorch's dataset".
    • [x] "4. For those without much experience in model PyTorch development ..." I think I am one of those, and I felt there were many words in this paragraph that I didn't know or understand (e.g. "optimizers", "training loops", "PyTorch lightning wrapper"). I can't tell whether you have added a feature to your package for people without experience or have implemented some technical parts using existing packages. Also, would you say that experienced users will use this feature as well?
  • About

    • [ ] I think you can be more formal about the model, state the objective function that is optimized, and name the optimization technique.
    • [ ] I am not sure the Components of the Consumer Choice Modelling Problem are necessary on this page. They will be described in detail in the Tutorial for Data Management page.
    • [ ] If you keep it, note that many terms and descriptions relate to consumers in a supermarket. You could rephrase everything in more generic terms and give examples using supermarkets or learning apps. Examples below:
    • [ ] Consumer <-> user
    • [ ] Purchase record <-> Record
    • [ ] purchase <-> choice
    • [x] "Note: since we will be using PyTorch to train our model, we represent their identities with integer values instead of the raw human-readable names of items". I like the "raw human-readable names of items" :) I think this note is important and shouldn't be lost in the middle of the page; it could go in a dedicated section. In particular, you could add tips on how to work with integer values instead of strings. Perhaps the tutorial has code for building a dictionary? How can one compute human-readable descriptive stats on the data when the key variables are integer values?
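To make the "be more formal about the objective" suggestion concrete, here is a generic sketch of the quantity a standard multinomial logit model maximizes: the log-likelihood of the observed choices, where P(item i) = exp(U_i) / sum_j exp(U_j). It assumes maximum-likelihood estimation with a gradient-based optimizer; it is not torch-choice's code, and the function name `logit_log_likelihood` and the toy utilities are hypothetical.

```python
import numpy as np

def logit_log_likelihood(utilities, choices):
    """Log-likelihood of observed choices under a multinomial logit model.

    utilities: array (num_records, num_items), deterministic utility of each
               item for each record.
    choices:   int array (num_records,), index of the chosen item.
    """
    # log P(i) = U_i - log(sum_j exp(U_j)); subtracting the row max keeps
    # exp() from overflowing without changing the probabilities.
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the chosen item in each record and sum.
    return log_probs[np.arange(len(choices)), choices].sum()

utilities = np.array([[0.0, 0.0], [1.0, 0.0]])
choices = np.array([0, 0])
print(logit_log_likelihood(utilities, choices))
```

Fitting then amounts to choosing the parameters that generate `utilities` so as to maximize this sum (equivalently, minimize its negative), e.g. with Adam or L-BFGS in PyTorch.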

I have removed the "about page" and created a more integrated "introduction page" here.

I have also added an example of using LabelEncoder to encode raw item names as integers.
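scikit-learn's LabelEncoder assigns integer codes in sorted order of the unique labels (stored in `classes_`) and provides `inverse_transform` to map codes back to names. The dependency-free sketch below shows the same idea with hypothetical item names; keeping the `classes` list around is what lets you report human-readable descriptive stats after encoding.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, ordered like sorted(labels)."""
    classes = sorted(set(labels))
    to_int = {label: i for i, label in enumerate(classes)}
    return classes, to_int

raw_items = ["milk", "bread", "milk", "eggs", "bread"]
classes, to_int = fit_label_encoder(raw_items)

# Encode names to integer indices for the model...
item_index = [to_int[name] for name in raw_items]
print(item_index)  # → [2, 0, 2, 1, 0]

# ...and decode them back to names for reporting.
print([classes[i] for i in item_index])
```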

TianyuDu commented 2 years ago

Tutorial for Data Management

  • [x] "Note: since this package was initially proposed for modeling consumer choices, attribute names of ChoiceDataset are borrowed from the consumer choice literature." To the extent possible, I would recommend keeping all the terms as generic as possible, e.g.

    • Consumer <-> user
    • Purchase record <-> Record
    • Purchase <-> choice
    • I don't know about "price" - it's a generic term but how is it treated differently than any other item characteristics? Is it actually necessary in the package?

The price variable does require some further elaboration: price depends on both item and session, so by "price variable" I mean any observable that depends on both item and session, just like price. I agree this raises a lot of confusion. I have added some explanations in the new introduction page here. I am also looking for ideas for an alternative name for this kind of variable; perhaps we can just call them itemsession-observables? Or (item, session)-observables? @kanodiaayush @charlespebereau

  • [ ] Coming back to the "Components of the Consumer Choice Modelling Problem" section after reading the whole page: in terms of presentation, I think you shouldn't describe the context the same way you describe the parameters of the package. For instance, I was confused that I couldn't find any "Purchase record" in ChoiceDataset().

  • [ ] General remark 1: this page has two sections: one gives a high-level overview and the other implements examples. Personally, when looking up functions online, I am used to seeing the function definition first, then explanations, and then examples. So I would recommend doing the same here. You can start with step 2 of your toy example (see screenshot below). I prefer this because I can read the variable names and guess what they are from their names. Then I would scroll down for more information, e.g. the dimensions and type of "user_index", the default values of "price_obs", etc. [Screenshot: Screen Shot 2022-07-25 at 6.51.20 PM]

  • [ ] Related to the above comment, I couldn't follow the "Advanced Usage: Additional Observables" paragraph without knowing which variables are in the package. To me, this paragraph would be clearer later in the page.

  • [ ] About the "What you can do with the ChoiceDataset?" section: do you have recommendations for computing descriptive stats? Should the researcher compute them before passing the data to ChoiceDataset, especially given that everything is transformed into integers?

Currently, we expect users of our package to conduct their own descriptive analysis of their dataset with their favorite analysis tools. I agree that ChoiceDataset could have a succinct helper function providing simple summary statistics of the dataset. I have opened a new issue here for this.
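As a sketch of what such a helper might report, the function below computes a few basic statistics from integer-encoded choice records and maps item indices back to names when a lookup list is provided. `summarize_choices` and its arguments are hypothetical; they are not part of ChoiceDataset.

```python
from collections import Counter

def summarize_choices(user_index, item_index, item_names=None):
    """Hypothetical helper: basic descriptive statistics for choice records."""
    counts = Counter(item_index)
    if item_names is not None:
        # Translate integer item indices back to human-readable names.
        counts = Counter({item_names[i]: c for i, c in counts.items()})
    return {
        "num_records": len(item_index),
        "num_users": len(set(user_index)),
        "num_items_chosen": len(set(item_index)),
        "choice_counts": dict(counts.most_common()),  # most chosen first
    }

summary = summarize_choices(
    user_index=[0, 0, 1, 2, 2],
    item_index=[2, 0, 2, 1, 2],
    item_names=["bread", "eggs", "milk"],
)
print(summary["choice_counts"])  # → {'milk': 3, 'bread': 1, 'eggs': 1}
```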

  • [ ] "Overall, there are four types of additive component, except the error term, in the utility

    • notation: specify the dimensions of X_i^item
    • could we add a component with user observables X_u^user?

TianyuDu commented 2 years ago

For improvements on the BEMB documentation website, I have moved these comments to the bemb repository (please see links above).

TianyuDu commented 2 years ago

All feedback from Charles about the torch-choice package has been addressed in the latest documentation website. All comments on the bemb package documentation have been copied to issues in the BEMB repository.

Thanks for this really constructive feedback! @charlespebereau @kanodiaayush