numbats / wages-refresh

This repo is for rebuilding the wages data from the NLSY and documenting the process.
MIT License

Paper Structure #4

Open Dewi-Amaliah opened 3 years ago

Dewi-Amaliah commented 3 years ago
  1. Identify the key points to put in the paper:
     a. IDA for data quality is explained in many papers and documented clearly.
     b. Going from open data to textbook data, or example data for a research paper.
     c. Raising awareness of the tidy form of the data.
     d. Instead of wild data, put more emphasis on open data and how to transform open data into textbook data.
     e. Link to established data validation techniques, e.g. the validate package.
  2. The primary audience for the paper is someone who is going to use the open data. The secondary audience is the data curators themselves.
  3. Rethink the title regarding the primary audience of the paper.
  4. In addition to the introduction, add examples of books that use the data.
  5. Section 4 : Flow chart of the changes in data from raw to final data
  6. Section 5: EDA examples of the tidy and clean data
  7. Section 6: Summary, discussion
dicook commented 3 years ago

Section 2, "The NLSY79", might be better with two subsections:

2.1 Database

2.2 Target data

where this section gives a brief overview of the wages data as used in Singer and Willett, which we are aiming to refresh.

dicook commented 3 years ago

Hide some of the code. For example in 3.1.1, show the block on tidying the birth data, and gender/race but outline only the tidying of grade. The reader can be pointed to the supplementary code.

Similarly in 3.1.2, maybe just the first block, and bullet point the steps for the other blocks of code.

dicook commented 3 years ago

There are a couple of ?? in the paper, which means some references/figures didn't resolve properly (e.g. in 3.2.1).

dicook commented 3 years ago

Fig 2 probably only needs 5 individuals shown as an example, with the caption pointing to the total number of subjects identified as having this type of pattern.

Section 3.2.1 could flow better, in terms of the plots chosen and the sequence in which they are shown, to demonstrate the handling of extremes.

And maybe a little less code: outline it in the text and refer the reader to the supplementary material.

Dewi-Amaliah commented 3 years ago

Hi Prof. Di,

I have made revisions based on your comments. However, for the flow of section 3.2.1, I am not quite sure, since I only changed the sequence of the two plots corresponding to the extreme values. Or should I compare more weights in a plot?

Best regards, Dewi