Closed hadleyd2 closed 4 years ago
Hello @qiyangqd,
Thank you for the valuable feedback, and I especially appreciate you troubleshooting the EDA.R
script as no one including my partner, the TA's grading our work, and myself have run into this issue with the Euro symbol. We would not have caught it otherwise.
Listed below are feedback that are addressed:
old and irrelevant files testieTest.Rmd
and Hadley-Test.Rmd
have been removed as per TA feedback here
The issue with the Euro symbol should be solved by replacing it with the unicode u20AC
. I am using this unicode in the price density plot, so if you did not encounter errors there, then my hope is that you won't encounter the error in the violin plot. Since we cannot recreate, it would be very helpful if you could try to import my changes and re-run. Please note that I am now proceeding with this project on my own, so these updates should be imported from the fork of the main project repo. I will link to it once I break the fork, but am waiting for my group partner to confirm this action. In the meantime, the actual commit changes can be viewed here.
Updating usage was a great catch! I changed the names of the scripts between the last two milestones to shorten them, but forgot to update the usage documentation. This has now been done with the change here.
Suggestion for an additional plot: I have been adding plots as I go through the milestones and encounter interesting results. To address the coordinates with price, I have created a distance
variable that combines the coordinates to approximate distance from the city centre as indicated by Google. One issue is that longitude is on a large scale so moves very slightly. New plots will be incorporated in the dashboard project and so are not immediately considered here.
List below is feedback slated for future work:
violin_plot
function in the EDA.R
script. This will take some time to vectorize so that I either adjust all subsequent calls accordingly or get the vectorized output in the same format as it currently is. Issue opened here.Thanks! -Dan
Original issue here.
Hi Group 9!
It is great to peer-review your project. Your topic is very interesting. I learned a lot from your work, not only coding skills like writing functions fluently, but also good habits like keeping detailed documentation. Although I encountered an error in
eda.R
, you did a great job overall.Is the repository organized in a useful and sensible way?
Yes. The Readme is organized sensibly, and it is nice to find files for certain usage in their own folders. It would be better if some files that seem to be no longer useful could be removed, such as
testieTest.Rmd
whose latest update was 18 days ago.Is there a
Usage
section in the README?Yes.
Do the scripts run to completion as described in the README?
load.R
andprocess.R
ran successfully on my computer and printed out nice messages, great job! However,eda.R
didn't run to completion as expected. I received: 8
after entering the command at first as below, and finally figured out that it was the euro sign (€
) in theviolin_plot()
function that caused the problem after two hours digging into the script and testing individual lines. Thus, I would suggest changing the€
sign to letters, which could avoid potential errors.Do the scripts produce the expected output?
Yes. When the
eda.R
ran successfully, it produced three nice plots. I am especially interested in the violin plot which looks really cool.Are the scripts and functions appropriately documented? Yes. The documentation is very detailed and helpful. Comments can be found beside almost every line, making it easy for users to understand. I would suggest that the file name in
Usage
in each script could be the same as the name of the script itself. For example, theUsage
inload.R
isload_data.R ...
. This actually doesn't influence the operation of the scripts, but having the same name could avoid some confusion and potential inconvenience, especially for someone who opens a script and applies theusage
in it instead of the one in the Readme.Does it take a long time for the analysis to run? Are there tasks that could be vectorized to make things faster? No. It runs very quickly, so vectorization isn't very necessary. That being said, it could be used when calculating the mean price of the listings in each district.
Other suggestions:
The research question is appropriate and well-chosen. Visualizations are effective, and it would be even better if more plots that dig into individual factors could be generated (right now only the violin plot shows the relationship between price and district). Since longitude and latitude have the highest values for correlation, there might be something interesting in a "coordinates vs. price" plot.