added geo-data: fixes #9

richpauloo commented 4 years ago

Great work! I have a few minor comments that can be addressed in a small amount of time. Please review the following points, revise, and resubmit the PR:

This is a good time to comment your code, both in this project, and in all future projects. You may know what the code does now, but I guarantee that if you or someone else looked at it in 1 year, it would be hard to know what steps you're taking. Since you are submitting the work in notebooks, I challenge you to use markdown syntax to better integrate these code with methods, and leave little comments in the code itself with terse explanations of what each chunk accomplishes. This will be helpful when you review these scripts and write the methods section of the forthcoming manuscript. Please take some time to improve code documentation, and make this a part of the code-writing process moving forward. Comments do not need to be extensive. Examples here and Google python style guide here
03_clean_tokenize-checkpoint.ipynb - define the imported packages that the strict depends on, and move getTdm(doc) to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.
04_retrieve_geo_data.ipynb - match_countries(text) and geograpy_countries(input) should be moved to the top of the script, just under imports.
05_count_abstracts.ipynb - this can be looped over. @kashingtonDC and @porefluid can help with this. Although it is tempting to copy and paste, generally try to follow the rule "If you have to do the same thing three times, write a function or a loop." Approaching problems like this will make you a better programmer in the long run.
data/iso_list_of_countries.csv - somewhere in the README, please note where this data was obtained so we have a record tied to it in the GH repo. This will be helpful when writing the manuscript, and it is good practice to document the source of everything so the analysis is reproducible and defensible.
dataframes/03_clean_tokenize.csv - This is a result, which should be in the /results folder--please move it there. Also, rename to abstracts_clean_tokenized_geo.csv to indicate they are the cleaned, tokenized, geo-located abstracts, rather than the script name. I've checked, and it appears that the geo data is not yet added to this CSV. Can you make that join, and leave entries in the geo column NA for the abstracts that were not assigned a country by your algorithm?

Once again, great job. Onwards!

richpauloo commented 4 years ago

Great job NN. This lgtm. Onwards to the more exciting parts of the analysis! 🎉

NeerajRattehalli commented 4 years ago

this should address all the issues except 03_clean_tokenize-checkpoint.ipynb - define the imported packages that the strict depends on, and move getTdm(doc) to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.

We weren't exactly sure how to define imported packages. It'd be great if you could send a more detailed sample for it.

richpauloo commented 4 years ago

I think you did it correctly, or maybe I missed something.

What I mean is that good code styling is to define all packages and functions up top (even if you write them below — just move them up when they’re stable).

Something like:

import numpy as np import pandas as pd

def my_function(x): print("slick style")

rest of code

On Mon, Sep 14, 2020 at 11:24 PM Neeraj Rattehalli notifications@github.com wrote:

this should address all the issues except 03_clean_tokenize-checkpoint.ipynb - define the imported packages that the strict depends on, and move getTdm(doc) to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.

We weren't exactly sure how to define imported packages. It'd be great if you could send a more detailed sample for it.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/neelkandlikar/water-sentiment/pull/11#issuecomment-692494214, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACO3TK4WKWBVZIWGNFQKQITSF4CCZANCNFSM4RDZGB5Q .

NeerajRattehalli commented 4 years ago

aight bet

neelkandlikar / water-sentiment

added geo-data: fixes #9 #11