Closed neelkandlikar closed 4 years ago
Great job NN. This lgtm. Onwards to the more exciting parts of the analysis! 🎉
this should address all the issues except
03_clean_tokenize-checkpoint.ipynb - define the imported packages that the strict depends on, and move getTdm(doc) to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.
We weren't exactly sure how to define imported packages. It'd be great if you could send a more detailed sample for it.
I think you did it correctly, or maybe I missed something.
What I mean is that good code styling is to define all packages and functions up top (even if you write them below — just move them up when they’re stable).
Something like:
import numpy as np import pandas as pd
def my_function(x): print("slick style")
rest of code
On Mon, Sep 14, 2020 at 11:24 PM Neeraj Rattehalli notifications@github.com wrote:
this should address all the issues except 03_clean_tokenize-checkpoint.ipynb - define the imported packages that the strict depends on, and move getTdm(doc) to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.
We weren't exactly sure how to define imported packages. It'd be great if you could send a more detailed sample for it.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/neelkandlikar/water-sentiment/pull/11#issuecomment-692494214, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACO3TK4WKWBVZIWGNFQKQITSF4CCZANCNFSM4RDZGB5Q .
aight bet
Great work! I have a few minor comments that can be addressed in a small amount of time. Please review the following points, revise, and resubmit the PR:
03_clean_tokenize-checkpoint.ipynb
- define the imported packages that the strict depends on, and movegetTdm(doc)
to the top of the document. Typically, scripts should start with imports, followed by function definitions, then input files, then code.04_retrieve_geo_data.ipynb
-match_countries(text)
andgeograpy_countries(input)
should be moved to the top of the script, just under imports.05_count_abstracts.ipynb
- this can be looped over. @kashingtonDC and @porefluid can help with this. Although it is tempting to copy and paste, generally try to follow the rule "If you have to do the same thing three times, write a function or a loop." Approaching problems like this will make you a better programmer in the long run.data/iso_list_of_countries.csv
- somewhere in the README, please note where this data was obtained so we have a record tied to it in the GH repo. This will be helpful when writing the manuscript, and it is good practice to document the source of everything so the analysis is reproducible and defensible.dataframes/03_clean_tokenize.csv
- This is a result, which should be in the/results
folder--please move it there. Also, rename toabstracts_clean_tokenized_geo.csv
to indicate they are the cleaned, tokenized, geo-located abstracts, rather than the script name. I've checked, and it appears that the geo data is not yet added to this CSV. Can you make that join, and leave entries in thegeo
columnNA
for the abstracts that were not assigned a country by your algorithm?Once again, great job. Onwards!