neelkandlikar / water-sentiment

Add aquifer or basin name #12

Open porefluid opened 4 years ago

porefluid commented 4 years ago

Match basin/aquifer names against the WHYMAP

Load the WHYMAP into a GeoPandas GeoDataFrame and see how many of the aquifer/basin names from our abstracts match aquifer names in the WHYMAP
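A rough sketch of what that comparison could look like, in case it helps. The file path and the name column are placeholders (I haven't checked what the WHYMAP attribute table actually calls its name field), and the names in the usage note are made-up sample values, so treat all of them as assumptions:

```python
def load_whymap_names(path, name_column):
    """Read a WHYMAP vector file and return its names as a set.

    `path` and `name_column` are hypothetical: point them at the real
    file and at whichever attribute holds the aquifer/basin name.
    """
    import geopandas as gpd  # imported here so match_names works without it

    gdf = gpd.read_file(path)
    return set(gdf[name_column].dropna().astype(str))


def match_names(abstract_names, whymap_names):
    """Split the abstract names into (matched, unmatched) sets.

    Both sides are lowercased and stripped first, so the comparison is
    an exact match on the normalized strings.
    """
    ours = {n.lower().strip() for n in abstract_names}
    theirs = {n.lower().strip() for n in whymap_names}
    matched = ours & theirs
    return matched, ours - matched
```

Something like `matched, unmatched = match_names(names_from_abstracts, load_whymap_names("whymap.shp", "NAME"))` followed by `len(matched)` would give the count; `"whymap.shp"` and `"NAME"` are stand-ins. Exact matching will miss spelling variants, so the unmatched set is worth a look by hand.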

richpauloo commented 4 years ago

Hey NN, believe it or not, in my work today I actually had to use a REGEX to capture the word that comes right before a word of interest in a column of strings.

For example, if I had: ["some words in a long sentence", "more words in another sentence"], I needed a REGEX that took the pattern "sentence" and found the word immediately before each occurrence of that pattern ("long" and "another" in the example).

This is what I used: \\w+(?=\\s+my_string)

where my_string is the word whose preceding word you want. This is just meant to get you started. In your case, to find aquifer names you'll need to search both before and after the keywords (basin, aquifer), and create your own whitelist of valid names. One of your preprocessing steps should have been to make all words lowercase, so you shouldn't need to worry about case-sensitivity in pattern matching.
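Here is roughly how that could look in Python, as a sketch rather than a finished solution. The function names, the sample sentence, and the whitelist contents are all made up for illustration, and the text is assumed to be lowercased already. Note that in a Python raw string the backslashes are single:

```python
import re


def words_before(text, keyword):
    """Return the word immediately before each occurrence of `keyword`."""
    pattern = r"\w+(?=\s+" + re.escape(keyword) + r"\b)"
    return re.findall(pattern, text)


def words_after(text, keyword):
    """Return the word immediately after each occurrence of `keyword`."""
    pattern = r"\b" + re.escape(keyword) + r"\s+(\w+)"
    return re.findall(pattern, text)


def candidate_names(text, whitelist, keywords=("aquifer", "basin")):
    """Collect words next to any keyword, keeping only whitelisted ones."""
    found = []
    for kw in keywords:
        found += words_before(text, kw) + words_after(text, kw)
    return [w for w in found if w in whitelist]


print(words_before("some words in a long sentence", "sentence"))
print(words_before("more words in another sentence", "sentence"))

# Hypothetical abstract text and whitelist
text = "recharge in the ogallala aquifer and the basin of mexico"
print(candidate_names(text, {"ogallala", "guarani"}))
```

Without the whitelist, the neighbors of "aquifer" and "basin" in that sentence also include filler words like "the", "and", and "of", which is why the filter matters. This only looks at single-word neighbors, so multi-word names would need a wider pattern.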

Best of luck and feel free to reach out with questions!