mqAncientHistory / Lat-Epig

The Lat-Epig interface lets you query the EDCS, save the search results to a TSV file, and plot them on a map of the Roman Empire, all without any prior knowledge of programming.
https://mybinder.org/v2/gh/mqAncientHistory/Lat-Epig/HEAD?urlpath=notebooks/EpigraphyScraper.ipynb
GNU General Public License v3.0

Text cleaner implementation #19

Closed: petrifiedvoices closed this issue 2 years ago

petrifiedvoices commented 3 years ago

I have an R script that cleans the text of an inscription into a text-mining-friendly version.

It could be added as a feature of the scraper, ideally keeping the original raw text in one column and writing the cleaned version to a new column.
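A minimal Python sketch of the "raw column plus clean column" idea, assuming the scraper's results can be handled as a list of dicts before being written to TSV. The column names `inscription` and `inscription_clean`, and the helpers `add_clean_column` and `write_tsv`, are hypothetical and not taken from the Lat-Epig code; the cleaner is passed in as a plain function so any cleaning pipeline can be plugged in.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def add_clean_column(
    rows: Iterable[Dict[str, str]],
    clean: Callable[[str], str],
    raw_col: str = "inscription",          # hypothetical name of the raw-text column
    clean_col: str = "inscription_clean",  # hypothetical name of the new column
) -> List[Dict[str, str]]:
    """Return copies of the rows with a cleaned column added; the raw text is left untouched."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        new_row[clean_col] = clean(row[raw_col])
        out.append(new_row)
    return out


def write_tsv(rows: List[Dict[str, str]], stream) -> None:
    """Write rows as tab-separated values, using the column order of the first row."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder cleaner: only collapses whitespace.
    rows = [{"id": "1", "inscription": "D(is)  M(anibus)"}]
    buf = io.StringIO()
    write_tsv(add_clean_column(rows, lambda t: " ".join(t.split())), buf)
    print(buf.getvalue())
```

Keeping the raw text unchanged and appending the cleaned text as a separate column means users can always check a cleaned reading against the original transcription.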

The R cleaning script is here: https://github.com/sdam-au/EDCS_ETL/blob/master/scripts/1_2_r_EDCS_cleaning_text.Rmd

To do: build the cleaning functions and apply them in the correct sequence, producing both a conservative and an interpretive version of the text.
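A minimal sketch of how the two versions could be produced by running small cleaning functions in a fixed order. It assumes Leiden-style editorial marks, with abbreviation expansions in parentheses, e.g. `M(anibus)`, and restorations in square brackets, e.g. `[Iulia]e`. The assumed split is that the conservative version drops both (keeping only what survives on the stone) and the interpretive version keeps their contents without the brackets. The rule set and function names here are hypothetical; the actual rules and their order are defined in the R script linked above.

```python
import re
from typing import Callable

# Hypothetical rules for illustration only; the real rule set lives in the R script.


def drop_expansions(text: str) -> str:
    """Remove abbreviation expansions: 'M(anibus)' -> 'M'."""
    return re.sub(r"\([^()]*\)", "", text)


def keep_expansions(text: str) -> str:
    """Keep expanded letters, drop the parentheses: 'M(anibus)' -> 'Manibus'."""
    return re.sub(r"\(([^()]*)\)", r"\1", text)


def drop_restorations(text: str) -> str:
    """Remove restored text in square brackets: '[Iulia]e' -> 'e'."""
    return re.sub(r"\[[^\[\]]*\]", "", text)


def keep_restorations(text: str) -> str:
    """Keep restored text, drop the brackets: '[Iulia]e' -> 'Iuliae'."""
    return re.sub(r"\[([^\[\]]*)\]", r"\1", text)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace left behind by removals."""
    return re.sub(r"\s+", " ", text).strip()


def pipeline(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose cleaning steps so they always run in the given order."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


clean_conservative = pipeline(drop_expansions, drop_restorations, normalize_whitespace)
clean_interpretive = pipeline(keep_expansions, keep_restorations, normalize_whitespace)

if __name__ == "__main__":
    raw = "D(is) M(anibus) [Iulia]e"
    print(clean_conservative(raw))
    print(clean_interpretive(raw))
```

Expressing each version as an explicit list of steps makes the sequence visible in one place, so reordering or adding a rule does not require touching the individual functions.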

petrifiedvoices commented 2 years ago

Wrote unit tests for the most important cleaning features of both the conservative and the interpretive cleaning functions; all of them pass.
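A hedged sketch of what table-driven unit tests for the two cleaning modes might look like in Python's `unittest`. The compact `clean(text, mode)` function and the test cases are hypothetical, written only to make the example self-contained under the same assumed Leiden-style conventions (expansions in parentheses, restorations in square brackets); they do not reproduce the tests mentioned in this issue.

```python
import re
import unittest


# Hypothetical compact cleaner, used only so the tests have something to exercise.
def clean(text: str, mode: str) -> str:
    if mode == "conservative":
        text = re.sub(r"\([^()]*\)", "", text)    # drop abbreviation expansions
        text = re.sub(r"\[[^\[\]]*\]", "", text)  # drop restorations
    elif mode == "interpretive":
        text = re.sub(r"[()\[\]]", "", text)      # keep contents, drop the brackets
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return re.sub(r"\s+", " ", text).strip()      # collapse leftover whitespace


# Each case: raw text, expected conservative output, expected interpretive output.
CASES = [
    ("D(is) M(anibus)", "D M", "Dis Manibus"),
    ("[Iulia]e", "e", "Iuliae"),
    ("  Valeria   ", "Valeria", "Valeria"),
]


class TestCleaning(unittest.TestCase):
    def test_conservative(self):
        for raw, expected, _ in CASES:
            with self.subTest(raw=raw):
                self.assertEqual(clean(raw, "conservative"), expected)

    def test_interpretive(self):
        for raw, _, expected in CASES:
            with self.subTest(raw=raw):
                self.assertEqual(clean(raw, "interpretive"), expected)

    def test_unknown_mode(self):
        with self.assertRaises(ValueError):
            clean("D M", "diplomatic")
```

Saved as a module, this can be run with `python -m unittest`. Keeping the expected conservative and interpretive outputs side by side in one table makes it easy to see how each raw reading should diverge between the two versions.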