grwells / TickBase

Data storage for web crawler results from TickBase project, summer 2021.

HTML Tag Artifacts Found in CSV #7

Closed by grwells 2 years ago

grwells commented 2 years ago

Some sources include HTML tags in the CSV output, or carry over tags from scraped sub-elements. Use something like BeautifulSoup to remove them...
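For illustration, tag removal on a single scraped value could look roughly like the sketch below. It is not the code from this repository: the helper name `strip_html` and the sample string are hypothetical, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup


def strip_html(text: str) -> str:
    """Return `text` with HTML tags removed and whitespace collapsed.

    Hypothetical helper, not taken from the TickBase code.
    """
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(text, "html.parser")
    # get_text() keeps only the text nodes; the separator stops words from
    # neighbouring elements running together.
    plain = soup.get_text(separator=" ")
    # Collapse the runs of whitespace left behind by the removed tags.
    return " ".join(plain.split())


# Made-up sample value, for demonstration only.
print(strip_html("<p>Tick <b>survey</b> results</p>"))
```

One trade-off of the space separator: inline markup inside a word (for example `H<sub>2</sub>O`) comes out with spaces in it, so a different separator may suit some fields better.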

grwells commented 2 years ago

Fixed. Added BS4 tag removal at the document level for CSVs and at the DSpace7 interface level.
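A document-level pass over a CSV might be sketched as follows. This is an assumption about the general shape of such a fix, not the actual change made here: `clean_csv` and the sample data are hypothetical, and the DSpace7 side is not shown because the thread gives no details about it.

```python
import csv
import io

from bs4 import BeautifulSoup


def clean_csv(src, dst) -> None:
    """Copy CSV rows from `src` to `dst`, stripping HTML tags from every cell.

    Hypothetical helper; `src` and `dst` are open text file objects.
    """
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(
            # get_text(" ", strip=True) drops the tags, trims each text
            # fragment, and joins the fragments with single spaces.
            [BeautifulSoup(cell, "html.parser").get_text(" ", strip=True) for cell in row]
        )


# Made-up in-memory sample standing in for a scraped CSV file.
raw = io.StringIO('title,abstract\r\n"<b>Example</b> title","<p>Sample <i>abstract</i> text</p>"\r\n')
cleaned = io.StringIO()
clean_csv(raw, cleaned)
print(cleaned.getvalue())
```

For files on disk, the csv module documentation recommends opening them with `newline=""` so that embedded line breaks in quoted cells survive the round trip.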