nasa-petal / data-collection-and-prep

Starting with a list of URLs of papers that can be used for crowdsourcing, create a CSV file with each paper's URL, DOI, title, abstract, and whether the paper is open access.
The Unlicense

Level 3 update; Filtered Garbled Text from Insert; #122

dsmith111 closed 2 years ago

dsmith111 commented 2 years ago

Summary

Some abstracts contained long unreadable sections (e.g. "- - - a4esaf - - - tinyurl - - - -."), possibly left behind by the formatting of inserted objects. A filter has been added to the DVC pipeline to remove these. One level 1 label was changed in the Excel taxonomy file but was not included in the golden.json instructions. The label (protect from harm) has been updated.
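
For illustration only, a filter of this kind could be sketched as below. This is not the code from this PR: the function name `strip_garbled_runs` and the regular expression are hypothetical, and the sketch assumes the garbled sections always look like the example above, i.e. runs of three or more space-separated hyphens with occasional stray tokens between them.

```python
import re

# Hypothetical pattern (an assumption, not the PR's actual rule):
# a run starts with at least three space-separated hyphens, may continue
# with "<token> - - -" groups, and may end with a trailing "-" and ".".
GARBLED_RUN = re.compile(
    r"(?<!\S)(?:-\s+){3,}(?:\w+\s+(?:-\s+){3,})*-?\.?"
)


def strip_garbled_runs(abstract: str) -> str:
    """Remove garbled hyphen runs and collapse the whitespace left behind."""
    cleaned = GARBLED_RUN.sub(" ", abstract)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    sample = (
        "Bees build hexagonal cells. "
        "- - - a4esaf - - - tinyurl - - - -. "
        "This saves wax."
    )
    print(strip_garbled_runs(sample))
```

Requiring three consecutive hyphens before anything is removed keeps ordinary dash usage (for example "surfaces - inspired by lotus leaves - repel water") intact, at the cost of missing shorter garbled fragments.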