upenndigitalscholarship / deep

MIT License

Formatting in data #18

Closed apjanco closed 1 year ago

apjanco commented 1 year ago

We need to update the html_to_items script to retain HTML formatting in various fields. This formatting can be saved to the db as a string field. It would be better to save it to an HTML field (if that's a thing) or a rich-text field. See deep 5077 for an example.

apjanco commented 1 year ago

Rather than get_text(), decode_contents() will return a string that includes the <i> and <b> tags. The <a> tags with their old JavaScript hrefs will still need to be removed (while retaining the link text).
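A rough sketch of that approach with BeautifulSoup, in case it helps. The `inner_html` helper name and the sample markup (including the `openRecord` href) are made up for illustration and are not taken from html_to_items or the old site; only `get_text()`, `decode_contents()`, `find_all()` and `unwrap()` are actual bs4 calls.

```python
from bs4 import BeautifulSoup, Tag


def inner_html(tag: Tag) -> str:
    """Return the tag's inner HTML, keeping <i>/<b> but dropping <a> wrappers.

    Hypothetical helper. Note that it modifies `tag` in place.
    """
    # unwrap() removes the <a> element itself (and its href) but leaves
    # its children, i.e. the link text, where they were.
    for a in tag.find_all("a"):
        a.unwrap()
    # decode_contents() serializes the children of the tag as a string,
    # so inline markup survives, unlike get_text().
    return tag.decode_contents().strip()


# Made-up sample field, not real DEEP data.
sample = (
    "<div><i>Italic title</i> and <b>bold note</b>, see "
    '<a href="javascript:openRecord(1)">related record</a></div>'
)
field = BeautifulSoup(sample, "html.parser").div

plain = field.get_text()   # formatting is lost
html = inner_html(field)   # <i> and <b> kept, <a> removed

print(plain)
print(html)
```

The returned string could then go into whatever string or rich-text field the db ends up using. Using `unwrap()` rather than `decompose()` is the important part here, since `decompose()` would throw away the link text along with the tag.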

apjanco commented 1 year ago

Found in the old db:

In [17]: d.transcript_title
Out[17]: 'THE Whole Contention betweene the two Famous Houses, LANCASTER and YORKE. With the Tragicall ends of the good Duke Humfrey, Richard Duke of Yorke, and King Henrie the sixt. Diuided into two Parts: And newly corrected and enlarged. '