apjanco closed this issue 1 year ago
Rather than get_text(), decode_contents() will return a string that includes the <i> and <b> tags. The <a> tags with their old JavaScript hrefs will still need to be removed (while retaining the link text).
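A minimal sketch of that approach, assuming the script uses BeautifulSoup (bs4) with the built-in html.parser. The helper name inner_html_without_links, the wrapper element, and the sample markup are hypothetical and only for illustration; they are not taken from html_to_items.

```python
from bs4 import BeautifulSoup


def inner_html_without_links(fragment: str, wrapper: str = "div") -> str:
    """Return the fragment's inner HTML, keeping <i>/<b> but unwrapping <a>.

    Hypothetical helper: the name and the wrapper element are assumptions.
    """
    # Wrap the fragment so there is a single parent tag to decode from.
    soup = BeautifulSoup(f"<{wrapper}>{fragment}</{wrapper}>", "html.parser")
    node = soup.find(wrapper)
    for link in node.find_all("a"):
        # unwrap() drops the <a> tag (and its href) but keeps the link text.
        link.unwrap()
    # decode_contents() serializes the children, so inline tags survive.
    return node.decode_contents()


# Made-up markup for illustration; the real placement of tags may differ.
sample = (
    'THE <i>Whole Contention</i> '
    '<a href="javascript:void(0)">betweene</a> the two <b>Famous</b> Houses'
)
cleaned = inner_html_without_links(sample)
print(cleaned)
```

Using unwrap() rather than decompose() matters here: decompose() would delete the link text along with the tag.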
Found in the old db:
In [17]: d.transcript_title
Out[17]: 'THE Whole Contention betweene the two Famous Houses, LANCASTER and YORKE. With the Tragicall ends of the good Duke Humfrey, Richard Duke of Yorke, and King Henrie the sixt. Diuided into two Parts: And newly corrected and enlarged. '
Need to update the html_to_items script to retain HTML formatting in various fields. This formatting can be saved to the db as a string field. It would be better to save it to an HTML field (if that's a thing) or a rich text field. See deep 5077 for an example.