Closed by remram44.
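BeautifulSoup is surprisingly bad at turning HTML into readable text: get_text() either runs paragraphs together or splits words apart at inline tags. Any ideas?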
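```python
from bs4 import BeautifulSoup

html = '<p>T<i>e</i>st <b>haha</b></p><p>Other\nline</p>'
soup = BeautifulSoup(html, 'html.parser')  # explicit parser avoids bs4's "no parser specified" warning

print(repr(soup.get_text()))      # 'Test hahaOther\nline'  (paragraphs run together)
print(repr(soup.get_text(' ')))   # 'T e st  haha Other\nline'  (word split at <i>)
print(repr(soup.get_text('\n')))  # 'T\ne\nst \nhaha\nOther\nline'
```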
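One common workaround, sketched here as an assumption rather than a built-in BeautifulSoup feature, is to insert line breaks only at block-level tags (p, div, br, li, headings, ...) and to treat inline tags such as i and b as if they weren't there. To keep the sketch dependency-free it swaps BeautifulSoup for the standard library's html.parser.HTMLParser; the BLOCK_TAGS and SKIP_TAGS sets and the html_to_text name are illustrative choices, not an exhaustive or official list.

```python
import re
from html.parser import HTMLParser

# Tags treated as line breaks; everything else is treated as inline.
# This set is an illustrative assumption, not a complete list of block elements.
BLOCK_TAGS = {
    'p', 'div', 'br', 'li', 'ul', 'ol', 'tr', 'table', 'blockquote', 'pre',
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr',
}
# Tags whose text content should never appear in the output.
SKIP_TAGS = {'script', 'style', 'title'}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.lines = []      # finished lines of text
        self.current = []    # text fragments of the block being built
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def _end_line(self):
        # Collapse whitespace runs (including newlines in the source) to one
        # space, roughly as a browser does inside a block, then end the line.
        text = re.sub(r'\s+', ' ', ''.join(self.current)).strip()
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._end_line()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._end_line()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()      # process any text still buffered by the parser
    parser._end_line()  # emit trailing text not closed by a block tag
    return '\n'.join(parser.lines)


print(repr(html_to_text('<p>T<i>e</i>st <b>haha</b></p><p>Other\nline</p>')))
# 'Test haha\nOther line'
```

Collapsing whitespace inside a block means the literal newline in "Other\nline" becomes a space, which matches how a browser would display that paragraph.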
Some emails might be HTML, so we need to convert them to a readable plain-text version.
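For the email case, it may help to skip conversion entirely when a message already carries a text/plain alternative. This sketch assumes the messages are parsed with Python's standard email package (the thread doesn't say how they are read); readable_body and its html_to_text parameter are hypothetical names, and html_to_text can be any converter.

```python
from email import message_from_string
from email.policy import default


def readable_body(raw_message, html_to_text):
    """Return a readable text body, converting HTML only when no plain part exists.

    html_to_text is a hypothetical parameter: pass whatever HTML-to-text
    converter you settle on.
    """
    # policy=default makes the parser return an EmailMessage, which has get_body().
    msg = message_from_string(raw_message, policy=default)
    # Prefer text/plain; fall back to text/html only if that is all there is.
    part = msg.get_body(preferencelist=('plain', 'html'))
    if part is None:
        return ''
    content = part.get_content()  # decoded str for text/* parts
    if part.get_content_subtype() == 'html':
        return html_to_text(content)
    return content
```

Many HTML emails are sent as multipart/alternative with a plain-text part included, so with this approach the converter only runs for HTML-only messages.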