yuanxu-li / html-table-extractor

extract data from html table
MIT License
84 stars 23 forks

cell formatting #22

Open samadhicsec opened 1 year ago

samadhicsec commented 1 year ago

Hi, it would be nice if the extractor supported the parameters of BeautifulSoup's get_text method, namely 'separator' and 'strip'. See the BS docs here: https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=get_text#get-text

These could be added as optional kwargs to the Extractor init, or just to parse. Their default values in get_text are separator="" and strip=False (see https://github.com/wention/BeautifulSoup4/blob/03a2b3a9d1fc5877212d9d382a512663f24c887d/bs4/element.py#L846).
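
For reference, here is a minimal sketch of how the two parameters change the output of get_text. It uses BeautifulSoup directly, not html-table-extractor, and the sample HTML is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical cell with two separate pieces of text (illustration only).
html = "<table><tr><td><p>first</p><p>second</p></td></tr></table>"
cell = BeautifulSoup(html, "html.parser").find("td")

# Defaults (separator="", strip=False): the pieces are joined with nothing between them.
default_text = cell.get_text()

# Join the pieces with a newline and strip whitespace around each piece.
joined_text = cell.get_text(separator="\n", strip=True)

print(repr(default_text))  # 'firstsecond'
print(repr(joined_text))   # 'first\nsecond'
```

If Extractor passed these two values through to whatever call it uses to read cell text, the second form is what I would expect to get back; I have not checked how the library reads cell text internally, so that part is an assumption.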

In my use case, I want the separate pieces of text in a cell joined with a newline character, but they are currently returned with nothing between them. The cell contents in my case are text, but multiple sections of text separated by