Open samadhicsec opened 1 year ago
Hi, it would be nice if you supported the parameters of the BeautifulSoup `get_text` method, namely `separator` and `strip`. See the BS docs here: https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=get_text#get-text
These could be added as optional kwargs to the `Extractor` init, or simply to `parse`. The default values are `separator=""` and `strip=False` (from here: https://github.com/wention/BeautifulSoup4/blob/03a2b3a9d1fc5877212d9d382a512663f24c887d/bs4/element.py#L846)
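For reference, here is a minimal sketch of what the two parameters do on a single cell whose text is split by `<br>` tags. The sample HTML is made up for illustration, and the stdlib `html.parser` backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical cell: three text fragments separated by <br> tags.
html = "<table><tr><td>first<br>second<br> third </td></tr></table>"
cell = BeautifulSoup(html, "html.parser").find("td")

# Defaults (separator="", strip=False): fragments are glued together as-is.
default_text = cell.get_text()

# Join fragments with a newline and strip surrounding whitespace from each.
joined_text = cell.get_text(separator="\n", strip=True)

print(repr(default_text))
print(repr(joined_text))
```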
In my use case I want the separate pieces of text in a cell joined with a newline character, but they are currently returned with nothing between them. The cell contents in my case are text, but multiple sections of text separated by
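One way the proposal could look, as a hypothetical sketch: `extract_table` below is not the `Extractor` API, just a stand-alone function that forwards `separator` and `strip` to `get_text()` for every cell, keeping BeautifulSoup's own defaults. It assumes a simple table with no `rowspan`/`colspan` and no nested tables.

```python
from bs4 import BeautifulSoup


def extract_table(html, separator="", strip=False):
    """Hypothetical helper: return each row as a list of cell strings.

    separator and strip are passed straight through to Tag.get_text(),
    so the defaults match BeautifulSoup's. rowspan/colspan are ignored.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all(["td", "th"])
        rows.append([c.get_text(separator=separator, strip=strip) for c in cells])
    return rows


html = """
<table>
  <tr><th>Name</th><th>Notes</th></tr>
  <tr><td>a</td><td>line one<br>line two</td></tr>
</table>
"""

print(extract_table(html))
print(extract_table(html, separator="\n", strip=True))
```

With the defaults, the multi-part cell comes back as `"line oneline two"`; passing `separator="\n", strip=True` gives the newline-joined text described above.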