Bold heading issues

Hi Yeabin,

I've realized that much of our extraction output is inaccurate because our algorithm does not reliably identify headings written in bold. Some reports are extracted correctly starting from a bold "Management Discussion and Analysis" (MD&A) heading, while others are not. In some cases, the extraction begins at a footnote instead of the actual heading.

This is a significant concern. I manually reviewed 30 10-K/10-Q reports, cross-referencing our output against the HTML-format reports on the SEC website, and found that some reports with bold headings were extracted accurately while others were not.

I've tried testing the re.compile(r'\sDiscussion\s+and\s+Analysis\s+of\s+Financial\s+Condition[s]?\s', re.IGNORECASE | re.DOTALL) pattern on individual files, but it still does not yield the desired results. Fixing this is essential to the accuracy and reliability of our extraction process.
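One possible cause, offered as a guess rather than a diagnosis: the pattern requires a whitespace character on both sides of the phrase, so it cannot match when the raw HTML puts a closing tag such as </b> directly after "Condition". (re.DOTALL also has no effect here, since the pattern contains no "." to modify.) The sketch below shows one way to look for the phrase only inside bold markup, so that plain-text footnote mentions are skipped. It assumes the input is raw HTML, that bold is expressed with b/strong tags or an inline font-weight style, and that a heading sits inside a single bold element. The names normalize and find_bold_mda_headings are hypothetical and not part of our current code.

```python
import html
import re

# Assumption: bold is marked either with <b>/<strong> or an inline font-weight style.
BOLD_TAG_RE = re.compile(r"<(b|strong)\b[^>]*>(.*?)</\1\s*>",
                         re.IGNORECASE | re.DOTALL)
BOLD_STYLE_RE = re.compile(
    r"<(span|font|p|div)\b[^>]*font-weight\s*:\s*(?:bold|[6-9]00)[^>]*>(.*?)</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")

# Word boundaries instead of mandatory surrounding whitespace.
MDA_RE = re.compile(
    r"\bDiscussion\s+and\s+Analysis\s+of\s+Financial\s+Conditions?\b",
    re.IGNORECASE,
)


def normalize(fragment):
    """Drop nested tags, decode entities (e.g. &nbsp;), collapse whitespace."""
    text = TAG_RE.sub(" ", fragment)
    text = html.unescape(text).replace("\xa0", " ")
    return re.sub(r"\s+", " ", text).strip()


def find_bold_mda_headings(raw_html):
    """Return sorted (offset, text) pairs for bold spans containing the MD&A phrase."""
    hits = []
    for bold_re in (BOLD_TAG_RE, BOLD_STYLE_RE):
        for match in bold_re.finditer(raw_html):
            text = normalize(match.group(2))
            if MDA_RE.search(text):
                hits.append((match.start(), text))
    return sorted(hits)


# Made-up sample: one bold heading, one non-bold footnote mention.
sample = (
    "<p><b>Item&nbsp;2. Management's Discussion and Analysis of "
    "Financial Condition</b> and Results of Operations</p>"
    "<p>(1) See Management's Discussion and Analysis of Financial "
    "Condition for details.</p>"
)

for offset, text in find_bold_mda_headings(sample):
    print(offset, text)
```

This is regex-based and will miss headings split across several bold elements or nested inside same-named tags; an HTML parser would be more robust if those cases show up in the filings.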

yuxuanbrandeis / Julex

Bold heading issues #8