Open tangxuning opened 5 years ago
Interesting - thanks for the bug report - I'm starting to take a look at this now.
I made an initial fix - they will all at least parse now.
These files have a particularly complicated structure, so there are likely to be other "hidden" parsing problems where parts of the document are missed. Let me know if you see any issues. I did some work to cover those edge cases, but there is a good chance I didn't catch them all...
Hi,
When I used parse_filing on the URLs below, I got the following error:
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, : Excessive depth in document: 256 use XML_PARSE_HUGE option [1]
Here are a few sample URLs:
https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm
https://www.sec.gov/Archives/edgar/data/1010247/000101024709000005/form10k.htm
https://www.sec.gov/Archives/edgar/data/861459/000086145909000013/form10-q.htm
Again, thanks very much for contributing this package! It's fantastic.
Best regards
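
For anyone who hits the same "Excessive depth in document" message when reading a filing directly with xml2, here is a minimal sketch of one possible workaround. It is not necessarily what the fix in the package does. It assumes the xml2 package is installed and relies on the "HUGE" value of the `options` argument to `read_html`, which corresponds to the XML_PARSE_HUGE flag named in the error. The `deep_html` string is a made-up stand-in for a deeply nested filing, not one of the URLs above, and nothing here assumes that parse_filing itself accepts an `options` argument.

```r
library(xml2)

# Hypothetical stand-in for a deeply nested filing: 300 nested <div>
# elements, deeper than the 256-level limit named in the error message.
deep_html <- paste0(
  "<html><body>",
  strrep("<div>", 300), "text", strrep("</div>", 300),
  "</body></html>"
)

# xml2's default parser options plus "HUGE", which relaxes the parser's
# hardcoded limits (libxml2's XML_PARSE_HUGE flag).
huge_options <- c("RECOVER", "NOERROR", "NOBLANKS", "HUGE")

doc <- read_html(deep_html, options = huge_options)

# Pull the text out of the innermost <div> to confirm the whole tree parsed.
innermost_text <- xml_text(xml_find_first(doc, "//div[not(div)]"))
```

The same `options` vector could in principle be passed to `read_html` with one of the filing URLs in place of `deep_html`. Whether the default options fail on a given document depends on the libxml2 version installed.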