mwaldstein / edgarWebR

R package for interacting with the SEC's EDGAR filing search and retrieval system
https://mwaldstein.github.io/edgarWebR/

Excessive depth in document #11

Open tangxuning opened 5 years ago

tangxuning commented 5 years ago

Hi,

When I call parse_filing on the URLs below, I get the following error:

Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, : Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

Here are a few sample URLs:
https://www.sec.gov/Archives/edgar/data/1065648/000106564809000009/form_10k.htm
https://www.sec.gov/Archives/edgar/data/1010247/000101024709000005/form10k.htm
https://www.sec.gov/Archives/edgar/data/861459/000086145909000013/form10-q.htm

Again, thanks very much for contributing this package! It's fantastic.

Best regards

mwaldstein commented 5 years ago

Interesting - thanks for the bug report - I'm starting to take a look at this now.

mwaldstein commented 5 years ago

I made an initial fix - they will all at least parse now.

These files have a particularly complicated structure, so there are likely other "hidden" parsing problems where parts of the document get missed. I did some work to cover those edge cases, but there is a good chance I didn't catch them all, so let me know if you see any issues.
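
For anyone hitting the same error when calling xml2 directly, here is a minimal sketch of the workaround the error message itself points to: adding "HUGE" to the parser options relaxes libxml2's nesting-depth limit. This is not necessarily what the fix in edgarWebR does. The helper name read_html_huge and the synthetic 300-level document are assumptions made for illustration only.

```r
library(xml2)

# Hypothetical helper (not part of edgarWebR): parse HTML with libxml2's
# depth limit relaxed by adding "HUGE" to xml2's default parser options.
read_html_huge <- function(x) {
  read_html(x, options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
}

# Synthetic document with 300 nested <div> elements, deeper than the
# 256 levels mentioned in the error message.
depth <- 300
deep_html <- paste0(
  "<html><body>",
  strrep("<div>", depth), "innermost text", strrep("</div>", depth),
  "</body></html>"
)

# Parse the literal HTML string and collect the nested <div> nodes.
doc <- read_html_huge(deep_html)
divs <- xml_find_all(doc, "//div")
```

The same options vector can be passed to read_html for a downloaded filing. Relaxing the limit only gets the document to parse; it does not guarantee that downstream code handles such deeply nested markup sensibly.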