manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

added verify_https option #57

Closed: mrx23dot closed this 3 years ago

mrx23dot commented 3 years ago

Added a verify_https option that allows bypassing HTTPS certificate checking to speed up connections. By default the certificate is still checked (secure by default).

Connection time dropped from 2.89 s to 2.34 s.

Usage: cache = HttpCache(dir, verify_https=False)
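To make the effect of such a flag concrete, here is a minimal standard-library sketch of what a verify_https switch typically controls. It is not py-xbrl's implementation; build_ssl_context and fetch are hypothetical helpers introduced only for illustration.

```python
import ssl
import urllib.request


def build_ssl_context(verify_https: bool = True) -> ssl.SSLContext:
    """Return an SSL context; verification stays on unless explicitly disabled."""
    context = ssl.create_default_context()
    if not verify_https:
        # Order matters: check_hostname must be switched off before
        # verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url: str, verify_https: bool = True, timeout: float = 30.0) -> bytes:
    """Download a URL with the chosen verification setting (hypothetical helper)."""
    context = build_ssl_context(verify_https)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

With verification disabled, the client accepts any certificate, so the speed gain comes at the cost of protection against a tampered or impersonated server.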

manusimidt commented 3 years ago

@mrx23dot Thank you for your pull request. I'm always happy to hear from new contributors who want to improve the functionality of py-xbrl! 😊

I still believe that disabling SSL certificate validation is the very last thing to tweak in order to improve performance. The parser also has a built-in delay so that it does not overload the web server from which the files are requested. Please note that if you parse a submission from the SEC, the parser will request files from different web servers with different request rate limits.

Always check the usage policy before decreasing the delay (or even disabling SSL for a faster download and parsing time).
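Since different servers may enforce different limits, one option is to keep the delay per host rather than global. The sketch below assumes a simple lookup table; the host names and millisecond values are placeholders, not real policies, and delay_for is a hypothetical helper rather than part of py-xbrl.

```python
from urllib.parse import urlparse

# Placeholder values for illustration only; take real numbers from each
# server's published usage policy.
DEFAULT_DELAY_MS = 500
DELAY_MS_BY_HOST = {
    "filings.example.com": 200,
    "taxonomy.example.org": 1000,
}


def delay_for(url: str) -> int:
    """Return the delay in milliseconds to apply before requesting this URL."""
    host = urlparse(url).hostname or ""
    # Unknown hosts fall back to the conservative default.
    return DELAY_MS_BY_HOST.get(host, DEFAULT_DELAY_MS)
```

Falling back to the default for unknown hosts keeps the behaviour conservative when a filing references a server that is not in the table.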

I will merge the pull request now, but I want to write a note in the documentation before I include the change in a new version.

mrx23dot commented 3 years ago

You could leave it undocumented; it's not for the average user, and it's secure by default. The small delays also add up when I do 2000x4x5 filing polls. The other change (XML parsing) will have a much bigger impact, cutting time by about 90%. I hope the rate-limiting timer also runs in the background while we do the parsing.
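One way to get the behaviour hoped for here is to measure the time since the previous request instead of sleeping for a fixed interval, so that time spent parsing counts toward the delay. This is a sketch of that idea only; whether py-xbrl's cache works this way is not stated in this thread, and ElapsedTimeLimiter is a hypothetical class.

```python
import time
from typing import Callable, Optional


class ElapsedTimeLimiter:
    """Enforce a minimum interval between requests, crediting time already elapsed."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Block until min_interval has passed since the last request.

        Returns the number of seconds actually slept.
        """
        slept = 0.0
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            remaining = self.min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment the next request is allowed to start.
        self._last_request = self._clock()
        return slept
```

Calling wait() immediately before each download means that a parse taking longer than min_interval incurs no extra sleep at all. The clock and sleep parameters are injectable so the behaviour can be checked without real waiting.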