wbond / asn1crypto

Python ASN.1 library with a focus on performance and a pythonic API
MIT License

Reading ['encrypted_content'] is very slow while parsing a CMS envelope. #227

Open abhinav-mishra-db opened 2 years ago

abhinav-mishra-db commented 2 years ago

I am parsing envelope-encrypted audio files. I can quickly extract all parameters from the envelope object, such as the 'Key Encryption Key' and the algorithm details. But when extracting the symmetrically encrypted content (['encrypted_content']), the library takes an unacceptable amount of time, e.g. for a 48 MB file this step takes approx. 2 minutes. Please let me know if I am doing anything wrong.

from asn1crypto import cms
# file_content is the bytes read from the file.
# the slow step
cms.ContentInfo.load(file_content)['content']['encrypted_content_info']['encrypted_content'].native

abhinavdrs commented 2 years ago

help please :))

wbond commented 2 years ago

This is likely a result of overhead related to specific encoding decisions in the ASN.1 modules, combined with Python runtime performance characteristics.

The first step would be to profile and see where the time is being spent. Then we can determine whether there is a bug to fix. If it is generally just the overhead of parsing binary data with Python, then the solution would potentially be some sort of optional, C-based parser.
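
As a starting point for that profiling step, here is a minimal sketch using the standard library's cProfile and pstats modules. The helper names `profile_call` and `extract_encrypted_content` are hypothetical and not part of asn1crypto; the second function just wraps the expression from the original report so it can be profiled as a single call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time. Hypothetical helper, not an
    asn1crypto API.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def extract_encrypted_content(file_content):
    """The slow expression from the report, wrapped for profiling."""
    # Imported here so profile_call stays usable without asn1crypto installed.
    from asn1crypto import cms

    info = cms.ContentInfo.load(file_content)
    return info['content']['encrypted_content_info']['encrypted_content'].native
```

With `file_content` holding the bytes of the envelope, calling `profile_call(extract_encrypted_content, file_content)` and printing the second return value should show which functions account for most of the cumulative time. Posting that table in the issue would make it easier to tell a fixable hot spot from general parsing overhead.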