scinfu / SwiftSoup

SwiftSoup: Pure Swift HTML Parser, with the best of DOM, CSS, and jQuery (Supports Linux, iOS, Mac, tvOS, watchOS)
https://scinfu.github.io/SwiftSoup/
MIT License

outerHtml() takes a very long time when HTML contains inline embedded data (esp. audio) #152

Closed: triton3 closed this issue 2 years ago

triton3 commented 4 years ago

When the HTML contains inline data elements such as audio, the doc.outerHtml() function takes a very long time (more than 2 minutes). The inline audio is embedded as base64-encoded data in a data URI of the form:

<audio controls controlsList="nodownload" src="data:audio/mp4;base64,...">Audio playback is not supported. Please try with a different browser.</audio>
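To put the size of such a value in perspective, here is a small Swift sketch (Foundation only, not SwiftSoup) of how a data URI shaped like the src above is assembled. makeDataURI is a hypothetical helper, and the zero-filled buffer is only a stand-in for real audio; base64 output length depends on the byte count alone, so the numbers hold for any 2 MB clip. It suggests that a ~2 MB clip becomes a single attribute value of roughly 2.8 million characters, all of which the serializer has to walk through when producing the outer HTML.

```swift
import Foundation

// Hypothetical helper: builds "data:<mime type>;base64,<payload>",
// the same shape as the <audio src="..."> value in the issue.
func makeDataURI(mimeType: String, payload: Data) -> String {
    return "data:\(mimeType);base64," + payload.base64EncodedString()
}

// Stand-in for ~2 MB of audio bytes (zero-filled; only the length matters here).
let audioBytes = Data(count: 2 * 1024 * 1024)
let src = makeDataURI(mimeType: "audio/mp4", payload: audioBytes)

// Base64 emits 4 characters for every 3 input bytes (rounded up with padding),
// so the attribute value is roughly 1.33x the size of the raw audio.
let expectedPayloadLength = ((audioBytes.count + 2) / 3) * 4
let prefixLength = "data:audio/mp4;base64,".utf8.count

print("raw audio bytes:", audioBytes.count)       // 2097152
print("src attribute length:", src.utf8.count)    // 2796226
print("matches estimate:", src.utf8.count == prefixLength + expectedPayloadLength)
```

Counting via utf8 keeps the length check cheap, since every base64 character is a single ASCII byte.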

Is there anything that can be done to reduce the time it takes to extract the outer HTML? The audio data itself is only a couple of MB in size.

scinfu commented 4 years ago

I think this is a performance issue; I will work on finding a solution.
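The thread does not say where SwiftSoup actually spends the time, but one common cause of this kind of slowdown is building a long string through repeated copies or offset-based indexing, which can turn a linear job quadratic on a multi-megabyte attribute. The sketch below is a hypothetical escapeAttributeValue helper in plain Swift, not SwiftSoup's implementation: it escapes a double-quoted attribute value in a single pass into one pre-sized buffer, and skips the copy entirely when nothing needs escaping, which is always true of a base64 payload (its alphabet is A-Z, a-z, 0-9, '+', '/' and '=').

```swift
// Hypothetical helper (not part of SwiftSoup): escapes a double-quoted
// HTML attribute value in time linear in the input length.
func escapeAttributeValue(_ value: String) -> String {
    // Fast path: one byte-level scan. Base64 data never contains these
    // characters, so a data URI is returned as-is without being copied.
    let needsEscaping = value.utf8.contains { byte in
        byte == UInt8(ascii: "&") || byte == UInt8(ascii: "\"")
            || byte == UInt8(ascii: "<") || byte == UInt8(ascii: ">")
    }
    if !needsEscaping { return value }

    // Slow path: append into a single buffer with capacity reserved up front,
    // instead of re-copying or re-indexing the string for every character.
    var result = ""
    result.reserveCapacity(value.utf8.count + 16)
    for scalar in value.unicodeScalars {
        switch scalar {
        case "&": result += "&amp;"
        case "\"": result += "&quot;"
        case "<": result += "&lt;"
        case ">": result += "&gt;"
        default: result.unicodeScalars.append(scalar)
        }
    }
    return result
}

print(escapeAttributeValue("a & \"b\" <c>"))            // a &amp; &quot;b&quot; &lt;c&gt;
print(escapeAttributeValue("QUJDRA==") == "QUJDRA==")   // true
```

The scan-first design is the point of the sketch: for a value that is mostly an untouched base64 blob, the cost is a single read of the bytes rather than a character-by-character rebuild.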