mrbrianevans / social-media-export-analyser

Analyse GDPR exports of your data from big social media companies
https://social-media-export-analyser-mrybc.ondigitalocean.app/
MIT License

Add XML/HTML parsing #40

Closed by mrbrianevans 2 years ago

mrbrianevans commented 2 years ago

Use a library like cheerio to parse uploaded XML or HTML documents. This is necessary to support YouTube or Facebook files.

mrbrianevans commented 2 years ago

Implemented with cheerio and unit tested.

The HTML preprocessor parses the HTML into a CheerioAPI object, which is then passed to the postprocessor.

The default behaviour is to pretty print the HTML in a string box.
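As a rough illustration of that preprocessor/postprocessor split, here is a minimal sketch. It is not the project's code: to stay dependency-free, a naive regex tokenizer stands in for cheerio, so the intermediate value is a hypothetical `Token[]` rather than a CheerioAPI object, and the names `preprocessHtml` and `defaultPostProcess` are made up for this example.

```typescript
// Hypothetical intermediate representation; the real project passes a CheerioAPI object.
type Token = { kind: "open" | "close" | "void" | "text"; value: string };

// Elements that never take a closing tag, so they must not increase the indent.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Pre-processor: turn raw markup into the intermediate representation.
// This naive tokenizer ignores comments spanning '>' and '>' inside attribute values.
function preprocessHtml(html: string): Token[] {
  const tokens: Token[] = [];
  const pieces = html.match(/<[^>]+>|[^<]+/g) || [];
  for (const piece of pieces) {
    const value = piece.trim();
    if (value === "") continue; // skip whitespace-only text between tags
    if (!value.startsWith("<")) {
      tokens.push({ kind: "text", value });
    } else if (value.startsWith("</")) {
      tokens.push({ kind: "close", value });
    } else {
      const nameMatch = /^<\s*([a-zA-Z0-9]+)/.exec(value);
      const name = nameMatch ? nameMatch[1].toLowerCase() : "";
      const isVoid =
        VOID_TAGS.has(name) || value.endsWith("/>") || value.startsWith("<!");
      tokens.push({ kind: isVoid ? "void" : "open", value });
    }
  }
  return tokens;
}

// Default post-processor: pretty print, one token per line, two spaces per nesting level.
function defaultPostProcess(tokens: Token[]): string {
  let depth = 0;
  const lines: string[] = [];
  for (const token of tokens) {
    if (token.kind === "close") depth = Math.max(0, depth - 1);
    lines.push("  ".repeat(depth) + token.value);
    if (token.kind === "open") depth += 1;
  }
  return lines.join("\n");
}

const pretty = defaultPostProcess(
  preprocessHtml("<div><p>Hello<br>world</p></div>")
);
console.log(pretty);
```

A file-specific postprocessor could take the same intermediate value and extract structured data instead of falling back to the pretty-printed string.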

mrbrianevans commented 2 years ago

Instead of rendering the pretty-printed text of the HTML, it would be more helpful to render the HTML graphically. A library like strip-js can be used to remove (potentially dangerous) JavaScript from the markup before rendering it. In the case of Facebook posts, this would allow the content to be viewed.

mrbrianevans commented 2 years ago

sanitize-html is a maintained library for sanitising HTML.
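A minimal sketch of how sanitize-html might be used before rendering uploaded markup. The helper name `sanitiseForRendering` and the allow-list below are assumptions for illustration, not what the project settled on, and the default import assumes `esModuleInterop` is enabled.

```typescript
import sanitizeHtml from "sanitize-html";

// Hypothetical helper: strips scripts, event handlers and unsafe URLs
// so the remaining markup can be rendered in the page.
export function sanitiseForRendering(dirty: string): string {
  return sanitizeHtml(dirty, {
    // Start from the library's default tag allow-list (no <script>, no <style>).
    allowedTags: sanitizeHtml.defaults.allowedTags,
    // Only allow href on links; attributes such as onclick are dropped.
    allowedAttributes: { a: ["href"] },
    // Reject javascript: and other non-web URL schemes in href.
    allowedSchemes: ["http", "https"],
  });
}

const clean = sanitiseForRendering(
  '<p onclick="steal()">Hi</p><script>alert(1)</script>' +
    '<a href="javascript:alert(2)">link</a>'
);
console.log(clean);
```

The allow-list would likely need widening (for example to permit images) depending on what the Facebook and YouTube exports contain.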