Closed jamessharp closed 4 years ago
Thanks for reporting, the bug should be resolved 🙂
Specifically, it was related to how the HTML markup was sanitized before being processed.
This is the commit that introduces the fix: https://github.com/microlinkhq/html-get/commit/a9aecd6d5ef7e204b2fd19821dd25b20ff1a20a7
👍 Thanks for the speedy resolution!
Bug Report
Current Behavior
Try using microlink to parse e.g. https://www.theguardian.com/education/2020/jun/05/tell-us-about-your-young-childs-experiences-of-going-back-to-school
The title parses as "Thttps://www.theguardian.com/education/2020/jun/05/ell us about your young child’s https://www.theguardian.com/education/2020/jun/05/exphttps://www.theguardian.com/education/2020/jun/05/erihttps://www.theguardian.com/education/2020/jun/05/enchttps://www.theguardian.com/education/2020/jun/05/es of going back to school". The description and publisher are similarly disfigured. This happens with the Node.js SDK, and also if you put the article into the microlink web demo.
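For anyone wanting to check whether a response is affected, here is a minimal sketch of a detector. It assumes the corruption always takes the form seen above, where the directory part of the page URL is spliced into the text. `fieldsWithInjectedUrl` is a hypothetical helper written for this report, not part of microlink, and the `fields` object stands in for whatever metadata you got back.

```typescript
// Hypothetical helper (not part of microlink): returns the names of the
// parsed fields that contain the directory part of the page URL.
function fieldsWithInjectedUrl(
  pageUrl: string,
  fields: Record<string, string>
): string[] {
  // Everything up to and including the last "/", e.g.
  // "https://www.theguardian.com/education/2020/jun/05/"
  const base = pageUrl.slice(0, pageUrl.lastIndexOf("/") + 1);
  return Object.keys(fields).filter((name) => fields[name].includes(base));
}

// Usage with an excerpt of the title from this report.
const pageUrl =
  "https://www.theguardian.com/education/2020/jun/05/tell-us-about-your-young-childs-experiences-of-going-back-to-school";
const affected = fieldsWithInjectedUrl(pageUrl, {
  title: "Thttps://www.theguardian.com/education/2020/jun/05/ell us about",
  lang: "en",
});
```

With the inputs above, only the `title` field is flagged, since `lang` does not contain the URL prefix.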
Expected behavior/code
The parsed fields should not have fragments of the page URL injected into them.
Anything else
This started some time on 2nd June, I think.