natemoo-re / ultrahtml

ultrahtml can't parse github.com #73

Open eigood opened 4 months ago

https://github.com/natemoo-re/ultrahtml/blob/93a127343f7a9f0e2286049145502e0b4dc20b32/src/index.ts#L106

Open github.com in an incognito window and find the '<qbsearch-input' element (included below). The line linked above does not handle a '>' embedded inside a quoted attribute value.

<qbsearch-input class="search-input" data-scope="" data-custom-scopes-path="/search/custom_scopes" data-delete-custom-scopes-csrf="hnSjGuSn_59uYU1AlkeaQMgUcurMIeDhgFsTysabNfRXa8FbmKkmA7zlWb1mfTwHCkfnNOMusEzxUXOgM3Lb2g" data-max-custom-scopes="10" data-header-redesign-enabled="false" data-initial-value="" data-blackbird-suggestions-path="/search/suggestions" data-jump-to-suggestions-path="/_graphql/GetSuggestedNavigationDestinations" data-current-repository="" data-current-org="" data-current-owner="" data-logged-in="false" data-copilot-chat-enabled="false" data-blackbird-indexed-repo-csrf="<esi:include src=&quot;/_esi/rails_csrf_token_form_hidden?r=kKpNFFgv9T4sMbcblxpXn6oJF49JzOYUpwhJ1cllp2tonW08bXjfYqU1KrCjMG96RRcbB2Q2V3Z%2FXwHy7R51mH5VE8%2BDuikKicFisJkUIi561b2WLbKCZXrYdVWjX2v5hVyP6E94BWPXtVczRjDc9agSQsEIG2qo8Ynv4RCE3u6gHjwZZ%2BuS3atJsiZf7smEoGhbJmU7AbqQeauYm80%2BwLghUx0v42kuagy1cIIoaHtsAXVy1mvkLALpCoEzBR2qo%2F6hVgROb5ACNrjgoxATskDgajiCyy5GOPHy7jb7%2BfdsvPrDUDP8yKOdVaEGSuUkeq8Pd6evRzNCt7DX9RS3saECh8PPgiZx3G0Ypxo2%2F1gjoieH1c1lgDOeQ1D5H%2BQ%2F7CkuKnoRq9iOdyyshZASbIDGJb7jnjDpXTpcRj6up7szOaWwda2IqppY8vkUZ7KsGWMFN9WqUJeBglo5lEY21IeD5MySdLaluKjBraxb6Sx6oCNG0hI%3D--5uGBn2%2FEBbVuAfAd--PdLkkfRWGPMl7pU59RsDjQ%3D%3D&quot; />" data-nl-search-enabled="false">
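One possible way to find the real end of the opening tag is to scan character by character and ignore any '>' seen while inside a quoted attribute value. The sketch below is only an illustration of that idea: `findTagEnd` is a hypothetical helper, not something that exists in ultrahtml, and the sample string is a shortened stand-in for the element above.

```typescript
// Hypothetical helper (not part of ultrahtml): returns the index of the '>'
// that closes the tag beginning at `start`, skipping any '>' inside quotes.
// Returns -1 if no closing '>' is found.
function findTagEnd(html: string, start: number): number {
  let quote: string | null = null; // the quote character we are inside, if any
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (quote !== null) {
      // Inside a quoted value: only the matching quote ends it.
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === ">") {
      return i;
    }
  }
  return -1;
}

// Shortened stand-in for the <qbsearch-input> element from this issue.
const sample =
  '<qbsearch-input data-x="<esi:include src=&quot;/a&quot; />" data-y="1">rest';
const end = findTagEnd(sample, 0);
const openingTag = sample.slice(0, end + 1);
```

This deliberately ignores comments, `<script>` contents, and quote characters appearing in unquoted attribute values, so it is a starting point for discussion rather than a drop-in fix.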

There are also issues when attribute values contain "&quot;": the values are not decoded and are just stored raw in node.attribute.
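For the decoding part, something along these lines could be applied to each attribute value after it is extracted. This is a minimal sketch, not ultrahtml's code: `decodeAttributeValue` and `NAMED_ENTITIES` are hypothetical names, and only a handful of named entities are covered (the full HTML list is much longer).

```typescript
// Hypothetical, deliberately tiny entity table; the real HTML list is far larger.
const NAMED_ENTITIES = new Map<string, string>([
  ["quot", '"'],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["apos", "'"],
]);

// Hypothetical helper (not part of ultrahtml): decodes named and numeric
// character references in a raw attribute value. Unknown or out-of-range
// references are left untouched.
function decodeAttributeValue(raw: string): string {
  return raw.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body.startsWith("#")) {
        const isHex = body[1] === "x" || body[1] === "X";
        const code = isHex
          ? parseInt(body.slice(2), 16)
          : parseInt(body.slice(1), 10);
        // String.fromCodePoint throws above U+10FFFF, so guard first.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      return NAMED_ENTITIES.get(body) ?? match;
    },
  );
}

const decoded = decodeAttributeValue("<esi:include src=&quot;/a&quot; />");
```

Because the replacement happens in a single pass, an input such as `&amp;quot;` decodes to the literal text `&quot;` and is not decoded a second time.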