Open oaustegard opened 4 months ago
Hello. Seems like a useful feature. How about a mode that includes document.body.innerHTML
as an attached file? Is conversion to markdown necessary?
The challenge with the full HTML is that it can get very lengthy and include a lot of CSS and JavaScript that is just noise to an LLM. innerText could work, perhaps.
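To make the noise problem concrete, here is a minimal sketch of stripping scripts, styles, and comments from an HTML string before handing it to an LLM. The function name `stripNoise` is hypothetical and not part of the extension or the bookmarklet, and regexes are only a rough stand-in for a real HTML parser; in a browser, a DOM-based approach would likely be more robust.

```typescript
// Rough sketch (assumption: input is reasonably well-formed HTML).
// Regexes are not an HTML parser; this only illustrates the idea.
function stripNoise(html: string): string {
  return html
    // Drop <script>, <style>, and <noscript> elements together with their contents.
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop quoted inline style attributes.
    .replace(/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi, "")
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ")
    .trim();
}
```

Something like `stripNoise(document.body.innerHTML)` would be the in-page usage. The markup that remains is still much more verbose than Markdown or inner text, which is the trade-off being discussed here.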
I was worried that including Turndown would balloon the extension's size, but it's very small.
In this update, I've included a new Mode called "Page data". Assuming you're on Chrome/Edge, you can try it out by...
A few issues
My bookmarklet is far less advanced than your extension: it merely opens the text in a new window, so I can copy and paste it selectively.
I also do some guesswork to extract only the body/main content of the page. It doesn’t always work, though; see remEls in https://github.com/oaustegard/bookmarklets/blob/main/markdown_body.js for the logic. It could certainly be improved.
(Most of these bookmarklets were generated with ChatGPT or Claude)
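For readers who don't want to open the link, the general shape of that kind of guesswork can be sketched as removing elements that match a list of "probably not content" selectors before converting what is left. This is not the actual `remEls` logic from the bookmarklet: the selector list, the function name `removeNoise`, and the small interfaces below are all assumptions made for illustration.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings;
// a browser Document or Element is expected to satisfy them.
interface RemovableNode {
  remove(): void;
}
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<RemovableNode>;
}

// Hypothetical selector list, NOT the real contents of remEls.
const NOISE_SELECTORS: string[] = ["script", "style", "nav", "header", "footer", "aside"];

// Removes every element matching any selector and returns how many were removed.
function removeNoise(root: QueryRoot, selectors: string[] = NOISE_SELECTORS): number {
  let removed = 0;
  for (const selector of selectors) {
    Array.from(root.querySelectorAll(selector)).forEach((el) => {
      el.remove();
      removed++;
    });
  }
  return removed;
}
```

In a page this would be called as `removeNoise(document.body)`, ideally on a clone of the body so the visible page is left untouched. A fixed selector list is exactly the sort of heuristic that "doesn't always work", since sites differ in how they mark up their main content.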
On Wed, Jun 26, 2024 at 2:14 AM polywock @.***> wrote:
I was worried that including Turndown will balloon the extension's size too much, but it's very small.
In this update, I've included a new Mode called "Page data". You can try it out by...
- Extract packed.zip into a folder.
- Go to chrome://extensions
- Enable Developer mode.
- Click "Load unpacked" and load the extracted folder.
packed.zip https://github.com/user-attachments/files/15983186/packed.zip
A few issues
- Sometimes the page data is too large and doesn't fit Claude's context limit. How do you get around this?
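One blunt way to handle the size issue is to cap the page data before attaching it. The sketch below is an assumption rather than something either tool is known to do: `truncateToBudget` and the `[truncated]` marker are hypothetical, and a character count is only a crude proxy for a model's token limit.

```typescript
// Sketch: cap text at maxChars, preferring to cut at a paragraph break.
// Assumption: maxChars is chosen by the caller as a rough stand-in for a token budget.
function truncateToBudget(markdown: string, maxChars: number): string {
  if (markdown.length <= maxChars) return markdown;
  const cut = markdown.slice(0, maxChars);
  // Back up to the last blank line so the text does not end mid-paragraph.
  const lastBreak = cut.lastIndexOf("\n\n");
  const body = lastBreak > 0 ? cut.slice(0, lastBreak) : cut;
  // Mark the cut so the model (and the user) can tell content is missing.
  return body + "\n\n[truncated]";
}
```

Truncation drops whatever comes last on the page, so it works best after the non-content elements have already been stripped out.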
While this is really neat and a quick shortcut, it would be even better if you could add a text extractor, since Claude's OCR doesn't seem to capture all the text in a screenshot. Today I use a bookmarklet for this (it converts the page to Markdown using turndown.js), but an extension like yours would definitely be more convenient.