projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

Any planned support for HTMX #763

Closed cyeganeh01248 closed 2 months ago

cyeganeh01248 commented 9 months ago

My projects have been moving toward more HTMX-based solutions, and I noticed katana doesn't parse HTMX well yet. I was curious whether anything is on the radar for this.

olearycrew commented 9 months ago

For HTMX or any JavaScript frontend rendering framework, you'll probably need to use headless mode (see https://github.com/projectdiscovery/katana?tab=readme-ov-file#crawling-mode).

As the README notes, standard mode analyzes HTTP response bodies as is,

without any javascript or DOM rendering, potentially missing post-dom-rendered endpoints or asynchronous endpoint calls that might happen in complex web applications

Headless mode, however, runs inside a headless browser, so you should be able to use katana with it.

cyeganeh01248 commented 9 months ago

I noticed it still doesn't register all the endpoints, even in headless mode. The test app I wrote has three paths: /, /static/styles.less, and /test, where the request to /test is triggered by htmx. katana doesn't seem to find that endpoint, even with jsl, jc, or headless mode with system Chrome turned on.

Mzack9999 commented 8 months ago

@crigger61 Thanks for opening this issue. Could you provide an example of a page that isn't parsed correctly? Headless mode should generally be able to render pages using any compatible JavaScript, and various endpoints are captured indirectly at runtime. If the endpoints are instead triggered by particular browser events (e.g. hovering with the mouse or clicking) or other user interaction, katana is probably not yet able to trigger them autonomously, so the only option is to implement a custom parser that extracts this information via static analysis.
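The static-analysis approach suggested above can be sketched roughly as follows. This is not katana code and does not use any katana API; it is a standalone Python illustration using only the standard library's html.parser. It assumes endpoints appear as literal values of HTMX's request attributes (hx-get, hx-post, hx-put, hx-patch, hx-delete, or their data-hx- variants). The names EndpointExtractor, extract and PAGE are hypothetical, and the sample page is only loosely modeled on the test app described earlier in the thread.

```python
from html.parser import HTMLParser

# HTMX request attributes; each one's value is the URL HTMX will request.
HTMX_ATTRS = {"hx-get", "hx-post", "hx-put", "hx-patch", "hx-delete"}
# Attributes a plain (non-HTMX-aware) link extractor typically follows.
LINK_ATTRS = {"href", "src", "action"}


class EndpointExtractor(HTMLParser):
    """Collect candidate endpoints from static HTML without running JavaScript."""

    def __init__(self, include_htmx: bool = True) -> None:
        super().__init__()
        self.include_htmx = include_htmx
        self.endpoints: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # HTMX also accepts the data- prefixed form, e.g. data-hx-get.
            htmx_name = name.removeprefix("data-")
            if name in LINK_ATTRS or (self.include_htmx and htmx_name in HTMX_ATTRS):
                if value not in self.endpoints:  # keep first-seen order, no dupes
                    self.endpoints.append(value)


def extract(html: str, include_htmx: bool = True) -> list[str]:
    parser = EndpointExtractor(include_htmx)
    parser.feed(html)
    parser.close()
    return parser.endpoints


# Hypothetical page: /test is only requested when the button is clicked.
PAGE = """<html><head><link rel="stylesheet" href="/static/styles.less"></head>
<body><button hx-get="/test" hx-trigger="click">Load</button></body></html>"""

print(extract(PAGE, include_htmx=False))
print(extract(PAGE))
```

Because the button only fires its request on hx-trigger="click", a crawler that renders the page but never clicks would not see /test at runtime, while reading the attribute statically does find it. A real integration would also need to resolve relative values against the page URL (for example with urllib.parse.urljoin) and would still miss URLs built dynamically in JavaScript.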

dogancanbakir commented 2 months ago

Closing this due to inactivity.