Closed: k3mlol closed this issue 7 months ago
Thanks for opening this issue. Our JSONL output option includes request information which you can use to extract your desired data. For example, for your use case, you can do the following:
$ katana -u yahoo.com -silent -j | jq -r '"\(.request.method) \(.request.endpoint)"'
GET https://yahoo.com
GET https://www.yahoo.com/
GET https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js
GET https://www.yahoo.com/news/sale-donald-trump-lightly-used-080735193.html
GET https://www.yahoo.com/news/ohio-toddler-died-her-mom-130612092.html
GET https://br.yahoo.com
GET https://autos.yahoo.com/michael-jordan-gets-personal-delivery-180611826.html
GET https://hk.yahoo.com
GET https://www.yahoo.com/autos/michael-jordan-gets-personal-delivery-180611826.html
...
Is this something that'll work for you?
Hi dogancanbakir, no, it doesn't work for me. I can't see any API data in the output. I've uploaded rad_yahoo.com.txt to help you diff the results. With katana I can't see any API data; all of the results are static files.
Could you please clarify your statement "I can't see any API data out."? The command we are running only extracts the request method and endpoint. This means that the results you obtain from running the command are not different from those you get when running Katana as katana -u yahoo.com -silent.
Hi dogancanbakir, for example, rad can get:
https://sg.yahoo.com/tdv2_fp/api/
https://udc.yahoo.com/v2/public/
https://c2shb-oao.ssp.yahoo.com/admax/
https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?formatted=true&lang=en-SG&region=SG&scrIds=all_cryptocurrencies_us&start=0&count=25&enableSectorIndustryLabelFix=true&corsDomain=sg.finance.yahoo.com
All of these are API URLs,
but the katana results are static resources.
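One way to make this comparison concrete is to diff the two tools' URL lists. The sketch below is only an illustration: the helper name only_in_first and the file katana_yahoo.com.txt are hypothetical, and it assumes both files hold one URL per line.

```shell
#!/bin/sh
# Hypothetical helper: print lines that appear in the first file but not in the second.
# Assumes each file contains one URL per line.
only_in_first() {
  a_sorted=$(mktemp) || return 1
  b_sorted=$(mktemp) || return 1
  # comm needs sorted input; -u also drops duplicate URLs.
  sort -u "$1" > "$a_sorted"
  sort -u "$2" > "$b_sorted"
  # -2 hides lines unique to the second file, -3 hides lines common to both.
  comm -23 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}
```

For example, `only_in_first rad_yahoo.com.txt katana_yahoo.com.txt` would list the URLs that rad found and katana did not.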
Hi @dogancanbakir, I have the same feature request. I would like to be able to extract the API endpoints of a website (yahoo.com in this case). Currently, katana only extracts JavaScript files, HTML files, and so on. For a target such as yahoo.com, I would like to extract all of its API endpoints, such as https://yahoo.com/api/login, and not just the JS and HTML files.
I see. How about using -headless mode for better coverage coupled with filters to obtain the desired output? For example:
katana -u yahoo.com -mr "(api\.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent
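The pattern can also be tried offline against a saved URL list before committing to a crawl. This is a minimal sketch, not part of katana: the function name filter_api_urls is hypothetical, and the regex is the one above rewritten for grep -E, where the forward slashes need no escaping.

```shell
#!/bin/sh
# Hypothetical helper: keep only URLs on stdin that look like API endpoints.
# Matches "api." (e.g. video-api.yql...), "/api/", or a version segment such as "/v2/".
filter_api_urls() {
  grep -E '(api\.|/api/|/v[0-9]/)'
}
```

A possible use is `katana -u yahoo.com -hl -xhr -silent | filter_api_urls | sort -u`, which applies the same heuristic after the crawl instead of during it.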
Hi dogancanbakir, do you know how to match URLs whose response has Content-Type: application/json?
katana -u https://yahoo.com -mr "(api.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent
https://fr.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://fr.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://uk.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://video-api.yql.yahoo.com
https://uk.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://de.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://de.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x12849ff]
goroutine 103203 [running]:
github.com/projectdiscovery/retryablehttp-go.FromRequest(0x0)
/home/runner/go/pkg/mod/github.com/projectdiscovery/retryablehttp-go@v1.0.42/request.go:176 +0x5f
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest.func1(0xc001ec4320)
/home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:89 +0x66a
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func2(0x0?)
/home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:52 +0x28
reflect.Value.call({0x13905e0?, 0xc00cfe2630?, 0x100c002757e40?}, {0x1572396, 0x4}, {0xc002757f58, 0x1, 0xc001ec4320?})
/opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0x13905e0?, 0xc00cfe2630?, 0xc001ec4320?}, {0xc002757f58?, 0xc004978420?, 0xc04156b558?})
/opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:380 +0xb9
github.com/go-rod/rod.(*Browser).eachEvent.func1()
/home/runner/go/pkg/mod/github.com/go-rod/rod@v0.114.1/browser.go:401 +0x3d9
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func3()
/home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:57 +0x22
created by github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest in goroutine 103165
/home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:49 +0x51b
The result is no better than rad's result.
Do you know how to match URLs whose response has Content-Type: application/json?
katana -u target.com -mdc 'contains(headers, "application/json")'
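An alternative is to post-process the JSONL output with jq, as in the earlier example. The sketch below rests on an assumption: that each JSONL record carries the response content type at .response.headers.content_type. Check a sample record from your katana version before relying on it. The function name json_endpoints is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read katana JSONL on stdin and print the endpoints
# whose response content type mentions application/json.
# ASSUMPTION: the content type is stored at .response.headers.content_type.
json_endpoints() {
  jq -r '
    # Records without a response or without that header fall back to "".
    select((.response.headers.content_type // "") | test("application/json"; "i"))
    | .request.endpoint
  '
}
```

It could be used as `katana -u yahoo.com -hl -xhr -silent -j | json_endpoints`.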
Could you please let me know which version you are currently using?
Current version: v1.0.5
The nil pointer dereference issue in your last command execution has been resolved in the dev branch.
It's worth noting that due to the web's non-deterministic nature, the results can vary. However, this is to be expected, and the suggestion is to refine filters/matchers (including DSL) if you have a clear idea of what you're looking for. I hope that helps!
Hi, there is a tool named rad. When I run rad -u https://yahoo.com, it can capture API requests, for example like this
May I ask whether there is any plan to implement this feature in katana, or which option I can use to achieve this?