projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

[Feature request] Just show the API requests #815

Closed: k3mlol closed this issue 7 months ago

k3mlol commented 7 months ago

Hi, there is a tool named rad. When I run rad -u https://yahoo.com, it captures the API requests the site makes, for example:

GET https://yahoo.com/
GET https://www.yahoo.com/
GET https://sg.yahoo.com/?p=us
GET https://sg.yahoo.com/tdv2_fp/api/resource/NotificationHistory.getHistory;count=5;imageTag=img%3A40x40%7C2%7C80;theme=default;notificationTypes=breakingNews;lastUpdate=1711029305;loadInHpViewer=true;includePersonalized=;lang=en-SG;region=SG;partner=yahoo
GET https://sg.yahoo.com/tdv2_fp/api/resource/
GET https://sg.yahoo.com/tdv2_fp/
GET https://sg.yahoo.com/tdv2_fp/api/
GET https://sg.yahoo.com/caas/content/article/?uuid=9e003078-ee4c-3380-a73b-0a5fafd1cce2,79a50f7d-d986-34df-af94-0515b533d384,411cfce1-9ce5-36d3-a687-61904c859f0e&appid=hpgrid_with_rightrail&device=desktop&lang=en-SG&region=SG&site=fp&partner=none&bucket=900&features=enableEVPlayer,enableOverrideSpaceId,contentFeedbackEnabled,enableAdFeedbackV2,enableVideoDocking,enableRRAdsSlots,enableRRAdsSlotsWithJAC,enableAdSlotsNewMap,enableGAMAds,enableGAMAdsOnLoad&rid=3etn62livof1p
GET https://sg.yahoo.com/
GET https://udc.yahoo.com/
POST https://udc.yahoo.com/v2/public/yql?yhlVer=2&yhlClient=rapid&yhlS=2142378882&yhlCT=2&yhlBTMS=1711029306070&yhlClientVer=3.53.39&yhlRnd=l84r1YQgfKbSXXXA&yhlCompressed=3
GET https://udc.yahoo.com/v2/public/
GET https://udc.yahoo.com/v2/
POST https://sg.yahoo.com/fp_ms/_rcv/remote?ctrl=TopicsDesktop&lang=en-SG&m_id=react-wafer-topics&m_mode=json&region=SG&rid=3etn62livof1p&site=fp&apptype=default&instance_id=topicStream&_evtSrc=deferLoad
POST https://sg.yahoo.com/fp_ms/_rcv/remote?ctrl=HoroscopePreview&lang=en-SG&m_id=react-wafer-horoscope&m_mode=json&region=SG&rid=3etn62livof1p&site=fp&apptype=default&instance_id=horoscope&_evtSrc=deferLoad
GET https://guce.yahoo.com/v1/
GET https://sg.yahoo.com/caas/content/
GET https://guce.yahoo.com/
GET https://sg.yahoo.com/caas/
POST https://sg.yahoo.com/fp_ms/_rcv/remote?ctrl=NativeAd&m_id=react-wafer-nativeAd&rid=3etn62livof1p&m_mode=json&designtype=default
GET https://opus.analytics.yahoo.com/
GET https://sg.yahoo.com/tdv2_fp/api/resource/WeatherLocationService.favoriteLocation?lang=en-SG&region=SG&site=fp&ssl=1&crumb=lv83nQPiYp.&returnMeta=true
GET https://sg.yahoo.com/fp_ms/_rcv/
GET https://sg.yahoo.com/fp_ms/
GET http://www.yahoo.com/
POST https://csp.yahoo.com/beacon/csp?src=guce
POST https://sg.yahoo.com/_td_api/beacon/info?beaconSrc=HomepagePWA&bucket=900&eventName=svcWkrRegSuccess&rid=3etn62livof1p
GET https://opus.analytics.yahoo.com/tag/
GET https://sg.yahoo.com/?err=404&err_url=https%3a%2f%2fsg.yahoo.com%2fcaas%2f
GET http://help.yahoo.com/
GET https://csp.yahoo.com/beacon/
GET https://csp.yahoo.com/
POST https://bats.video.yahoo.com/p?_V=V&V_sec=pb&evt=v_request&t=0.7739249358322577&_sqno=0&ts=0&auto=false&bckt=none&ccode=main_single_feed__en-SG__frontpage__default__default__desktop__ga__noSplit&cdn=&cont=1&cpos=9&expb=900&expn=advstrmvideo&expt=strm-inline&expm=na&focus=1&intl=sg&lang=en-SG&layout=&lms_id=&loc=onProp&msz=&p_sec=&p_subsec=&pbst=init&pct=&pd=&pg_name=&pkgt=orphan_img&pls=73ce8a3c-4a37-4b6d-81fb-b121f8ed34af&pltype=ev-desktop&pstaid=79a50f7d-d986-34df-af94-0515b533d384&pstaid_p=&pstcat=&psz=401x226&pt=home&pver=1.4.8&_rid=3etn62livof1p&region=SG&replay=0&rlvtscr=&s=2142378882&sec=strm&site=frontpage&snd=m&subsec=&test=900&type=vod+short&ar=1.77&ver=&vid=79a50f7d-d986-34df-af94-0515b533d384&vidPos=&vlng=0&vs=tt9k6qa1&tmout=10&vptm=10&preload=true&_w=https%3A%2F%2Fsg.yahoo.com%2F%3Fp%3Dus&_R=&adUrl=&view=
POST https://bats.video.yahoo.com/p?_V=V&V_sec=pb&evt=p_init&t=0.4470183612802192&_sqno=0&ts=0&auto=false&bckt=none&ccode=main_single_feed__en-SG__frontpage__default__default__desktop__ga__noSplit&cdn=&cont=0&cpos=9&expb=900&expn=advstrmvideo&expt=strm-inline&expm=na&focus=1&intl=sg&lang=en-SG&layout=&lms_id=&loc=onProp&msz=&p_sec=&p_subsec=&pbst=init&pct=&pd=&pg_name=&pkgt=orphan_img&pls=73ce8a3c-4a37-4b6d-81fb-b121f8ed34af&pltype=ev-desktop&pstaid=79a50f7d-d986-34df-af94-0515b533d384&pstaid_p=&pstcat=&psz=0x0&pt=home&pver=1.4.8&_rid=3etn62livof1p&region=SG&replay=0&rlvtscr=&s=2142378882&sec=strm&site=frontpage&snd=m&subsec=&test=900&type=vod+short&ar=&ver=&vid=79a50f7d-d986-34df-af94-0515b533d384&vidPos=&vlng=0&vs=tt9k6qa1&tmout=10&vptm=10&preload=true&_w=https%3A%2F%2Fsg.yahoo.com%2F%3Fp%3Dus&_R=&adUrl=&view=&continuousPlay=0&loop=0&videoRecommendations=0&ff_ad=1&bcpVersion=7.17.2&brightcovePlayerId=RiVAoaoIb3&overlayPluginVersion=3.0.0&adPluginVersion=5.2.8&playlistUIPluginVersion=5.1.1&percentViewable=NaN
GET https://sg.yahoo.com/_td_api/beacon/
GET https://video-api.yql.yahoo.com/v1/video/sapi/streams/79a50f7d-d986-34df-af94-0515b533d384?srid=1429852555&protocol=http&format=m3u8%2Cmp4%2Cwebm&rt=html&devtype=desktop&offnetwork=false&plid=73ce8a3c-4a37-4b6d-81fb-b121f8ed34af&region=SG&site=frontpage&expb=900&expn=advstrmvideo&bckt=Treatment_Oath_Player&lang=en-SG&width=401&height=226&resize=true&ps=tt9k6qa1&autoplay=false&image_sizes=&excludePS=true&isDockable=0&acctid=&synd=&pspid=2142378882&plidl=&topic=&pver=1.4.8&try=1&failover_count=0&ads=ima&ad.pl=up&ad.pd=&ad.pt=home&ad.pct=&evp=bcp&hlspre=false&ad.plseq=1&pblob=lu%3A0%3Bpt%3Ahome%3Bver%3Amegastrm
GET https://sg.yahoo.com/_td_api/
POST https://bats.video.yahoo.com/p?_V=V&V_sec=pb&evt=v_api&t=0.43679165698991174&_sqno=1&ts=682&auto=false&bckt=none&ccode=main_single_feed__en-SG__frontpage__default__default__desktop__ga__noSplit&cdn=bcp&cont=1&cpos=9&expb=900&expn=advstrmvideo&expt=strm-inline&expm=na&focus=1&intl=sg&lang=en-SG&layout=&lms_id=a0Vd000000DIUTaEAP&loc=onProp&msz=&p_sec=&p_subsec=&pbst=init&pct=&pd=&pg_name=&pkgt=orphan_img&pls=73ce8a3c-4a37-4b6d-81fb-b121f8ed34af&pltype=ev-desktop&pstaid=79a50f7d-d986-34df-af94-0515b533d384&pstaid_p=&pstcat=&psz=401x226&pt=home&pver=1.4.8&_rid=3etn62livof1p&region=SG&replay=0&rlvtscr=&s=2142378882&sec=strm&site=frontpage&snd=m&subsec=&test=900&type=vod+short&ar=1.77&ver=&vid=79a50f7d-d986-34df-af94-0515b533d384&vidPos=&vlng=56&vs=tt9k6qa1&tmout=10&vptm=10&preload=true&_w=https%3A%2F%2Fsg.yahoo.com%2F%3Fp%3Dus&_R=&adUrl=https%3A%2F%2Ftb.pbs.yahoo.com%2Fv1%2Fevp%2Fasset%3Fbcid%3D5afc769f7239855a15fcee15%26pid%3D5afc75ea3a04293dad9f1a1f%26secure%3Dtrue%26rssId%3D79a50f7d-d986-34df-af94-0515b533d384%26firstVideo%3Dtrue%26height%3D226%26width%3D401%26sid%3D73ce8a3c-4a37-4b6d-81fb-b121f8ed34af%26pblob%3Dlu%253A0%253Bpt%253Ahome%253Bver%253Amegastrm%26site%3Dfrontpage%26region%3DSG%26lang%3Den-SG%26space_id%3D2142378882%26experience%3Dadvstrmvideo%26expn%3Dadvstrmvideo%26expb%3D900%26licensor_id%3Da0Vd000000DIUTaEAP%26isDockable%3Dfalse%26m.type%3DVOD%26device%3Ddesktop%26v%3D1%26f%3Djson%26s2s%3Dtrue%26content_len%3D56%26content_title%3DHeavy%2Brain%2Bturns%2BBangkok%2Bhighway%2Binto%2Bcanal%26content_id%3D4b1ac130-4d8b-0089-95f5-bff6ddeda68e%26pver%3D1.4.8%26aver%3D%5BEVP_ADSDKVER%5D%26country%3DSG%26state%3DSouth%2BWest%26place%3DSingapore%26place_type%3Dtown%26ad.plseq%3D1%26ad.pl%3Dup%26ad.pt%3Dhome%26pos%3Dpreroll%26evp%3Dbcp%26fmt%3Dvmap%26ps%3Dtt9k6qa1%26r%3Dhttps%253A%252F%252Fsg.yahoo.com%252F%26givn%3D%5BGOOGLE_INSTREAM_VIDEO_NONCE%5D%26pbckt%3DTreatment_Oath_Player%26npa%3D1%26ltd%3D0%26ppid%3D&view=&url=https%3A%2F%2Fvideo-api.yql.yahoo.com%2
Fv1%2Fvideo%2Fsapi%2Fstreams%2F79a50f7d-d986-34df-af94-0515b533d384%3Fsrid%3D1429852555%26protocol%3Dhttp%26format%3Dm3u8%252Cmp4%252Cwebm%26rt%3Dhtml%26devtype%3Ddesktop%26offnetwork%3Dfalse%26plid%3D73ce8a3c-4a37-4b6d-81fb-b121f8ed34af%26region%3DSG%26site%3Dfrontpage%26expb%3D900%26expn%3Dadvstrmvideo%26bckt%3DTreatment_Oath_Player%26lang%3Den-SG%26width%3D401%26height%3D226%26resize%3Dtrue%26ps%3Dtt9k6qa1%26autoplay%3Dfalse%26image_sizes%3D%26excludePS%3Dtrue%26isDockable%3D0%26acctid%3D%26synd%3D%26pspid%3D2142378882%26plidl%3D%26topic%3D%26pver%3D1.4.8%26try%3D1%26failover_count%3D0%26ads%3Dima%26ad.pl%3Dup%26ad.pd%3D%26ad.pt%3Dhome%26ad.pct%3D%26evp%3Dbcp%26hlspre%3Dfalse%26ad.plseq%3D1%26pblob%3Dlu%253A0%253Bpt%253Ahome%253Bver%253Amegastrm
GET https://sg.yahoo.com/tdv2_fp/api/resource/NotificationHistory.getHistory;count=5;imageTag=img%3A40x40%7C2%7C80;theme=default;notificationTypes=breakingNews;lastUpdate=1711029305;loadInHpViewer=true;includePersonalized=;partner=yahoo

Is there any plan to implement this feature in katana, or is there an existing option I can use to achieve this?

dogancanbakir commented 7 months ago

Thanks for opening this issue. Our JSONL output option includes request information which you can use to extract your desired data. For example, for your use case, you can do the following:

$ katana -u yahoo.com -silent -j | jq -r '"\(.request.method) \(.request.endpoint)"'
GET https://yahoo.com
GET https://www.yahoo.com/
GET https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js
GET https://www.yahoo.com/news/sale-donald-trump-lightly-used-080735193.html
GET https://www.yahoo.com/news/ohio-toddler-died-her-mom-130612092.html
GET https://br.yahoo.com
GET https://autos.yahoo.com/michael-jordan-gets-personal-delivery-180611826.html
GET https://hk.yahoo.com
GET https://www.yahoo.com/autos/michael-jordan-gets-personal-delivery-180611826.html
...

Is this something that'll work for you?
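For cases where jq is not available, the same extraction can be sketched in Python. This is a minimal sketch, assuming each line of katana's -j output is a JSON object with request.method and request.endpoint fields, as used in the jq filter above. The function name format_requests and the sample lines are illustrative and not part of katana.

```python
import json
from typing import Iterable, Iterator


def format_requests(lines: Iterable[str]) -> Iterator[str]:
    """Yield "METHOD ENDPOINT" for each JSONL record that carries a request."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        request = record.get("request")
        if not isinstance(request, dict):
            continue
        method = request.get("method")
        endpoint = request.get("endpoint")
        if method and endpoint:
            yield f"{method} {endpoint}"


if __name__ == "__main__":
    # Illustrative sample lines; real input would be katana's JSONL output.
    sample = [
        '{"request": {"method": "GET", "endpoint": "https://yahoo.com"}}',
        '{"request": {"method": "GET", "endpoint": "https://www.yahoo.com/"}}',
    ]
    for entry in format_requests(sample):
        print(entry)
```

To use it on real output, pass sys.stdin (or an open file) to format_requests instead of the sample list.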

k3mlol commented 7 months ago

Hi dogancanbakir, no, it doesn't work for me. I've uploaded rad_yahoo.com.txt so you can diff the two outputs. With katana, I can't see any API requests in the output; all of the results are static files.

dogancanbakir commented 7 months ago

Could you please clarify your statement "I can't see any API data out."? The command we are running only extracts the request method and endpoint. This means that the results you obtain from running the command are not different from those you get when running Katana as katana -u yahoo.com -silent.

k3mlol commented 7 months ago

Hi dogancanbakir, for example, rad can get:

https://sg.yahoo.com/tdv2_fp/api/
https://udc.yahoo.com/v2/public/
https://c2shb-oao.ssp.yahoo.com/admax/
https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?formatted=true&lang=en-SG&region=SG&scrIds=all_cryptocurrencies_us&start=0&count=25&enableSectorIndustryLabelFix=true&corsDomain=sg.finance.yahoo.com

All of these are API URLs, but katana's results are static resources.

joelczk commented 7 months ago

Hi @dogancanbakir, I have the same feature request. I would like to be able to extract the API endpoints of a website (yahoo.com in this case). Currently, katana only supports extracting JavaScript files, HTML files, and so on. For a target such as yahoo.com, I would like to extract all of its API endpoints, such as https://yahoo.com/api/login, and not just the JS and HTML files.

dogancanbakir commented 7 months ago

I see. How about using -headless mode for better coverage coupled with filters to obtain the desired output? For example:

katana -u yahoo.com -mr "(api\.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent

k3mlol commented 7 months ago

Hi dogancanbakir, do you know how to match URLs whose response has Content-Type: application/json?

k3mlol commented 7 months ago

katana -u https://yahoo.com -mr "(api.|\/api\/|\/v[0-9]\/)" -hl -xhr -silent

https://fr.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://fr.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://uk.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://video-api.yql.yahoo.com
https://uk.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_29458314-37bc-4cf0-886f-c30d44aca7c1
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_5f587f08-3b54-4527-8a85-0cc29486f101
https://de.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://de.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/collectConsent?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
https://consent.yahoo.com/v2/partners-list?sessionId=4_cc-session_14aae5d0-2389-4418-85f1-086195ed1ca6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x12849ff]

goroutine 103203 [running]:
github.com/projectdiscovery/retryablehttp-go.FromRequest(0x0)
        /home/runner/go/pkg/mod/github.com/projectdiscovery/retryablehttp-go@v1.0.42/request.go:176 +0x5f
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest.func1(0xc001ec4320)
        /home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:89 +0x66a
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func2(0x0?)
        /home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:52 +0x28
reflect.Value.call({0x13905e0?, 0xc00cfe2630?, 0x100c002757e40?}, {0x1572396, 0x4}, {0xc002757f58, 0x1, 0xc001ec4320?})
        /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0x13905e0?, 0xc00cfe2630?, 0xc001ec4320?}, {0xc002757f58?, 0xc004978420?, 0xc04156b558?})
        /opt/hostedtoolcache/go/1.21.5/x64/src/reflect/value.go:380 +0xb9
github.com/go-rod/rod.(*Browser).eachEvent.func1()
        /home/runner/go/pkg/mod/github.com/go-rod/rod@v0.114.1/browser.go:401 +0x3d9
github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Hijack).Start.func3()
        /home/runner/work/katana/katana/pkg/engine/hybrid/hijack.go:57 +0x22
created by github.com/projectdiscovery/katana/pkg/engine/hybrid.(*Crawler).navigateRequest in goroutine 103165
        /home/runner/work/katana/katana/pkg/engine/hybrid/crawl.go:49 +0x51b

The result is no better than rad's.
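The two -mr patterns in this thread differ by one character: the suggested pattern escapes the dot (api\.), while the pattern that was run does not (api.). An unescaped dot matches any character, so the second form also matches URLs that merely contain "api" followed by anything. The sketch below checks both patterns against a few URLs. It uses Python's re module on the assumption that it treats this particular pattern the same way as katana's regex engine; the example.com URL is hypothetical and the helper name matches is illustrative.

```python
import re

# Pattern as suggested: the dot after "api" is escaped (literal ".").
ESCAPED = re.compile(r"(api\.|\/api\/|\/v[0-9]\/)")
# Pattern as run: the dot is unescaped and matches any character.
UNESCAPED = re.compile(r"(api.|\/api\/|\/v[0-9]\/)")


def matches(pattern: re.Pattern, url: str) -> bool:
    """Return True if the pattern occurs anywhere in the URL."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    urls = [
        "https://sg.yahoo.com/tdv2_fp/api/resource/",       # contains /api/
        "https://consent.yahoo.com/v2/partners-list",       # contains /v2/
        "https://video-api.yql.yahoo.com",                  # contains "api."
        "https://edge-mcdn.secure.yahoo.com/ybar/cerebro_min.js",
        "https://example.com/rapid/app.js",                 # hypothetical URL
    ]
    for url in urls:
        print(matches(ESCAPED, url), matches(UNESCAPED, url), url)
```

In this sketch only the last URL is treated differently: "rapid" contains "api" followed by "d", which the unescaped pattern accepts and the escaped one rejects.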

dogancanbakir commented 7 months ago

"do you know how to match URLs whose response has Content-Type: application/json?"

katana -u target.com -mdc 'contains(headers, "application/json")'

Could you please let me know which version you are currently using?
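As an alternative to the DSL matcher above, the same filter can be sketched as post-processing of the JSONL output. This is a sketch under stated assumptions: it assumes each record carries the response headers as a name-to-value mapping under response.headers, which is not confirmed anywhere in this thread, and because the exact key spelling is unknown it normalizes header names before comparing. The names content_type and json_endpoints are illustrative.

```python
import json
from typing import Any, Iterable, Iterator


def _as_dict(value: Any) -> dict:
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def content_type(headers: dict) -> str:
    """Find the content type regardless of key case or '_' vs '-' spelling."""
    for name, value in headers.items():
        if str(name).lower().replace("_", "-") == "content-type":
            return str(value).lower()
    return ""


def json_endpoints(lines: Iterable[str]) -> Iterator[str]:
    """Yield endpoints whose recorded response declares application/json."""
    for line in lines:
        try:
            record = _as_dict(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON records
        # ASSUMPTION: headers live under response.headers in each record.
        headers = _as_dict(_as_dict(record.get("response")).get("headers"))
        endpoint = _as_dict(record.get("request")).get("endpoint")
        if endpoint and "application/json" in content_type(headers):
            yield endpoint
```

A substring check is used so that values such as "application/json; charset=utf-8" are also accepted.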

k3mlol commented 7 months ago

Current version: v1.0.5

dogancanbakir commented 7 months ago

The nil pointer dereference issue in your last command execution has been resolved in the dev branch.

dogancanbakir commented 7 months ago

It's worth noting that due to the web's non-deterministic nature, the results can vary. However, this is to be expected, and the suggestion is to refine filters/matchers (including DSL) if you have a clear idea of what you're looking for. I hope that helps!