knightcrawler-stremio / knightcrawler

A selfhosted Stremio addon
Apache License 2.0

Scraping 1337x error #3

Closed Mizaro closed 5 months ago

Mizaro commented 5 months ago

I got this from the logs:

[Thu Jan 25 2024 20:55:44 GMT+0000] starting 1337x scrape...
Scrapping 1337x Movies category page 1
Failed 1337x scrapping for [1] Movies due:  Error: Failed browse request
    at browse (/home/node/app/scrapers/1337x/1337x_api.js:66:27)
    at /home/node/app/scrapers/1337x/1337x_api.js:79:23
Scrapping 1337x TV category page 1
Failed 1337x scrapping for [1] TV due:  Error: Failed browse request
    at browse (/home/node/app/scrapers/1337x/1337x_api.js:66:27)
    at /home/node/app/scrapers/1337x/1337x_api.js:79:23
Scrapping 1337x Anime category page 1
Failed 1337x scrapping for [1] Anime due:  Error: Failed browse request
    at browse (/home/node/app/scrapers/1337x/1337x_api.js:66:27)
    at /home/node/app/scrapers/1337x/1337x_api.js:79:23
Scrapping 1337x Documentaries category page 1
Failed 1337x scrapping for [1] Documentaries due:  Error: Failed browse request
    at browse (/home/node/app/scrapers/1337x/1337x_api.js:66:27)
    at /home/node/app/scrapers/1337x/1337x_api.js:79:23
[Thu Jan 25 2024 20:55:44 GMT+0000] finished 1337x scrape

Does anyone know what I should do to solve it?

trulow commented 5 months ago

I'm getting the same issue with 1337x and some Nyaa files.

su-kkasberg commented 5 months ago

same issue

Gabisonfire commented 5 months ago

I'm trying to reproduce this here, but it's taking some time; I will see if I can get the same error. My wild guess is that it's linked to the recent DDoS attacks they've been getting, and the scraper now has trouble dealing with the Cloudflare check. I've added some debug logging so I can see what's happening.

davidjameshowell commented 5 months ago

I had Axios log the full failed response:

Axios request failed:  Request failed with status code 403
Status: 403
Headers: Object [AxiosHeaders] {
  date: 'Fri, 26 Jan 2024 05:06:46 GMT',
  'content-type': 'text/html; charset=UTF-8',
  'transfer-encoding': 'chunked',
  connection: 'close',
  'accept-ch': 'Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA',
  'cross-origin-embedder-policy': 'require-corp',
  'cross-origin-opener-policy': 'same-origin',
  'cross-origin-resource-policy': 'same-origin',
  'origin-agent-cluster': '?1',
  'permissions-policy': 'accelerometer=(),autoplay=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()',
  'referrer-policy': 'same-origin',
  'x-frame-options': 'SAMEORIGIN',
  'cf-mitigated': 'challenge',
  'cache-control': 'private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
  expires: 'Thu, 01 Jan 1970 00:00:01 GMT',
  'report-to': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=spaLlm35E029lhpkNB5IY%2BXIxDzaYtucZVN8Bu9B%2BrnZOzA0rkmBMExVvBTRmcpUEFj32%2Fnn1qInpAs2WwlcPySp0eqb0OROwvvttZIvx3ioDDUY4iE5Dr04mA%3D%3D"}],"group":"cf-nel","max_age":604800}',
  nel: '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}',
  vary: 'Accept-Encoding',
  server: 'cloudflare',
  'cf-ray': 'XXXX-FRA',
  'alt-svc': 'h3=":443"; ma=86400'
}
Data: <!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta name="robots" content="noindex,nofollow"><meta name="viewport" content="width=device-width,initial-scale=1"><style>@keyframes lds-ring{0%{transform:rotate(0deg)}to{transform:rotate(360deg)}}*{box-sizing:border-box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131}button,html{font-family:system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,"Noto Sans",sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol","Noto Color Emoji"}@media (prefers-color-scheme:dark){body{background-color:#222;color:#d9d9d9}body a{color:#fff}body a:hover{text-decoration:underline;color:#ee730a}body .lds-ring div{border-color:#999 transparent transparent}body .font-red{color:#b20f03}body .big-button,body .pow-button{background-color:#4693ff;color:#1d1d1d}body #challenge-success-text{background-image:url()}}body{display:flex;flex-direction:column;min-height:100vh}body.no-js .loading-spinner{visibility:hidden}body.no-js .challenge-running{display:none}body.dark{background-color:#222;color:#d9d9d9}body.dark a{color:#fff}a:hover,body.dark a:hover,body.light a:hover{text-decoration:underline;color:#ee730a}body.dark .lds-ring div{border-color:#999 transparent transparent}body.dark .font-red{color:#b20f03}body.dark .big-button,body.dark .pow-button{background-color:#4693ff;color:#1d1d1d}body.dark #challenge-success-text{background-image:url()}body.light{color:#313131}a,body.light a{color:#0051c3}body.light .lds-ring div{border-color:#595959 transparent transparent}body.light .font-red{color:#fc574a}body.light .big-button,body.light .pow-button{border-color:#003681;background-color:#003681;color:#fff}body.light #challenge-success-text{background-image:url()}a,body.light{background-color:transparent}a{transition:color 150ms 
ease;text-decoration:none}.main-content{margin:8rem auto;width:100%;max-width:60rem}.heading-favicon{margin-right:.5rem;width:2rem;height:2rem}.footer,.main-content{padding-right:1.5rem;padding-left:1.5rem}.main-wrapper{display:flex;flex:1;flex-direction:column;align-items:center}.font-red{color:#b20f03}.spacer{margin:2rem 0}.h1{line-height:3.75rem;font-size:2.5rem;font-weight:500}.core-msg,.h2{line-height:2.25rem;font-size:1.5rem}.h2{font-weight:500}.body-text,.core-msg{font-weight:400}.body-text{line-height:1.25rem;font-size:1rem}#challenge-error-text,#challenge-success-text{background-image:url();background-repeat:no-repeat;background-size:contain;padding-left:34px}#challenge-success-text{background-image:url();padding-left:42px}.text-center{text-align:center}.big-button{transition-duration:200ms;transition-property:background-color,border-color,color;transition-timing-function:ease;border:.063rem solid #0051c3;border-radius:.313rem;padding:.375rem 1rem;line-height:1.313rem;font-size:.875rem}.big-button:hover{cursor:pointer}.captcha-prompt:not(.hidden){display:flex}.pow-button{margin:2rem 0;background-color:#0051c3;color:#fff}.pow-button:hover{border-color:#003681;background-color:#003681;color:#fff}.footer{margin:0 auto;width:100%;max-width:60rem;line-height:1.125rem;font-size:.75rem}.footer-inner{border-top:1px solid #d9d9d9;padding-top:1rem;padding-bottom:1rem}.clearfix::after{display:table;clear:both;content:""}.clearfix .column{float:left;padding-right:1.5rem;width:50%}.diagnostic-wrapper{margin-bottom:.5rem}.footer .ray-id{text-align:center}.footer .ray-id code{font-family:monaco,courier,monospace}.core-msg,.zone-name-title{overflow-wrap:break-word}.loading-spinner{height:76.391px}.lds-ring,.lds-ring div{display:inline-block;position:relative;width:1.875rem;height:1.875rem}.lds-ring div{box-sizing:border-box;display:block;position:absolute;border:.3rem solid #595959;border-radius:50%;border-color:#313131 transparent transparent;animation:lds-ring 1.2s 
cubic-bezier(.5,0,.5,1) infinite}.lds-ring div:nth-child(1){animation-delay:-.45s}.lds-ring div:nth-child(2){animation-delay:-.3s}.lds-ring div:nth-child(3){animation-delay:-.15s}@media screen and (-ms-high-contrast:active),screen and (-ms-high-contrast:none){.main-wrapper,body{display:block}}</style><meta http-equiv="refresh" content="375"></head><body class="no-js"><div class="main-wrapper" role="main"><div class="main-content"><noscript><div id="challenge-error-title"><div class="h2"><span id="challenge-error-text">Enable JavaScript and cookies to continue</span></div></div></noscript></div></div><script>(function(){window._cf_chl_opt={cvId: '3',cZone: "1337x.to",cType: 'managed',cNounce: '88169',cRay: '84b63a80a8c9371f',cHash: 'f84a5ebbd151918',cUPMDTk: "\/cat\/Documentaries\/1\/?__cf_chl_tk=x9MoNK5TQOQa..lQsIgAd9mTtKcJ1SwIxuXT2ifXNUE-1706245606-0-gaNycGzNDSU",cFPWv: 'g',cTTimeMs: '1000',cMTimeMs: '375000',cTplV: 5,cTplB: 'cf',cK: "visitor-time",fa: "\/cat\/Documentaries\/1\/?__cf_chl_f_tk=x9MoNK5TQOQa..lQsIgAd9mTtKcJ1SwIxuXT2ifXNUE-1706245606-0-gaNycGzNDSU",md: 
"vmKuJkT7ZvWjNNWWWgAnsG7ZqlQqQwrWbMge7Q6zEGQ-1706245606-1-AWw6a5r_yOhfW-EQ5ibn-vXn8ND1W3VlsqApNeAV3t-KyR6btSdp5E2-AhzbbxjVxJzP5jbb267ZniGl2QGRNREWA_wraJzCoBx82io1wtou32aEWDCw9hpjAIlXI-t3Gba0MvLFG35ifBX1e1zLnXlKkuyIlUKIYwckig8PIPGsYxV7YUqEdtoiEPHmB3M_yJJ34VvgtCwoHKMphbYxWSrXUvgWSW27oMJiVAxXz9o-dnMmwATJg3jN8gIn-1mh4QPsx9twMrTbtqorTY4wuI-E77PBL3l8oBl6EA7HiZp62qozRtPpbYWvB132lyf5Fpvo7oCUodpD76_9O0omdRTtmBIQ76-YFetcTpLqINCLfyCu8gDfkhoxqT_j3fayDBkfjYSDnaL0RSjqAaw45TfTxt6GWI6oKH9YeNC1BIYBBfyGuB1rDKaGzCsGJkojAsKrkJFCrof8CAlbQh_JHYUxS_sMA7KdxyIZJZn_ciWko5IUqJpI4BD8vepic2S-7rwJCGaAo2LRJprxu-iC6xeT4SItuBWw2Lx1V0ShNETT5UHNGbHon_KprLICp4VS-Jyv9rOlHy7zz9ih_DbwygBiQkbOWnD8VoH4ch6KeM8BsvxTtru1ZjCZsoVpUgjmx_X7_3S9FHCQZ1gPgbeL4ZxKxGDAzsfmAgCyswd2vAFyUIT1HMDTO5I-2m2erLQWt5S-Ho2kMS8tbp8ypOVytiDSkkR2RL4bzJ7BtYxQ4ugxPo-R9YYnjIEcw-HbIyWO-l57k93VSdIvXkMbJ_7VgN9sNb91mHDcb_S7arUTiLqxORPO5egM35njkKtPmHNCvj232Az9LzF4hyCVIB8puAvbXhFFX3TAiPfZb7POSnsp3MzpBFQGYkETKlcWTFJtZLrGTMyIpy5PR2bMeSA978R_akIT7C9_IKDOb7NVvQT9PMG3hGawsC_YhDTV8jrIp-CN0pNS_rITQLTMynrDS6CrLKXhDIDmZJqo2U_G7qqsY9vmImLk5NDOpqnOL_ElahX9PNkiylY7vQOyDWwjKTx4-7u6wMU2Zf5bhjaL3xYCN4mptyaP49KUDddwEDUqDlQKjYpnZN7_LfWUkcSD1GwSqtXYcHms05VdwZ14IYJBWE6uaRx37pcrODJpsRZCE8wfu68-ihE4jrxPhTVXT9r7Y_IkoVPc1PjWH7ghWQQ1fmcEPZg5JhuMezLkt6Ocw7-7GBUiCqR-PJrytAkVApBzkl9hMKGIJY4bUUbjxJkGQC8qoY3oK8xHZkY2eDssMfQzgq8CzR2uiZ6XHJ57SvC53EPE1Cm6Wp1Zh5-C34H7CyMMFWQeQwqAERYFKOd3TRnkKHfwWXh6V6t6T4wzX7AeSCsv5tHXO2Z0AfcbVBnItWyaG4rQuNnCo_k3Bf12RvHXIoIc9d8h6DU83WHT6Xvk1CSE2J9FLcH4HRqWgkSfpJyy0F9VLWKZJkyB0Az7953ZfOYmr-1OQZXpdilaXtoR5nVAqXcjSH4f64R8BNKFejOf6GQlJna_bXf0Z9XrBDb7jMooDK1zNDu9EqQ1KFRKUhQsM8EK_kDXzyYpGMmvDiKf31aig1fYlWU8Cg-bOcTymr8Gslxezs9yvSV5hd3xH26Yp1ATdfT3zM4bv_sAiHATnAyrAOaoAlX86JOHBIDUtMIx3glKayPpGcoNiKR-A2304y8Fz98P61n4qzeAsHqAWuZ66u9pEWQQtIzBxGvsBXvVAVLPXR58wGcV6XbbSicYuhpHGCJ-qrad4AHyYhfI1QYuLShHbskh0F_EbOWRCyd0qd66ZIyqtiJj0zOllnl6UVCU9comPPFVJYnSbkRZgN-AmnD0Z-4nTk5dxqeFhM2EJILHUdXwomSWnXEfbnJU_fNyLgOm565aQ1jL6LikBHrV-10jFmujx-1Z-Hnh55zOdqu5mVWcSDZrYB
HGijqS7U9hw4CPguUzgnyBBEpO9ypLT6NfHqjGC8H77nyMp2__UrlCKPX5FxAgb1GtCICmyOc44sX_qVts585Y_4jjmWvD0Na9ng5yGzvvw8pH3rwSc9bupsDJWO68PlbfcU61zeEgwr7vcbN8ulrVlYNsXTgGker-8Qxumm7J7NMN5zkRppc0FF-f6oNLz7R0c5YCXk8gWKrcHoMAel29FTfkWIwS6fGZytCbmqNw50uCD03Jq4zon0J_m3XHsogZvnjoUyqihGYIr7bABMtUBirVN1uV8HjPWW0eBeO3Rvc3qPawXJkByX1i_LrYOQkx7EHlQ89ynq3461CVeN2TEAtfqi8Dx__iz9izmJVqHZyUdgv63ZWPwPHSGlJbklzu26IuCkuWp-MS5Q-Z6gFM6W6k-H4EcJB1bWgKeuBLZqAemnHiO6T0yBbJKssL5duzl_tvkFPIMMqHtmpUP0xELNZETdFZkpnVRe1Waxo2--o4Qn71rFxCGr068UtFvUQbSLsHYF046V51vWCoqPnclmlOlxOqNb8ux0Td-SzG1egIzv31WDCWuG1sOSDgt3CwwXbYkUGG6gyJSAdfWkBzGeLR_bKLb5gqQCHyRyphKm8MOKtnWNeywSErW1SeonkYcyzPdsZcMpC9SC2IcoQ6k_p_mdv0jkJQi_4-eZT8k52kwWjlvGum5NnoIZM-TONaI-k7d43DcegkPxa4BVkWMJJQ6ncsEIju64_SDL4HIH-yISlWro6lLv03DBAbGee2NNf7NjIgCn8a3q-HLB-f34wqXFSPoRS_Xgp3GCOISHWrNFoZMc2fuuS2OQk4uqquDKx4DWnwBnBjBeaCBsgnvM8UOCaXr2I_wCcxDVNuBZ2IDD8olatOorlhKZqbiT7pvCCMacBa8DXFfzGRvX2JYrR7OcH-l0hCFutcu4Zz6J1c_nBquzmSFtViGC5hw3ATd2UEqS168LFtgmL7hJCUNNZbnw6EXWwp87djBYxKKUmBAoxlXj0-e_hzrZ7QLmewcnd_Snbc8uNUhkVK0a_vsDHDfCT1jXszO66yuiNattrqy26UmOcq3tGSl1dSTxIgA_5p1AFAm6H5r3L1q317Tfm1s6StbWHorVKgNpB8ZC1b_ok_0Hfi1IxVJD-VQwQOTpUi9hQkrE2EHrdEcGC8W33VXZeSalivw-slyVmJXZC0B-VDTLnQzz1VO_HlfhFJIJE0vCWnuNbGabxai2HwsMkkNPzyctrra9NZESjzXjsEQwQ650_Xawk3MmomlQe1DfG-4hV1K4PEqS0Cdjwi3zCPTKwhYXnwl75SbBvx3UI_bbBmd2JfV6_cA-D51DOn305Ofet_hhAlOKHnrbja",cRq: {ru: 'aHR0cHM6Ly8xMzM3eC50by9jYXQvRG9jdW1lbnRhcmllcy8xLw==',ra: 'TW96aWxsYS81LjAgKGlQaG9uZTsgQ1BVIGlQaG9uZSBPUyAxN18wXzMgbGlrZSBNYWMgT1MgWCkgQXBwbGVXZWJLaXQvNjA1LjEuMTUgKEtIVE1MLCBsaWtlIEdlY2tvKSBWZXJzaW9uLzE3LjAuMSBNb2JpbGUvMTVFMTQ4IFNhZmFyaS82MDQuMQ==',rm: 'R0VU',d: 
'wlsLf2PSUgcnZ79fMmzgEhWxybPLvoMkr28PU+dwOL/YrvLCtySjmBAlmivt4ImcEXP6NhSZSx1tYmXqUBR096bwEWLlSuSmM+uvEfX1SafGSjBSAWyil92Z0Shl2j5lQixbttkHGpcm8sOu3abUxc3iH/+udveXort99oAOC/hYfwcTR3ZUUkCgM59CIHyYsAsdshZFDnArdA2baWLdWVyfCaG7fSAH1gzPmsIheaDJN087C0eL4dh6O48xXxXpQ+9zK5EzuZNCkHVn/Nu8j9VoQU8U+f8ePdM4oaPxi8ey4VG+PipYb+DazTeD+Dp3del1F1U8i6vXkKNYH16bkMXRdF2ge9q1P/JVZgHws6bVfQC+IbKB0ZwkVS4UVIWxjZ6SUfbKgm7nmW/neLZG68gSgZL2sIiZZEyf2cARwNbxPCpfZjKeGzrzVr6q6Udq8Z2dYw0zZni4v3ldIn7kZHzR1J3+OvmwqxKKLYzVgB7erokMlAkqdSMO+qUmeIvF9vcmzreXjfCkl3cZXtVP9e8awWV/MJcQDHkd2057JSs=',t: 'MTcwNjI0NTYwNi41MTAwMDA=',cT: Math.floor(Date.now() / 1000),m: 'FpKfIJguCdth70tM8Wji7bzUSybXfjjvDbPQHAVXeOw=',i1: '2kLrVuIUsE9J5J3iCQ6Nrg==',i2: 'h68Vso9njHCvpJzjKYdY8Q==',zh: 'xmBxJ1Mibh1mUnzJkvslrLCko+sCF4oYh1MeoNmRJjI=',uh: 'A+3Unb9auSYaCtjRbnTu+e5raOuuVymer0X2gAjlCig=',hh: 'wISkNp3ygqHlR+mPTzuN2TXAU5nSweTOw8SZT3YYNKA=',}};var cpo = document.createElement('script');cpo.src = '/cdn-cgi/challenge-platform/h/g/orchestrate/chl_page/v1?ray=84b63a80a8c9371f';window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, location.href.length - window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;if (window.history && window.history.replaceState) {var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;history.replaceState(null, null, "\/cat\/Documentaries\/1\/?__cf_chl_rt_tk=x9MoNK5TQOQa..lQsIgAd9mTtKcJ1SwIxuXT2ifXNUE-1706245606-0-gaNycGzNDSU" + window._cf_chl_opt.cOgUHash);cpo.onload = function() {history.replaceState(null, null, ogU);}}document.getElementsByTagName('head')[0].appendChild(cpo);}());</script></body></html>
Config: {
  transitional: {
    silentJSONParsing: true,
    forcedJSONParsing: true,
    clarifyTimeoutError: false
  },
  adapter: [ 'xhr', 'http' ],
  transformRequest: [ [Function: transformRequest] ],
  transformResponse: [ [Function: transformResponse] ],
  timeout: 10000,
  xsrfCookieName: 'XSRF-TOKEN',
  xsrfHeaderName: 'X-XSRF-TOKEN',
  maxContentLength: -1,
  maxBodyLength: -1,
  env: {
    FormData: [Function: FormData] {
      LINE_BREAK: '\r\n',
      DEFAULT_CONTENT_TYPE: 'application/octet-stream'
    },
    Blob: null
  },
  validateStatus: [Function: validateStatus],
  headers: Object [AxiosHeaders] {
    Accept: 'application/json, text/plain, */*',
    'Content-Type': undefined,
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0.1 Mobile/15E148 Safari/604.1',
    'Accept-Encoding': 'gzip, compress, deflate, br'
  },
  method: 'get',
  url: 'https://1337x.to/cat/Documentaries/1/',
  data: undefined
}

So they have Cloudflare protecting 1337x, as well as bot detection, so currently any basic tooling will not bypass it. It doesn't appear that basic headless browsers work either, including Python packages like cloudscraper or Humanoid. I was using a headful Docker container with xvfb for another project with success, so I am trying to spin that up and repurpose it as an API that returns the page source for URLs sent to it (hopefully).
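The dump above carries two recognizable signals of a Cloudflare challenge: a 403 status with a `cf-mitigated: challenge` header, and a body titled "Just a moment...". A minimal sketch of checking for them, assuming an Axios-style response object with `status`, `headers`, and `data`; `isCloudflareChallenge` is a hypothetical helper name, not something from the scraper:

```javascript
// Hypothetical helper: decide whether a response looks like a Cloudflare
// challenge page rather than real 1337x content.
// `response` is any object shaped like an Axios response.
function isCloudflareChallenge(response) {
  if (!response) return false;

  // Normalize header names so the lookup is case-insensitive.
  const lower = {};
  for (const [name, value] of Object.entries(response.headers || {})) {
    lower[name.toLowerCase()] = String(value);
  }

  const mitigated = lower['cf-mitigated'] === 'challenge';
  const body = typeof response.data === 'string' ? response.data : '';
  const challengeTitle = body.includes('<title>Just a moment...</title>');

  // Both signals appear in the logged response; either one is accepted here.
  return response.status === 403 && (mitigated || challengeTitle);
}
```

With Axios, a 403 rejects the promise by default, so the object to inspect would be `error.response` inside a `catch`.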

Gabisonfire commented 5 months ago

@davidjameshowell Thanks for your input. That's what I suspected; it was the only thing I could see that changed lately. I've used FlareSolverr (https://github.com/FlareSolverr/FlareSolverr) with Jackett as well. I'll give it a go.

davidjameshowell commented 5 months ago

I forgot about FlareSolverr. I had it set up briefly for Jackett as well but went a different direction. I was playing around with some headful browsers in containers but...

curl -L -X POST 'http://192.168.1.2:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
  "cmd": "request.get",
  "url":"https://1337x.to/torrent/2099267/Ubuntu-MATE-16-04-2-MATE-armhf-img-xz-Uzerus/",
  "maxTimeout": 60000
}'
{"status": "ok", "message": "Challenge solved!", "solution": {"url": "https://1337x.to/torrent/2099267/Ubuntu-MATE-16-04-2-MATE-armhf-img-xz-Uzerus/", "status": 200, "cookies": [{"domain": ".1337x.to", "expiry": 1737790044, "httpOnly": true, "name": "cf_clearance", "path": "/", "sameSite": "None", "secure": true,...

Looks like we might have a path forward! This was run on a residential IP, not a datacenter IP, for what it's worth.

Mizaro commented 5 months ago

I used another proxy without Cloudflare and got this error after a few hours:

Failed 1337x scraping due:  ConnectionError [SequelizeConnectionError]: sorry, too many clients already
    at Client._connectionCallback (/home/node/app/node_modules/sequelize/lib/dialects/postgres/connection-manager.js:143:24)
    at Client._handleErrorWhileConnecting (/home/node/app/node_modules/pg/lib/client.js:327:19)
    at Client._handleErrorMessage (/home/node/app/node_modules/pg/lib/client.js:347:19)
    at Connection.emit (node:events:513:28)
    at /home/node/app/node_modules/pg/lib/connection.js:117:12
    at Parser.parse (/home/node/app/node_modules/pg-protocol/dist/parser.js:40:17)
    at Socket.<anonymous> (/home/node/app/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  parent: error: sorry, too many clients already
      at Parser.parseErrorMessage (/home/node/app/node_modules/pg-protocol/dist/parser.js:287:98)
      at Parser.handlePacket (/home/node/app/node_modules/pg-protocol/dist/parser.js:126:29)
      at Parser.parse (/home/node/app/node_modules/pg-protocol/dist/parser.js:39:38)
      at Socket.<anonymous> (/home/node/app/node_modules/pg-protocol/dist/index.js:11:42)
      at Socket.emit (node:events:513:28)
      at addChunk (node:internal/streams/readable:315:12)
      at readableAddChunk (node:internal/streams/readable:289:9)
      at Socket.Readable.push (node:internal/streams/readable:228:10)
      at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
    length: 85,
    severity: 'FATAL',
    code: '53300',
    detail: undefined,
    hint: undefined,
    position: undefined,
    internalPosition: undefined,
    internalQuery: undefined,
    where: undefined,
    schema: undefined,
    table: undefined,
    column: undefined,
    dataType: undefined,
    constraint: undefined,
    file: 'proc.c',
    line: '359',
    routine: 'InitProcess'
  },
  original: error: sorry, too many clients already
      at Parser.parseErrorMessage (/home/node/app/node_modules/pg-protocol/dist/parser.js:287:98)
      at Parser.handlePacket (/home/node/app/node_modules/pg-protocol/dist/parser.js:126:29)
      at Parser.parse (/home/node/app/node_modules/pg-protocol/dist/parser.js:39:38)
      at Socket.<anonymous> (/home/node/app/node_modules/pg-protocol/dist/index.js:11:42)
      at Socket.emit (node:events:513:28)
      at addChunk (node:internal/streams/readable:315:12)
      at readableAddChunk (node:internal/streams/readable:289:9)
      at Socket.Readable.push (node:internal/streams/readable:228:10)
      at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
    length: 85,
    severity: 'FATAL',
    code: '53300',
    detail: undefined,
    hint: undefined,
    position: undefined,
    internalPosition: undefined,
    internalQuery: undefined,
    where: undefined,
    schema: undefined,
    table: undefined,
    column: undefined,
    dataType: undefined,
    constraint: undefined,
    file: 'proc.c',
    line: '359',
    routine: 'InitProcess'
  }
}

davidjameshowell commented 5 months ago

[Fri Jan 26 2024 17:52:31 GMT+0000] starting 1337x scrape...
Scrapping 1337x Movies category page 1
(node:11870) [MONGODB DRIVER] DeprecationWarning: Db.collection option [strict] is deprecated and will be removed in a later version.
(Use `node --trace-deprecation ...` to show where the warning was created)
Created 1337x entry for [XXXX] The Continental From the XXX of John XXX 2023 S01 XXX WEBRip SDR 10Bit 1440p DDP5.1 Atmos HEVC-3Li
Created 1337x entry for [XXXX] Obitelj.Thornberry.Film.(2002).XXXX.x265.4Mbps.2CH.320.crtani.film.hrvatski.sink

Using Flaresolverr has worked for me. Will continue scraping 1337x and see where that lands.

trulow commented 5 months ago

> [Fri Jan 26 2024 17:52:31 GMT+0000] starting 1337x scrape...
> Scrapping 1337x Movies category page 1
> (node:11870) [MONGODB DRIVER] DeprecationWarning: Db.collection option [strict] is deprecated and will be removed in a later version.
> (Use `node --trace-deprecation ...` to show where the warning was created)
> Created 1337x entry for [XXXX] The Continental From the XXX of John XXX 2023 S01 XXX WEBRip SDR 10Bit 1440p DDP5.1 Atmos HEVC-3Li
> Created 1337x entry for [XXXX] Obitelj.Thornberry.Film.(2002).XXXX.x265.4Mbps.2CH.320.crtani.film.hrvatski.sink
>
> Using Flaresolverr has worked for me. Will continue scraping 1337x and see where that lands.

Please keep us updated. It'd be great if this fixes the issue.

Gabisonfire commented 5 months ago

@Mizaro this is a database error. @davidjameshowell nice work! Can you elaborate a bit on how you integrated FlareSolverr with the scraper?

trulow commented 5 months ago

@davidjameshowell

I just deployed flaresolverr via Docker. I was hoping you could guide us on how to make sure that the scraper is pointing to it.

davidjameshowell commented 5 months ago

I changed my singleRequest function in the API to look like the following:

const singleRequest = (requestUrl, config = {}) => {
  const defaultTimeout = 60000; // Define the default timeout if it's not provided in the config
  const timeout = config.timeout || defaultTimeout;

  const payload = {
    cmd: "request.get",
    url: requestUrl,
    session: "8c60b356-bc8e-11ee-b649-0242ac110004",
    maxTimeout: timeout
  };

  const headers = {
    'Content-Type': 'application/json',
    // Include other headers if necessary
  };

  const options = {
    headers: headers,
    timeout: timeout,
  };

  return axios.post(process.env.FLARESOLVERR_ENDPOINT, payload, options)
    .then((response) => {
      const body = response.data;
      if (!body || !body.solution || !body.solution.response) {
        throw new Error(`Invalid response structure: ${JSON.stringify(body)}`);
      }

      console.log(body);
      const solutionResponse = body.solution.response;

      // Here you can further process solutionResponse or return it
      // For instance, check if it contains '502: Bad gateway', '403 Forbidden', or '1337x</title>'
      // if (solutionResponse.includes('502: Bad gateway') ||
      //     solutionResponse.includes('403 Forbidden') ||
      //     !(solutionResponse.includes('1337x</title>'))) {
      //   throw new Error(`Invalid body contents: ${requestUrl}`);
      // }

      return solutionResponse;
    })
    .catch((error) => {
      if (error.response) {
        // The request was made and the server responded with a status code
        // that falls out of the range of 2xx
        console.error("Error data:", error.response.data);
        console.error("Error status:", error.response.status);
        console.error("Error headers:", error.response.headers);
      } else if (error.request) {
        // The request was made but no response was received
        console.error("Error request:", error.request);
      } else {
        // Something happened in setting up the request that triggered an Error
        console.error("Error message:", error.message);
      }

      // Rethrow the error so the calling function knows an error occurred.
      throw error;
    });
};

I am currently experimenting with sessions in FlareSolverr. Even with FlareSolverr, I can only scrape a handful of pages before I get blocked again ("The website owner has restricted access").

Gabisonfire commented 5 months ago

Thanks for the input, I will see how we can implement it

davidjameshowell commented 5 months ago

For reference, see the FlareSolverr documentation for the session create command, or remove the session from the code above; otherwise it'll spin up a new session each time.
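A rough sketch of the session lifecycle, instead of a hard-coded session id. It assumes FlareSolverr's `sessions.create` and `sessions.destroy` commands behave as described in its README, and that the create response carries the id in a `session` field. `withFlareSolverrSession` and the injected `post` function are hypothetical names introduced here for illustration:

```javascript
// Sketch only. `post` is any function (url, payload) => Promise<parsed JSON body>,
// for example a thin wrapper around axios.post that returns response.data.
async function withFlareSolverrSession(endpoint, post, work) {
  // Ask FlareSolverr to start a browser session and hand back its id.
  const created = await post(endpoint, { cmd: 'sessions.create' });
  if (!created || created.status !== 'ok' || !created.session) {
    throw new Error(`Could not create session: ${JSON.stringify(created)}`);
  }
  const session = created.session;

  try {
    // `work` receives a helper that reuses the same session for every page.
    return await work((url, maxTimeout = 60000) =>
      post(endpoint, { cmd: 'request.get', url, session, maxTimeout }));
  } finally {
    // Always release the browser, even if scraping failed.
    await post(endpoint, { cmd: 'sessions.destroy', session });
  }
}
```

With Axios, `post` could be `(url, payload) => axios.post(url, payload).then((r) => r.data)`, and `endpoint` the value of `FLARESOLVERR_ENDPOINT` used in the code above.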

davidjameshowell commented 5 months ago

So I have been running into issues with Flaresolverr. It does not like to be bombarded with requests and will silently stop solving if you send a bunch of requests to FS. Individual requests seem to work OK (without sessions), so I'll try to investigate slowing requests down or waiting for success/failures first.
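One way to slow things down is to process pages strictly one at a time, waiting for each success or failure and pausing before the next. This is a minimal sketch, not the scraper's actual code; `processSequentially`, `sleep`, and the 2000 ms default are illustrative assumptions:

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task(item)` for each item one at a time, pausing `delayMs` between calls.
// Failures are recorded instead of aborting the whole run.
async function processSequentially(items, task, delayMs = 2000) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    try {
      const value = await task(items[i]);
      results.push({ item: items[i], ok: true, value });
    } catch (error) {
      results.push({ item: items[i], ok: false, error });
    }
    // No need to wait after the last item.
    if (i < items.length - 1) await sleep(delayMs);
  }
  return results;
}
```

Here `task` would be whatever function sends a single URL through FlareSolverr, such as the `singleRequest` shown earlier in the thread.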

Gabisonfire commented 5 months ago

https://github.com/Gabisonfire/torrentio-scraper-sh/pull/15

trulow commented 5 months ago

@Gabisonfire Can we reopen this issue? Although the latest commit from #15 helps, I think there is still an ongoing issue: as @davidjameshowell reported, flaresolverr does not fully bypass the Cloudflare restrictions.

I deployed the latest changes and I too am experiencing the same issues that @davidjameshowell encountered with bulk requests failing.

Gabisonfire commented 5 months ago

I'll reopen but I don't know if anything can be done on that issue, except maybe implementing a timeout between requests.

luigi370 commented 5 months ago

Hi! Not actually related to this issue: I just installed with no problems, but I'm not getting any results. I set an RD API, but nothing. Any clue where to check? Thanks @Gabisonfire

KillTrot commented 5 months ago

I'm not experienced enough with FlareSolverr, but would it help to use the same session for every request?

davidjameshowell commented 5 months ago

> I'm not experienced enough with FlareSolverr, but would it help to use the same session for every request?

In my experience, the main issue is the number of requests sent to Flaresolverr at a single time. When it begins scraping categories, it sends one request. From there, since it's Node, it asynchronously sends all of the torrents on that first page for that category to Flaresolverr to process, which appears to be too much load to bear. It will usually get through a handful of them before the FS API starts returning 500s.
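If the problem is everything on a page being fired at once, one option is to cap how many requests are in flight. This is a minimal sketch of a bounded-concurrency map; `mapWithLimit` is a hypothetical helper, and what limit FlareSolverr tolerates is not established in this thread:

```javascript
// Run `worker(item, index)` over all items with at most `limit` calls in flight.
// Results keep the order of `items`. If any worker rejects, the returned
// promise rejects too.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `mapWithLimit(torrentUrls, 2, singleRequest)` would keep at most two pages going through FlareSolverr at a time, where `torrentUrls` stands in for the links collected from a category page.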

funkypenguin commented 5 months ago

I found this too - I ended up running 15 instances of Flaresolverr to chew through the 1337x scraping

trulow commented 5 months ago

@funkypenguin Did you modify the code to spin up a new flaresolverr docker instance for every new request?

davidjameshowell commented 5 months ago

> I found this too - I ended up running 15 instances of Flaresolverr to chew through the 1337x scraping

I was running 7, but found it was still too much for it. So sounds like 15 or more is the lucky number then.

funkypenguin commented 5 months ago

> @funkypenguin Did you modify the code to spin up a new flaresolverr docker instance for every new request?

No, I'm using Kubernetes, so I just scaled my flaresolverr deployment as necessary :)

(This is the WIP release behind https://torrentio.elfhosted.com)

trulow commented 5 months ago

Without having to use Kubernetes, can someone try deploying 5 flaresolverr docker containers and try the following code? I'm trying to keep the deployment simple without having to complicate things for most users by excluding Kubernetes.

The backup flaresolverr instances should be named flaresolverr2 through flaresolverr5.

const resolverUrls = [
  'http://flaresolverr:8191',
  'http://flaresolverr2:8191',
  'http://flaresolverr3:8191',
  'http://flaresolverr4:8191',
  'http://flaresolverr5:8191'
];

function singleRequest(requestUrl, config = {}, retryIndex = 0) {
  const timeout = config.timeout || defaultTimeout;
  const options = { headers: { 'User-Agent': getRandomUserAgent() }, timeout: timeout };

  const resolverUrl = resolverUrls[retryIndex];

  return axios.post(`${resolverUrl}/v1`, {
    cmd: 'request.get',
    url: requestUrl,
  }, options)
    .then((response) => {
      if (response.data.status !== 'ok') {
        if (retryIndex < resolverUrls.length - 1) {
          // Retry using the next resolver URL
          return singleRequest(requestUrl, config, retryIndex + 1);
        } else {
          throw new Error(`FlareSolverr did not return status 'ok': ${response.data.message}`);
        }
      }

      const body = response.data.solution.response;
      if (!body) {
        throw new Error(`No body: ${requestUrl}`);
      } else if (body.includes('502: Bad gateway') ||
        body.includes('403 Forbidden') ||
        !(body.includes('1337x</title>'))) {
        throw new Error(`Invalid body contents: ${requestUrl}`);
      }
      return body;
    });
}

EDIT: Cleaned up the code
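One gap worth noting in the snippet above: it only moves to the next instance when FlareSolverr answers with a status other than 'ok'. Axios rejects on non-2xx responses by default, so an HTTP 500 from an overloaded instance would skip the retry entirely. A hedged sketch of a failover loop that also covers rejected requests follows; `requestWithFailover` and the injected `post` function are hypothetical names, not part of the scraper:

```javascript
// Try each FlareSolverr base URL in turn until one returns a usable page.
// `post` is any function (url, payload) => Promise<parsed JSON body>.
async function requestWithFailover(resolverUrls, post, requestUrl) {
  let lastError = new Error('No resolver URLs configured');

  for (const baseUrl of resolverUrls) {
    try {
      const data = await post(`${baseUrl}/v1`, { cmd: 'request.get', url: requestUrl });
      if (data && data.status === 'ok' && data.solution && data.solution.response) {
        return data.solution.response;
      }
      lastError = new Error(`${baseUrl} answered: ${data && data.message}`);
    } catch (error) {
      // Network errors and HTTP error statuses also move on to the next instance.
      lastError = error;
    }
  }

  // Every instance failed; surface the most recent reason.
  throw lastError;
}
```

The body checks from the original function (for '502: Bad gateway' and so on) could be applied to the returned string by the caller.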

KillTrot commented 5 months ago

The solution would probably be to use the cookies returned by FlareSolverr for the following requests instead of doing every request through FlareSolverr.

I will make a pull request, maybe someone can test it?

KillTrot commented 5 months ago

I created pull request #19.

trulow commented 5 months ago

@KillTrot it may be possible for us to combine our solutions. As both @davidjameshowell and @funkypenguin mentioned, using the same session may not be enough. We could get better results with sessions, and if a session eventually fails, the scraper would then move on to the next flaresolverr instance and repeat the process.

This solution may mean that we wouldn't need more than a few instances running. I saw that @funkypenguin mentioned they used 15 instances to get through it.

I'm not sure though as it's just theory. I don't know how flaresolverr works on the backend

KillTrot commented 5 months ago

I don't think it is necessary to use multiple instances or sessions. What my solution does is this: make one request using Flaresolverr. Once that request passes the Cloudflare protection, we get, in addition to the normal response, the cookie from Cloudflare, which we can use to make requests that don't have to go through the Cloudflare protection and thus don't need Flaresolverr.
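To illustrate the idea (this is not the code from the pull request), here is a small sketch that turns a FlareSolverr `solution` into headers for direct requests. It assumes each entry in `solution.cookies` has `name` and `value` fields and that the solution includes a `userAgent` string; `headersFromSolution` is a hypothetical helper name:

```javascript
// Build request headers from a FlareSolverr `solution` object so later
// requests can go straight to the site instead of through FlareSolverr.
function headersFromSolution(solution) {
  const cookies = Array.isArray(solution.cookies) ? solution.cookies : [];

  // Standard Cookie header format: "name1=value1; name2=value2".
  const cookieHeader = cookies.map((c) => `${c.name}=${c.value}`).join('; ');
  const headers = { Cookie: cookieHeader };

  // Assumption: the clearance cookie is only honored together with the
  // User-Agent of the browser that solved the challenge.
  if (solution.userAgent) {
    headers['User-Agent'] = solution.userAgent;
  }
  return headers;
}
```

The resulting object could then be passed as the `headers` option of a normal Axios GET for the following pages.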

One thing I think needs improvement in my solution is error handling. Let's say we get an error because the cookies are not valid anymore, we should reset them and try to go the route with Flaresolverr again.

I will maybe add this later or tomorrow.
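The reset-and-retry behaviour described above could look roughly like this. It is a sketch under stated assumptions, not the eventual implementation: `createClearanceClient`, `solve`, and `fetchDirect` are hypothetical names, with `solve(url)` standing for the FlareSolverr round trip and `fetchDirect(url, headers)` for a plain request that rejects when the site blocks it:

```javascript
// Caches clearance headers; if a direct request fails, drops them and
// solves the challenge again once before giving up.
//   solve(url)                -> Promise<headers>  (for example via FlareSolverr)
//   fetchDirect(url, headers) -> Promise<body>, rejecting when blocked
function createClearanceClient(solve, fetchDirect) {
  let cachedHeaders = null;

  return async function get(url) {
    if (cachedHeaders) {
      try {
        return await fetchDirect(url, cachedHeaders);
      } catch (error) {
        // Cookies are probably no longer valid; forget them and re-solve below.
        cachedHeaders = null;
      }
    }
    cachedHeaders = await solve(url);
    // A failure here propagates to the caller: fresh cookies did not help.
    return fetchDirect(url, cachedHeaders);
  };
}
```

A design note: retrying only once keeps a permanently blocked IP from looping through FlareSolverr forever.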

trulow commented 5 months ago

@KillTrot thank you for your pull request. I can confirm your changes worked to fix the issue!

@Gabisonfire, I think we can now close this issue.

Thank you everyone!

Gabisonfire commented 5 months ago

Thanks for confirming, @trulow; my tests are conclusive as well. Closing. Fixed by https://github.com/Gabisonfire/torrentio-scraper-sh/pull/19