javinizer / Javinizer

(NSFW) Organize your local Japanese Adult Video (JAV) library
MIT License
606 stars 63 forks

Cloudflare issues when scraping from R18 #231

Closed ghost closed 1 year ago

ghost commented 3 years ago

Expected Behavior

Current Behavior

An error is thrown when scraping from R18. The response looks like a Cloudflare browser check:

Write-Error: C:\Users\User\Documents\PowerShell\Modules\Javinizer\2.3.3\Public\Invoke-JVParallel.ps1:548
Line |
 548 | Get-RunspaceData
     | ~~~~
     | [VDD-149] [Get-R18Url] Error occured on [GET] on URL [https://www.r18.com/common/search/searchword=VDD-149/]: Attention Required! | Cloudflare
     | Please enable cookies. One more step Please complete the security check to access www.r18.com
     | Please stand by, while we are checking your browser... Redirecting...
     | Please turn JavaScript on and reload the page. Please enable Cookies and reload the page.
     | Why do I have to complete a CAPTCHA? Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
     | What can I do to prevent this in the future? If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware. If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
     | Cloudflare Ray ID: 626b13e2bcf2e03b • Your IP: xx.xxx.xxx.xx • Performance & security by Cloudflare

Steps to Reproduce (for bugs)

Some context first: I lost all of my metadata during a server revamp and had to rescrape my entire collection, so Javinizer was sending a huge number of requests in a short amount of time. That is likely the main cause of the error.

Is there a fallback for this? Perhaps we could set a limit on the number of videos scraped before stopping or pausing, or automatically stop Javinizer when the Cloudflare error is detected? (To be honest, it would be nice to have an option for a built-in abort when any error is detected, to prevent inconsistencies.)
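To make the request concrete, here is a minimal Python sketch of the two ideas: a cap on the number of items per run, and an abort as soon as a response looks like the Cloudflare challenge page. This is not Javinizer code. `scrape_batch`, `looks_like_cloudflare_challenge`, and the `fetch` callable are hypothetical names, and treating the two marker strings (copied from the error output in this report) as reliable challenge indicators is an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Marker strings copied from the error output in this report. Assuming they
# identify a challenge page is a guess, not documented Cloudflare behavior.
CHALLENGE_MARKERS = ("Attention Required! | Cloudflare", "Cloudflare Ray ID")


def looks_like_cloudflare_challenge(body: str) -> bool:
    """Return True when the response body contains any known marker string."""
    return any(marker in body for marker in CHALLENGE_MARKERS)


def scrape_batch(
    ids: Iterable[str],
    fetch: Callable[[str], str],
    max_items: Optional[int] = None,
) -> Tuple[List[str], str]:
    """Fetch ids in order; stop at the first challenge page or after max_items.

    `fetch` is a hypothetical callable that returns the response body for an id.
    Returns the ids fetched successfully and the reason the loop ended.
    """
    done: List[str] = []
    for movie_id in ids:
        # Stop before fetching once the per-run cap has been reached.
        if max_items is not None and len(done) >= max_items:
            return done, "limit reached"
        body = fetch(movie_id)
        # Abort the whole batch instead of hammering a site that is blocking us.
        if looks_like_cloudflare_challenge(body):
            return done, f"cloudflare challenge at {movie_id}"
        done.append(movie_id)
    return done, "completed"
```

The returned list tells the caller which items are safe to sort, so an aborted run could in principle be resumed later from the first unprocessed id.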

Your Environment

jvlflame commented 3 years ago

I noticed this recently as well when I was scraping R18, but it seems to be selective blocking rather than full blocking like javlibrary.

I'm not sure I want to add a new setting to specify r18 cookies, but if it occurs more frequently in the future I might have to address it.

I did do some refactoring to the scrapers as per #199 to better handle when an error occurs, but it hasn't been pushed since I haven't tested it rigorously yet. The change would result in the sort failing for the specific file if any scraper errors are detected.
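As a rough illustration of that behavior (not the actual refactored code, which has not been pushed), the idea of failing a file when any scraper errors could look like the Python sketch below. `scrape_all_or_fail` and the scraper callables are hypothetical names used only for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def scrape_all_or_fail(
    movie_id: str,
    scrapers: Dict[str, Callable[[str], dict]],
) -> Tuple[Optional[Dict[str, dict]], List[str]]:
    """Run every scraper for one file and collect any errors.

    Returns (results, errors). If any scraper raised, results is None so the
    caller skips sorting this file rather than using partial metadata.
    """
    results: Dict[str, dict] = {}
    errors: List[str] = []
    for name, scraper in scrapers.items():
        try:
            results[name] = scraper(movie_id)
        except Exception as exc:  # any scraper failure counts against the file
            errors.append(f"{name}: {exc}")
    if errors:
        return None, errors
    return results, errors
```

Only the file whose scrapers failed is rejected; other files in the same run are unaffected.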

JK0304 commented 3 years ago

I have this problem as well. I can't scrape anything from R18. I'm using the GUI version.

jvlflame commented 3 years ago

I have a logic check now on both GUI/CLI versions to detect Javlibrary cloudflare errors. If this is a persistent issue with r18, I'll add the settings/screens for r18 as well. So far it doesn't seem to be as widespread so it could possibly just be a temporary thing.

ghost commented 3 years ago

Just a warning for those scraping R18, you can get hard IP blocked if you send too many requests. I rescraped my library overnight and was met with Error 1020 this morning, being mistaken for a DDoS attack. I'm not sure if this is a permanent block or temporary, but I haven't been able to access R18 for the entire day, so it's not looking too good at the moment. If you have a large library and want to be extra safe I would consider disabling R18 in your settings. JavLibrary seems to be working just fine though.

jvlflame commented 3 years ago

@AreYouDeeWhy What did you have your throttlelimit set as?

ghost commented 3 years ago

> @AreYouDeeWhy What did you have your throttlelimit set as?

Throttle Limit was set to 10 using version 2.4.0

ghost commented 3 years ago

> Just a warning for those scraping R18, you can get hard IP blocked if you send too many requests. I rescraped my library overnight and was met with Error 1020 this morning, being mistaken for a DDoS attack. I'm not sure if this is a permanent block or temporary, but I haven't been able to access R18 for the entire day, so it's not looking too good at the moment. If you have a large library and want to be extra safe I would consider disabling R18 in your settings. JavLibrary seems to be working just fine though.

Following up on this: my IP was indeed permanently blocked by R18's Cloudflare. I had to email customer support and get escalated to a system admin to have the block removed. So it is possible to get it fixed, but it's unnecessary trouble. Again, consider lowering your throttle limit (I used 10 when I got blocked) or disabling R18 scraping to keep your IP from being blocked on the R18 website, especially if you have a large library.
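For anyone writing their own scraping loop, a related precaution is to enforce a minimum delay between requests. The Python sketch below shows that technique only. It is not how Javinizer's throttle limit works (that setting appears to control parallelism, judging by the `Invoke-JVParallel` script in the error above), and `MinIntervalThrottle` is a hypothetical name. A safe interval for R18 is unknown, so any value used here is an assumption.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Keep consecutive calls to wait() at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock  # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()
        # The next request may start no earlier than one interval from now.
        self._next_allowed = now + self.interval
        return delay
```

Calling `throttle.wait()` immediately before each request keeps a single worker under a fixed request rate. With several parallel workers, each would need to share one throttle (with locking) for the limit to hold overall.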