Closed bnkai closed 2 months ago
It looks like the JavLibrary scraper breaks from time to time.
It gets blocked by Cloudflare's DDoS protection (in a browser you normally have to wait 5 seconds before being redirected to the site).
I tried with useCDP, but that doesn't fix it.
Idea: JavLibrary has mirrors/clones, so it might be good to have an option that, when a request fails, changes the URL and retries against one of these sites. For example, these are all the same page:
https://www.javlibrary.com/en/?v=javlilbj7e
https://www.m45e.com/en/?v=javlilbj7e
https://www.u44r.com/en/?v=javlilbj7e
https://www.g46e.com/en/?v=javlilbj7e
But I don't think it would be useful for other scrapers.
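A minimal sketch of the mirror fallback idea, in Python rather than in the scraper's own config. The host list is copied from the URLs above and may go stale; `mirror_urls`, `fetch_with_fallback` and the `fetch` callable are hypothetical names for illustration, not part of stash or of the scraper.

```python
from urllib.parse import urlsplit, urlunsplit

# Mirror hosts taken from the URLs listed above; the list may go stale.
MIRROR_HOSTS = [
    "www.javlibrary.com",
    "www.m45e.com",
    "www.u44r.com",
    "www.g46e.com",
]


def mirror_urls(url, hosts=MIRROR_HOSTS):
    """Return the same page address on every host, original host first."""
    parts = urlsplit(url)
    ordered = [parts.netloc] + [h for h in hosts if h != parts.netloc]
    return [urlunsplit(parts._replace(netloc=h)) for h in ordered]


def fetch_with_fallback(url, fetch, hosts=MIRROR_HOSTS):
    """Try each mirror in turn.

    `fetch` is a hypothetical callable that returns the page body
    or raises an exception when the request fails or is blocked.
    """
    last_error = None
    for candidate in mirror_urls(url, hosts):
        try:
            return fetch(candidate)
        except Exception as err:  # broad on purpose: any failure moves on
            last_error = err
    raise RuntimeError("all mirrors failed") from last_error
```

Only the host changes between attempts; the path and the `v=` query stay the same, which matches how the mirror URLs above differ from each other.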
@brumouta thanks for the feedback. welivetogether and babes have now been moved to a separate scraper. Edit: added momsbang, momslickteens and propertysex as well.
RealityKings has some more broken domains: for bellesafilms.com, danejones.com, lesbea.com and sexyhub.com only the image is parsed. They will work fine if they are moved to RealityKingsOL.
Thanks for the feedback @budislov. The relevant scrapers have been updated.
Looks like RealityKingsOL is broken. I tried to scrape from both babes.com and bellesafilms.com and only the tags came through. It appears that the div classes used in the scraper have changed. Will investigate further.
Pending PR is available for RealityKingsOL and Brazzers
relevant PRs merged
iafd.com performer scraper not working
IAFD is fixed, thanks for the report @Ziatexataor and for the fix @Belleyy.
TransSensual.yml seems to be broken. Tested with new and older scenes and can't pull the data
@malibustacynewhat thanks for the report. The relevant PR by @Belleyy fixes the issue.
JAVLibrary is broken https://github.com/stashapp/CommunityScrapers/blob/master/scrapers/javlibrary.yml
Looks to be a Cloudflare error but using the CDP driver didn't resolve it for me when testing:
<!DOCTYPE html><html lang="en-US"><head>
<meta charset="UTF-8"/>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1"/>
<meta name="robots" content="noindex, nofollow"/>
<meta name="viewport" content="width=device-width,initial-scale=1"/>
<title>Just a moment...</title>
<style type="text/css">
html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
p {font-size: 20px; font-weight: 400; margin: 8px 0;}
p, .attribution, {text-align: center;}
#spinner {margin: 0 auto 30px auto; display: block;}
.attribution {margin-top: 32px;}
@keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
@-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
#cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
#cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
#cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
.bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
a:hover{color: #f4a15d}
.attribution{font-size: 16px; line-height: 1.5;}
.ray_id{display: block; margin-top: 8px;}
#cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
#cf-hcaptcha-container { text-align:center;}
#cf-hcaptcha-container iframe { display: inline-block;}
</style>
<meta http-equiv="refresh" content="12"/>
<script type="text/javascript">
//<![CDATA[
(function(){
window._cf_chl_opt={
cvId: "1",
cType: "non-interactive",
cNounce: "90957",
cRay: "5e0bb321ef7bca98",
cHash: "da202b537a470c2",
cFPWv: "g",
cRq: {
ru: "aHR0cDovL3d3dy5qYXZsaWJyYXJ5LmNvbS9lbi8/dj1qYXZtZXpiZTNh",
ra: "TW96aWxsYS81LjAgKE1hY2ludG9zaDsgSW50ZWwgTWFjIE9TIFggMTBfMTVfNSkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzgzLjAuNDEwMy4xMDYgU2FmYXJpLzUzNy4zNg==",
rm: "R0VU",
d: "q4jiR7WSBtf4fLzLz9igfZOdIxwSKG18lkM8oKJ2oB8n30GM2iyW8aiQ9atzUZsOBOiOCY1F45Ok0xoQE9LhBiZfXlfVJaHdOBUlqNu1cCbboEIdvJX1FuypXHYYwXjfaKTC2p4xeTL5nAkfqvaQqkt1H/1p0rqFLGuv5JXJ3gBxB6Y/uALdxdsFi+lSlCG6Qe3X2Lj+WYyKl3todU7QjK8vUNythAJOrMTlR1fGrfbfXESvY4tSMJo7OEhwZymfB+AKhpzlHeTcuo+T40qfUHcXUDFRZCqSIvBynJ532Jn2bbqiZ1XffuBhRCVhBxK+kkJ9NurfuchvBr0bA3lk+Dnyykdr0hUr5lE34hioN0t6bDwXnGSBMCsX40Hx6TDDQa+utstnZqYk3G1jtYupvATJXzjvxhaNDHgOwHJomiUip/glK6aw52FuNwxXEj7ZJmdJPg4omti3B/1l7wy5+Z1rERc/nHgZE2JBxsOMDFpFXx6oNX/ZCk1//+mIVxGVFfNCBIGI1eyIKCP6LkCcsw1+aeO2YHmOzBkz9Ebx3drg5ouDQU0bmnNNsuh6vtMZ2eydA3b8y1H2mfO+UoUwB7Ej5u0cR1gJGbuSHpK+imsOFpqmwJdDPhqXYl5xcy6nVCnU2xeyqXJP/HMHGjU4h3Op/vlZKIuhtqFPC6Guk0FIUbFTI4JGMG7u3UwcuuYUrnmYXFX1vupeVrqsjsRFJXnqRhnWc+EJ62b3QYIqf/pFpb/eKU8DpE4wKEmd05vkzLCS1DZQ29AxACho6Zf0brScVV2/qvY5qVsNlk9QCSJdmmR7eyfAPju4BoRmFWdRVEymwQHM7raS1XGdZvcFDw==",
t: "MTYwMjQ1MjAwOS4yNzIwMDA=",
m: "jAJ8FygcOeJXMeDg2+r+pIbPCZvv7uD3AA/cCQ2MIkQ=",
i1: "2tfaQpq68/qtCUW9AL9YZA==",
i2: "DT4KsCiUsfsu8FZXKRmHjg==",
uh: "TprDV0CpLyfpdzs+8x+WX/Btsv1e+OQLx8NzEGjSfMY=",
hh: "3htzUBXaqug0moZaVaRPWNYG1rRQQxdDndKhxQafs0M=",
}
}
window._cf_chl_enter = function(){window._cf_chl_opt.p=1};
var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
b(function(){
var cookiesEnabled=(navigator.cookieEnabled)? true : false;
var cookieSupportInfix=cookiesEnabled?'/nocookie':'/cookie';
var a = document.getElementById('cf-content');a.style.display = 'block';
var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
var trkjs = isIE ? new Image() : document.createElement('img');
trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js"+cookieSupportInfix+"/transparent.gif?ray=5e0bb321ef7bca98");
trkjs.id = "trk_jschal_js";
trkjs.setAttribute("alt", "");
document.body.appendChild(trkjs);
var cpo = document.createElement('script');
cpo.type = 'text/javascript';
cpo.src = "/cdn-cgi/challenge-platform/h/g/orchestrate/jsch/v1";
var done = false;
cpo.onload = cpo.onreadystatechange = function() {
if (!done && (!this.readyState || this.readyState === "loaded" || this.readyState === "complete")) {
done = true;
cpo.onload = cpo.onreadystatechange = null;
window._cf_chl_enter()
}
};
document.getElementsByTagName('head')[0].appendChild(cpo);
}, false);
})();
//]]>
</script>
</head>
<body>
<div style="display: none;"><a href="http://bt50.org/nonalignedfrequent.php?pl=0">table</a></div><table width="100%" height="100%" cellpadding="20">
<tbody><tr>
<td align="center" valign="middle">
<div class="cf-browser-verification cf-im-under-attack">
<noscript>
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="cf-content" style="display:none">
<div id="cf-bubbles">
<div class="bubbles"></div>
<div class="bubbles"></div>
<div class="bubbles"></div>
</div>
<h1><span data-translate="checking_browser">Checking your browser before accessing</span> javlibrary.com.</h1>
<div id="no-cookie-warning" data-translate="turn_on_cookies" style="display:none">
<p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
</div>
<p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
<p data-translate="allow_5_secs">Please allow up to 5 seconds…</p>
</div>
<form class="challenge-form" id="challenge-form" action="/en/?v=javmezbe3a&__cf_chl_jschl_tk__=c44b146f044ddd9d0b23bf4928759e99e7ddef0e-1602452009-0-Ab8hTl3noYmOwwAWI1D0d_6zhaYO-4vHBJD8JW4VCFmZKjqal-xVCdpCdbztfKStCEp8QJa2ganoOGB_Jnq-Qwtu6BnG7zySJxaY_Oc54OgSHPG3Mt1wJ-nYfmFjU8ShDtM6t2VT15V5I0rsRAGRc5RZPs1OE8Vi3aozMxTjxatgWYLmnk0ozVyDVudpWURh7xhqtqs9M9vv_jAfqIUgHIwFe1MVURVaxrV4jOsccyGYHvJ8ZLFmpzrqf8LPPa2N3M1SG-T4vUDhsLgjgeIkfOC6_U3zZBVNKUY8HU47JaiTLjHHnOMHfzeA4iz76Sb2MQ" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="r" value="c77fde06d76dccbdf1aa275a6824657ec7878994-1602452009-0-AefHkw7YBHV4yapfdyGgNFofr2bk+ZNLsmu1vxzyTAyFPQickf2DVbsdFnOKYI9Zs5D6PO21kZcj5siVtnYOhmEJ7HOBLBCp4lS+GBW8iyR62pXG9ezmP6Fu4qRomUkK8uCSsqveohhquzDEYroSgMpZT0eIJXFIprAfC6uIux7NSx6mo8wGMKFoW3TJJFmAN4FKgZdHpkLShowC8AaRocTx86yZzOOrEywJ5CGsOzw5vNg4GvS4gK6MB+pR3iKfGRnXamisWHrWYZWDyfiGHOfcD8LmcCWzeIEMfD+nADV4477P2jWOHIDvEqtS7Yi0G3qKvH16LmR28qALhOLv8PAhv2GBzp8EOUcdXkJfFN1Jloqm5JU2eoCn/5uBxE0xl80s8Xfaa9vhkhqRicv3XnmHpJRhXgNvauGiYLcmaJ0189RtB6eEhZ6j1N9o9pfstDcSa00ur7vPLgDCd2AqiVrVz8SG8zb+8L+wlfrTaBCIlAiecjoTFLHTPEZW2V4eaVYzY9ECAb69YOhnGBhUXDiDk8wjSLZv8uZYMIxwW+jEsdzAtJ9TkMq5VXrE/sORd24lamS6K3Lr8g9BasZTjJdR3Omni9UmlQVaVDXUIPQBAb6x1nhf57/47lvWjDgrjuEw47NDosN3IHSDoyKYUMg="/>
<input type="hidden" value="3715604b2b146b25182bb17d479ebda2" id="jschl-vc" name="jschl_vc"/>
<!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
<input type="hidden" name="pass" value="1602452013.272-qzCPIXiuVG"/>
<input type="hidden" id="jschl-answer" name="jschl_answer"/>
</form>
<div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=5e0bb321ef7bca98')"> </div>
</div>
<div class="attribution">
DDoS protection by <a href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
<br/>
<span class="ray_id">Ray ID: <code>5e0bb321ef7bca98</code></span>
</div>
</td>
</tr>
</tbody></table>
</body></html>
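For anyone checking their own logs, a small sketch for recognising that a response is this Cloudflare interstitial rather than the real page. The marker strings are copied from the HTML above; Cloudflare may change them at any time, and `is_cloudflare_challenge` is a hypothetical helper, not something stash provides.

```python
# Markers copied from the challenge page above; Cloudflare may change them.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "cf-browser-verification",
    'id="challenge-form"',
)


def is_cloudflare_challenge(html):
    """Return True when the HTML looks like the interstitial, not the real page."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

If this returns True for the HTML printed in the log, the selectors in the scraper are not the problem: the site never returned the scene page.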
teamskeet.com not working
Teamskeet only works for a single query and then Cloudflare blocks the IP, I think. Not much can be done.
For javlibrary, with the latest update you can change the URL to one of the mirrors and it should work.
Vixen Network sites now require you to login when opening a scene page, thus the scraper no longer works.
Already solved on Discord, but for other people:
If you are on a performer page, the link to the scene will have "members." in the URL (https://members.tushy.com/inauguration).
Just remove the "members." to get to the scene. 😃
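The same URL fix as a short Python sketch, in case someone wants to clean up a batch of links. `strip_members` is a hypothetical helper name; it only assumes that the members-area link differs from the public one by the leading "members." in the host, as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_members(url):
    """Drop a leading "members." label from the host, leaving the rest intact."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("members."):
        host = host[len("members."):]
    return urlunsplit(parts._replace(netloc=host))
```

URLs that do not start with "members." are returned unchanged, so it is safe to run over links that are already public.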
teenfidelity.com doesn't work (part of /scrapers/KellyMadisonMedia.yml). The comment states that the first scraping attempt should set a cookie and the second attempt should work, but it doesn't.
@Threak are you sure you set up CDP correctly? I just tried and it seems to work. The first request has something to do with their site protection, not necessarily a cookie. You can append this at the end of the scraper file, then refresh the scrapers
debug:
  printHTML: true
and have a look at the log so that you can see what the site returns to stash.
@bnkai I think there is a difference between headless chromium and normal chrome. @Threak What do you use for CDP, headless chrome or a chromium executable?
I use a chromium executable and this scraper doesn't work for me, like the Teamskeet scraper, so I think there is a difference between headless and classic.
@Belleyy you might be right. I am using a headless chrome docker container, so that might be it. Teamskeet works only for 1-2 queries max, but teenfidelity works OK after the first query.
@Belleyy upon further investigation it seems that the docker container method maintains some cookies, which I assume the executable one doesn't. @Belleyy @Threak can you try the stash version from this PR https://github.com/stashapp/stash/pull/934 (download links below) with this scraper file https://pastebin.com/UBuHFkfm? (Make sure to remove the old scraper file.)
stash-osx uploaded to url: "https://gofile.io/d/lq6J3w"
stash-win.exe uploaded to url: "https://gofile.io/d/DozNwz"
stash-linux uploaded to url: "https://gofile.io/d/5Tl0hi"
First make sure to set the log level to debug. Then do a scrape. After the scrape, take the nats values that are printed in the log and use them to replace the Value: "" entries in the yml file. Do a refresh scrapers from stash and the scraper should work for pornfidelity. As a bonus you can set CDP to false as it no longer seems to be needed (use it first though, to verify that everything works with the plain chrome executable).
@bnkai Just tested with a few scenes and it works 👍 (with and without CDP)
Edit: I just found that the chromium process was still running in the background. I will test more later to find out whether I was doing something wrong or it's an issue with your PR. Can't reproduce it 🤷‍♂️.
@Belleyy this seems to confirm what I thought. We'll probably have to update the scraper to mention that a CDP remote instance is required (a plain executable is not enough) until the cookies PR is merged.
I have the following 4 errors regarding the cookies field.
time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\Colette.yml: yaml: unmarshal errors:\n line 63: field cookies not found in type scraper.scraperDriverOptions"
time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\KellyMadisonMedia.yml: yaml: unmarshal errors:\n line 42: field cookies not found in type scraper.scraperDriverOptions"
time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\javdb.yml: yaml: unmarshal errors:\n line 89: field cookies not found in type scraper.scraperDriverOptions"
time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\mgstage.yml: yaml: unmarshal errors:\n line 29: field cookies not found in type scraper.scraperDriverOptions"
@JDRanpariya Are you using the dev build? These scrapers need stash >= v0.4.0-14.
I'm using the following build: https://github.com/stashapp/stash/releases/tag/v0.4.0
The 24 Nov one
@JDRanpariya you need to switch to a recent dev version, at least v0.4.0-14 as stated in the scrapers list, for cookie support. The one you have doesn't support that as it's v0.4.0 (14 commits older than what you need).
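To make the version comparison concrete, here is a rough Python sketch. It assumes the suffix after the tag is a commit count, as the "14 commits older" remark suggests, and it also tolerates an optional git-describe style hash (for example "-gabc123"); that second part is my assumption, not something stated in this thread. `parse_build` and `supports_cookies` are hypothetical helpers.

```python
import re

# Tag, optional commit count, optional git-describe hash (assumed format).
_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)(?:-(\d+)(?:-g[0-9a-f]+)?)?")


def parse_build(version):
    """Turn "v0.4.0-14" into (0, 4, 0, 14); a missing suffix counts as 0."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    major, minor, patch, commits = match.groups()
    return (int(major), int(minor), int(patch), int(commits or 0))


def supports_cookies(version, minimum="v0.4.0-14"):
    """True when `version` is at least the build named in the scrapers list."""
    return parse_build(version) >= parse_build(minimum)
```

With this ordering the tagged release v0.4.0 sorts before v0.4.0-14, which is why the release build rejects the cookies field while a later dev build accepts it.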
clips4sale.com, Clips4sale.yml, isn't fully working anymore. At the moment it only finds the image and details.
Added a quick fix for clips4sale in #352. Any feedback there is appreciated as I didn't test with a lot of scenes.
dreamtranny scraper is broken, gives an error saying "certificate signed by unknown authority"
https://www.sslshopper.com/ssl-checker.html#hostname=https://dreamtranny.com It's not the scraper but the site's certificate that has issues. I would try either adding the certificate to your trusted ones or using CDP with the scraper for the moment. If you choose the CDP method, adding this at the end of the scraper should be OK
driver:
  useCDP: true
You'll have to copy the image from the UI though, as the scraper uses the native Go client to get the image.
Kink.yml details selector is missing everything in lists.
Examples: https://www.kink.com/shoot/5116 https://www.kink.com/shoot/5343
We switched to @Belleyy 's python scraper for Teamskeet so it should now work properly.
exploitedcollegegirls.com, ExCoGi.yml isn't working. No errors given, it just fails to scrape anything.
Allanal.com, nympho.yml also not working
@malibustacynewhat thanks for the report, there is an open PR for excogi.
As for nympho/allanal: "Be sure to set a User Agent in Settings!" as mentioned in the scraper. Both sites seem to work fine for me.
@bnkai Yeah, it was the User agent, sorry about that.
R18.com seems to be broken. Maybe it has something to do with a change on their backend, or some sitewide changes (possibly because major studios are pulling all titles from them starting 03-31-21)? I'm looking through the trace and debug logs but I don't really see anything. It just fails. I can post more if someone can tell me where to find the output.
Edit: Previous releases I used the scraper to grab info for no longer work either.
R18 now requires a user agent to be set in the config, probably a recent change.
It seems that MindGeekAPI scraper is also not working, I get the following error log.
ERRO[0013] scraper error when running command : File "MindGeekAPI.py", line 36 print(q, file=sys.stderr) ^ SyntaxError: invalid syntax
Edit: I tried it with different studios but none worked for scenes.
I think you are using Python 2; you need Python 3 for it. Try editing MindGeekAPI.yml and changing python to python3. Be sure to have the requests module too.
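A quick pre-flight check along those lines, as a sketch only. It is written for Python 3 (under Python 2 the `importlib.util` import itself fails, which is already a sign that the wrong interpreter is being called). `environment_problems` is a hypothetical helper and is not part of MindGeekAPI.py.

```python
import importlib.util
import sys


def environment_problems(version_info=sys.version_info, required_modules=("requests",)):
    """List human-readable reasons the script cannot run in this environment."""
    problems = []
    if tuple(version_info[:2]) < (3, 0):
        problems.append("Python 3 is required, found %d.%d" % tuple(version_info[:2]))
    for name in required_modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print(problem, file=sys.stderr)
```

Run it with the same interpreter name that the yml file calls; an empty result means the interpreter and the module are in place.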
Well, thanks for the info, I did everything you said but it now says that MindGeekAPI.ini is missing and I couldn't find the said file in the scrapers folder
Yeah, I guess I need to explain more in my script... Here is how it works:
https://www.brazzers.com/video/4423961/workout-to-squirt-out
If you want to search on realitykings, babes... you need to have 1 URL scraped from them.
Hi, transsensual.com isn't working properly, it can't parse Title, Performers and Image.
I'll make a PR for that, thanks. For the time being you can use the MindGeekAPI scraper instead (it's better/more complete anyway). EDIT: PR available https://github.com/stashapp/CommunityScrapers/pull/440
IAFD now seems to use more aggressive CF protection. The scraper request triggers the CF captcha even if we use CDP.
It seems the change made to ThePornDB.yml scraper introduced a new error. The log now indicates the following:
Error loading scraper C:\Users******.stash\scrapers\ThePornDB.yml: yaml: line 97: mapping values are not allowed in this context
It is no longer possible to even see the scraper when editing scenes.
As mentioned in the channel, anyone downloading from the repo should follow the instructions. Either download the zip containing the whole repo, or click on each scraper, click on view raw and then File -> Save As. Do not right-click and "Save link as".
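A small sketch for spotting a bad download. It assumes that "Save link as" stores the GitHub web page (HTML) under the .yml name instead of the raw scraper file; the thread implies this but does not spell it out. `looks_like_html` is a hypothetical helper.

```python
def looks_like_html(text):
    """Heuristic: a scraper file should be YAML, so a leading HTML tag is suspicious."""
    head = text.lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")
```

If it returns True for the contents of a scraper file, delete that file and download it again via view raw or the repo zip.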
Paco.yml needs to be updated. The website was recently updated ~1 month ago and it appears to have broken the scraper for it.
Updated Paco, you can try it here https://github.com/stashapp/CommunityScrapers/pull/513 until it's merged.
ThePornDB.yml: the metadataapi.net API changed, "parse" no longer delivers tags. If I understand it right, you first parse to find the scene, then use the id / url to get the rest of the scene data.
The scraper was updated 4 days ago https://github.com/stashapp/CommunityScrapers/pull/509 for the tag issue, are you sure you have latest version?
Any issues with scrapers not working should be mentioned here. The name of the scraper and the xpath or part that is not working would be appreciated.
Known Issues
nhentai scraper is broken ( blocked/detected by site / CF ?)
updated 2022-09-25