stashapp / CommunityScrapers

This is a public repository containing scrapers created by the Stash Community.
GNU Affero General Public License v3.0

Broken Scrapers #123

Closed bnkai closed 2 months ago

bnkai commented 3 years ago

Any issues with scrapers not working should be reported here. The name of the scraper and the XPath or part that is not working would be appreciated.

Known Issues

updated 2022-09-25

Belleyy commented 3 years ago

Looks like the JavLibrary scraper can break sometimes. Cloudflare's DDoS protection blocks it (normally you need to wait 5 seconds to be redirected to the site). I tried with useCDP, but that doesn't fix it.

Idea: JavLibrary has mirrors/clones, so maybe it would be good to have an option that, if the request fails, changes the URL and tries one of these sites instead. For example, these are all the same site:

But I don't think this would be useful for other scrapers.
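The mirror-fallback idea could look roughly like this in a Python scraper. This is a minimal sketch, not the JavLibrary scraper's actual code: the mirror hosts are hypothetical placeholders (the real mirror list was not preserved), and `fetch` is an injected function so the fallback logic can be shown without network access.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts: placeholders, not the real JavLibrary mirrors.
MIRRORS = ["www.javlibrary.com", "mirror-one.example", "mirror-two.example"]


def with_host(url: str, host: str) -> str:
    """Return `url` with its network location replaced by `host`."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=host))


def fetch_with_fallback(
    url: str,
    hosts: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try `url` on each host in order and return the first page body obtained.

    `fetch` is assumed to return None when a request fails or is blocked
    (for example by a Cloudflare challenge page).
    """
    for host in hosts:
        body = fetch(with_host(url, host))
        if body is not None:
            return body
    return None  # every mirror failed
```

Keeping the host list separate from the fetch logic means a mirror can be added or removed without touching the scraping code.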

bnkai commented 3 years ago

@brumouta thanks for the feedback. welivetogether and babes have now been moved to a separate scraper. Edit: also added momsbang, momslickteens and propertysex.

budislov commented 3 years ago

RealityKings has some more broken domains that only parse the image. They will work fine if they are moved to RealityKingsOL.

bnkai commented 3 years ago

Thanks for the feedback, @budislov. The relevant scrapers have been updated.

budislov commented 3 years ago

Looks like RealityKingsOL is broken. I tried scraping from both, and only the tags came through. It appears that the div classes used in the scraper have changed. I will investigate further.

bnkai commented 3 years ago

A pending PR is available for RealityKingsOL, and the relevant Brazzers PRs have been merged.

Ziatexataor commented 3 years ago

performer scraper not working

bnkai commented 3 years ago

IAFD is fixed. Thanks for the report, @Ziatexataor, and for the fix, @Belleyy.

malibustacynewhat commented 3 years ago

TransSensual.yml seems to be broken. I tested with new and older scenes, and it can't pull the data.

bnkai commented 3 years ago

@malibustacynewhat thanks for the report. The relevant PR by @Belleyy fixes the issue.

mmenanno commented 3 years ago

JAVLibrary is broken

It looks like a Cloudflare error, but using the CDP driver didn't resolve it for me when testing:

  <title>Just a moment...</title>
  <h1><span data-translate="checking_browser">Checking your browser before accessing</span></h1>
  <p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
  <p data-translate="allow_5_secs">Please allow up to 5 seconds…</p>
  DDoS protection by <a href="" target="_blank">Cloudflare</a>
  <span class="ray_id">Ray ID: <code>5e0bb321ef7bca98</code></span>
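A scraper can at least recognise that it received this challenge page instead of scene data, and log a clear message rather than failing silently. A minimal Python sketch, assuming the markers visible in the page text above (the "Just a moment..." title and the "Checking your browser before accessing" heading) stay stable; this is a heuristic, not a Cloudflare API, and the function names are hypothetical.

```python
import re
from typing import Optional

# Heuristic markers taken from the challenge page quoted above. Cloudflare
# can change this page at any time, so a match is a hint, not proof.
_TITLE_RE = re.compile(r"<title>\s*Just a moment\.\.\.\s*</title>", re.IGNORECASE)
_MARKERS = ("Checking your browser before accessing", "DDoS protection by")
_RAY_RE = re.compile(r"Ray ID:\s*<code>([0-9a-fA-F]+)</code>")


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if `html` looks like Cloudflare's interstitial page."""
    if _TITLE_RE.search(html):
        return True
    return any(marker in html for marker in _MARKERS)


def cloudflare_ray_id(html: str) -> Optional[str]:
    """Extract the Ray ID (useful when reporting a block), or None if absent."""
    match = _RAY_RE.search(html)
    return match.group(1) if match else None
```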


Ziatexataor commented 3 years ago

not working

bnkai commented 3 years ago

Teamskeet only works for a single query; after that, I think Cloudflare blocks the IP. Not much can be done.

For JavLibrary, with the last update you can change the URL to one of the mirrors and it should work.

SpedNSFW commented 3 years ago

Vixen Network sites now require you to log in when opening a scene page, so the scraper no longer works.

Belleyy commented 3 years ago

Already solved on Discord, but for other people: if you are on a performer page, the link to the scene will have "members." in the URL. Just remove the "members." to get to the scene. 😃
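The workaround is a simple URL rewrite, so it can also be automated before the URL is handed to the scraper. A minimal Python sketch; the example URL path is hypothetical, and only the "members." prefix rule comes from the comment.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_members_subdomain(url: str) -> str:
    """Drop a leading "members." from the host of `url`; everything else is kept."""
    parts = urlsplit(url)
    host = parts.netloc
    prefix = "members."
    if host.startswith(prefix):
        host = host[len(prefix):]
    return urlunsplit(parts._replace(netloc=host))


# Hypothetical scene URL copied from a performer page:
print(strip_members_subdomain("https://members.vixen.com/videos/example-scene"))
```

URLs without the prefix pass through unchanged, so the function is safe to apply to every scene URL.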

Threak commented 3 years ago

doesn't work (part of /scrapers/KellyMadisonMedia.yml). The comment in the scraper states that the first scraping attempt should set a cookie and the second attempt should work, but it doesn't.

bnkai commented 3 years ago

@Threak, are you sure you set up CDP correctly? I just tried it and it seems to work. The first request has to do with their site protection, not necessarily a cookie. You can append this at the end of the scraper file and refresh the scrapers:

debug:
  printHTML: true

Then have a look at the log to see what the site returns to Stash.

Belleyy commented 3 years ago

@bnkai I think there is a difference between headless Chromium and normal Chrome. @Threak, which CDP setup do you use: headless Chrome or a Chromium executable?

I use a Chromium executable, and this scraper doesn't work for me, just like the Teamskeet scraper, so I think there is a difference between headless and classic.

bnkai commented 3 years ago

@Belleyy you might be right. I am using a headless Chrome Docker container, so that might be it. Teamskeet works for only 1-2 queries at most, but teenfidelity works OK after the first query.

bnkai commented 3 years ago

@Belleyy upon further investigation, it seems that the Docker container method keeps some cookies, which I assume the executable does not. @Belleyy @Threak can you try the Stash version from this PR (download links below) with this scraper file? (Make sure to remove the old scraper file.)

stash-osx uploaded to url: ""

stash-win.exe uploaded to url: ""

stash-linux uploaded to url: ""

First, set the log level to debug. Then do a scrape. After the scrape, take the nats values printed in the log and put them into the Value: "" entries in the yml file. Refresh the scrapers in Stash, and the scraper should work for pornfidelity. As a bonus, you can set CDP to false, as it no longer seems to be needed (use it first, though, to verify that everything works with the plain Chrome executable).
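Filling in the nats value amounts to sending a fixed cookie with every request to the site. As a rough Python illustration of that idea (not Stash's Go implementation and not the scraper YAML itself), the standard library's cookie jar can be pre-seeded once; the cookie name nats comes from the comment, while the domain, path and value are hypothetical placeholders.

```python
import http.cookiejar
import urllib.request


def make_cookie(name: str, value: str, domain: str, path: str = "/") -> http.cookiejar.Cookie:
    """Build a plain Netscape-style session cookie for `domain`."""
    return http.cookiejar.Cookie(
        version=0,
        name=name,
        value=value,
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path,
        path_specified=True,
        secure=False,
        expires=None,  # session cookie, never expires on its own
        discard=True,
        comment=None,
        comment_url=None,
        rest={},
    )


def seeded_opener(cookies):
    """Return (jar, opener); the opener attaches the seeded cookies to matching requests."""
    jar = http.cookiejar.CookieJar()
    for cookie in cookies:
        jar.set_cookie(cookie)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener


if __name__ == "__main__":
    # Placeholder value: in the thread, the real one comes from the debug log.
    jar, _ = seeded_opener([make_cookie("nats", "PLACEHOLDER", ".example.com")])
    req = urllib.request.Request("https://www.example.com/scene/1")
    jar.add_cookie_header(req)  # what the opener does before each request
    print(req.get_header("Cookie"))  # nats=PLACEHOLDER
```

Because the cookie is supplied up front, the scraper no longer depends on a browser session (such as a persistent CDP container) to have picked it up.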

Belleyy commented 3 years ago

@bnkai Just tested with a few scenes and it works 👍 (with and without CDP).

Edit: I just found that the Chromium process was still running in the background. I will try more later to find out whether I was doing something wrong or it's an issue with your PR. Can't reproduce it 🤷‍♂️.

bnkai commented 3 years ago

@Belleyy this seems to confirm what I thought. We'll probably have to update the scraper to mention that a remote CDP instance is required (a plain executable is not enough) until the cookies PR is merged.

JDRanpariya commented 3 years ago

I get the following 4 errors regarding the cookies field:

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\Colette.yml: yaml: unmarshal errors:\n line 63: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\KellyMadisonMedia.yml: yaml: unmarshal errors:\n line 42: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\javdb.yml: yaml: unmarshal errors:\n line 89: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\mgstage.yml: yaml: unmarshal errors:\n line 29: field cookies not found in type scraper.scraperDriverOptions"

Belleyy commented 3 years ago

@JDRanpariya Are you using the dev build? This scraper needs Stash version v0.4.0-14 or later.

JDRanpariya commented 3 years ago

I'm using the following build:

The 24 Nov one

bnkai commented 3 years ago

@JDRanpariya you need to switch to a recent dev version (at least v0.4.0-14, as stated in the scrapers list) for cookie support. The one you have doesn't support it, as it's v0.4.0 (14 commits older than what you need).
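The rule in this reply (v0.4.0-14 or later; plain v0.4.0 is 14 commits too old) becomes a tuple comparison once the version string is parsed. A Python sketch, assuming `git describe`-style strings such as v0.4.0 or v0.4.0-14-g<hash>; the exact format Stash reports is an assumption here, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed format: tag, optional commits-since-tag, optional "-g<abbrev hash>".
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?(?:-g[0-9a-f]+)?$")


def parse_version(text: str) -> Tuple[int, int, int, int]:
    """Return (major, minor, patch, commits_since_tag) for `text`."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch, commits = match.groups()
    return int(major), int(minor), int(patch), int(commits or 0)


def supports_cookies(version: str, minimum: str = "v0.4.0-14") -> bool:
    """True if `version` is at least `minimum` (the cookie-support threshold from the thread)."""
    return parse_version(version) >= parse_version(minimum)
```

Counting commits since the tag as the last tuple element is what makes v0.4.0 (0 commits) compare lower than v0.4.0-14.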

nocrad349 commented 3 years ago

Clips4sale.yml isn't fully working anymore. At the moment it only finds the image and details.

bnkai commented 3 years ago

I added a quick fix for clips4sale in #352. Any feedback there is appreciated, as I didn't test with a lot of scenes.

malibustacynewhat commented 3 years ago

The dreamtranny scraper is broken; it gives an error saying "certificate signed by unknown authority".

bnkai commented 3 years ago

It's not the scraper but the site's certificate that has issues. For the moment, I would try either adding the certificate to your trusted ones or using CDP with the scraper. If you choose the CDP method, adding this at the end of the scraper should be OK:

driver:
  useCDP: true

You'll have to copy the image from the UI, though, as the scraper uses the native Go client to get the image.
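"Adding the certificate to your trusted ones" means extending the trust store rather than switching verification off. In Python terms (an analogy only; Stash itself is written in Go), that is a default verifying context plus one extra PEM file. The file path is a hypothetical placeholder.

```python
import ssl
import urllib.request
from typing import Optional


def trusted_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Default verifying context, optionally trusting one extra PEM certificate file."""
    ctx = ssl.create_default_context()  # verification and hostname checks enabled
    if extra_ca_file is not None:
        # Adds to the default trust store; certificate verification stays on.
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


def open_url(url: str, extra_ca_file: Optional[str] = None):
    """Fetch `url`, trusting the extra certificate (e.g. a hypothetical "site-ca.pem")."""
    return urllib.request.urlopen(url, context=trusted_context(extra_ca_file))
```

This keeps protection against other bad certificates, unlike disabling verification for the whole client.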

nocrad349 commented 3 years ago

The Kink.yml details selector misses everything inside lists.
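One common cause of this symptom is a selector that collects only paragraph text, so anything inside list items is dropped. The sketch below shows the difference with Python's ElementTree on hypothetical markup; the real Kink page structure and the current Kink.yml selector were not quoted in the report.

```python
import xml.etree.ElementTree as ET

# Hypothetical description markup, not copied from Kink.
SNIPPET = """<div class="description">
  <p>Intro paragraph.</p>
  <ul><li>First item</li><li>Second item</li></ul>
</div>"""


def paragraph_text_only(root: ET.Element) -> str:
    """Mimic a selector that takes only <p> text: list items are lost."""
    return " ".join(p.text.strip() for p in root.iter("p") if p.text)


def all_descendant_text(root: ET.Element) -> str:
    """Collect every descendant text node, including <li> contents."""
    return " ".join(t.strip() for t in root.itertext() if t.strip())


root = ET.fromstring(SNIPPET)
print(paragraph_text_only(root))
print(all_descendant_text(root))
```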


bnkai commented 3 years ago

We switched to @Belleyy's Python scraper for Teamskeet, so it should now work properly.

malibustacynewhat commented 3 years ago

ExCoGi.yml isn't working. No errors are given; it just fails to scrape anything. nympho.yml is also not working.

bnkai commented 3 years ago

@malibustacynewhat thanks for the report. There is an open PR for excogi. As for nympho/allanal, be sure to set a User Agent in Settings, as mentioned in the scraper. Both sites seem to work fine for me.
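Setting a User Agent simply means every request carries a browser-like User-Agent header instead of the HTTP library's default (Python's urllib, for example, identifies itself as Python-urllib/3.x). A minimal Python sketch of the idea; the UA string is a placeholder, not a value recommended by the scrapers.

```python
import urllib.request

# Placeholder UA string: use the one configured in Stash's settings instead.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build a GET request that sends an explicit, browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
print(req.get_header("User-agent"))  # urllib stores header names capitalised
```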

malibustacynewhat commented 3 years ago

@bnkai Yeah, it was the User agent, sorry about that.

xxtensazenxx commented 3 years ago

seems to be broken. Maybe it has something to do with a change on their backend, or some sitewide changes (I think maybe because major studios are pulling all their titles from them starting 03-31-21). I'm looking through the trace and debug logs, but I don't really see anything; it just fails. I can post more if someone can tell me where to find the output.

Edit: Previous releases that I used the scraper to grab info for no longer work either.

bnkai commented 3 years ago

R18 requires setting a user agent in the config; this is probably a recent change.

canocano4145 commented 3 years ago

It seems that the MindGeekAPI scraper is also not working. I get the following error log:

ERRO[0013] scraper error when running command : File "", line 36 print(q, file=sys.stderr) ^ SyntaxError: invalid syntax

Edit: I tried it with different studios but none worked for scenes.

Belleyy commented 3 years ago

I think you are using Python 2; you need Python 3 for it. Try editing MindGeekAPI.yml and changing python to python3. Be sure to have the requests module installed too.
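The log above is a compile-time SyntaxError: Python 2 rejects `print(q, file=sys.stderr)` before any code runs, so a runtime check cannot rescue a Python 3-only file. A guard only helps if it sits in a small launcher or at the top of a file that still parses under Python 2. The sketch below is written that way (it uses `sys.stderr.write` instead of `print(..., file=...)`); the function names are hypothetical, and requests is the third-party module the reply refers to.

```python
import sys


def is_python3(version_info=None):
    """Return True for Python 3+ (checks the running interpreter by default)."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] >= 3


def require_runtime():
    """Exit early with readable messages instead of a SyntaxError or ImportError."""
    if not is_python3():
        sys.stderr.write("This scraper needs Python 3: change python to python3 in the .yml\n")
        sys.exit(1)
    try:
        import requests  # noqa: F401  # third-party HTTP library the scraper depends on
    except ImportError:
        sys.stderr.write("The requests module is missing: install it with pip\n")
        sys.exit(1)
```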

canocano4145 commented 3 years ago

Well, thanks for the info. I did everything you said, but now it says that MindGeekAPI.ini is missing, and I couldn't find that file in the scrapers folder.

Belleyy commented 3 years ago

Yeah, I guess I need to explain more in my script... Here is how it works:

If you want to search on realitykings, babes, etc., you first need to have scraped one URL from that site.

vorrac-stash commented 3 years ago

Hi, isn't working properly; it can't parse the Title, Performers and Image.

bnkai commented 3 years ago

I'll make a PR for that, thanks. For the time being, you can use the MindGeekAPI scraper instead (it's better and more complete anyway). EDIT: PR available.

bnkai commented 3 years ago

IAFD now seems to use more aggressive Cloudflare protection. The scraper request triggers the Cloudflare captcha even when we use CDP.

Soundchazer commented 3 years ago

It seems the change made to the ThePornDB.yml scraper introduced a new error. The log now shows the following:

Error loading scraper C:\Users******.stash\scrapers\ThePornDB.yml: yaml: line 97: mapping values are not allowed in this context

It is no longer possible to even see the scraper when editing scenes.

bnkai commented 3 years ago

As mentioned in the channel, anyone downloading from the repo should follow the instructions: either download the zip containing the whole repo, or click on each scraper, click View raw, and then use File -> Save As. Do not right-click and use Save Link As.
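Saving the link instead of the raw file stores GitHub's HTML page under a .yml name, which plausibly explains the YAML parse error above. A small Python sketch that checks a downloaded file and converts a github.com /blob/ link to its raw.githubusercontent.com form; the example path and the branch name "master" are assumptions for illustration.

```python
from urllib.parse import urlsplit


def looks_like_html(text: str) -> bool:
    """True if a downloaded "scraper" is actually a saved web page."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def raw_url(blob_url: str) -> str:
    """Turn github.com/<owner>/<repo>/blob/<branch>/<path> into its raw file URL."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    owner, repo, kind, rest = parts.path.lstrip("/").split("/", 3)
    if kind != "blob":
        raise ValueError("expected a /blob/ URL")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"


print(raw_url("https://github.com/stashapp/CommunityScrapers/blob/master/scrapers/Paco.yml"))
```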

magma2017 commented 3 years ago

Paco.yml needs to be updated. The website was updated about a month ago, and that appears to have broken the scraper.

bnkai commented 3 years ago

I updated Paco; you can try it here until it's merged.

ferengi82 commented 3 years ago

ThePornDB.yml: the API changed, and "parse" no longer delivers tags. If I understand it right, you first parse to find the scene, then use the ID/URL to get the rest of the scene data.

bnkai commented 3 years ago

The scraper was updated 4 days ago for the tag issue. Are you sure you have the latest version?