xbapps / xbvr

Tool to organize and stream your VR porn library
Problems matching scenes to files #1092

Open mark-hahn opened 1 year ago

mark-hahn commented 1 year ago

<For some bizarre reason this original post was erased. It was about a problem matching files to scenes. The problem is detailed again in the 3rd post below.>

theRealKLH commented 1 year ago

because... SLRSexBabes is NOT the studio. The studio in your example is SexBabesVR.

mark-hahn commented 1 year ago

I had mistyped SexBabesVR as SLRSexBabes in my original post. I closed this issue because I thought I'd let this drop until I did more homework. I've done that and I still have the problem. Here are details on the file/scene not matching ...

Filename:     SLR_SexBabesVR_Pure Bliss_original_12851_LR_180.mp4
Studio name:  SexBabesVR 

I have enabled and scraped SexBabesVR multiple times. I've rescanned the files. I just cannot get any match. This is the HTML for the entry of the scene in the SLR website. I assume this is what you scrape ...

<article
  class="c-grid-item c-grid-item--search u-block u-relative u-px--two u-relative u-float--l u-width--full js-c-grid-item"
  data-qa="grid-item-search">
  <div class="u-clear u-py--two desktop:u-py--four u-block">
    <div class="u-width--full u-mr--two desktop:u-width--two-hnd-ei-five desktop:u-mr--four desktop:u-float--l">
      <div class="u-absolute u-z--one u-pos--t u-pos--l u-mt--four u-ml--four desktop:u-mt--six js-c-grid-badges"
        data-qa="grid-item-search-badge">
      </div>
      <div class="c-grid-ratio u-ratio u-ratio--grid u-ratio--top u-bgc--bl" data-qa="grid-item-search-thumbs">
        <a href="/scenes/pure-bliss-12851" class="u-block u-absolute u-stretch" data-qa="grid-item-search-thumb-link">
          <img class="u-ratio-item  u-transition--video ls-is-cached lazyloaded"
            src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
            data-srcset="https://cdn-vr.sexlikereal.com/images/12851/vr-porn-Pure-Bliss-cover-app.jpg"
            data-videosrc="https://cdn-vr.sexlikereal.com/preview/14x1/12851_300p.mp4" data-videotype="video/mp4"
            alt="<u>Pure</u> Bliss; Lesbian blondes teen slim petite curvy real tits shaved pussy - SexBabesVR | SexLikeReal"
            data-qa="grid-item-search-thumb-img"
            srcset="https://cdn-vr.sexlikereal.com/images/12851/vr-porn-Pure-Bliss-cover-app.jpg"><video loop=""
            playsinline="" muted=""
            class="u-absolute u-stretch u-block u-c u-transition--video u-z--zero u-height--full u-width--full u-scale--video">
            <source src="https://cdn-vr.sexlikereal.com/preview/14x1/12851_300p.mp4" type="video/mp4"></video>
        </a>
        <span
          class="c-grid-video--availible u-block u-absolute u-pos--l u-pos--t u-disabled u-z--neg u-visibility--hidden"></span><span
          class="c-grid-video--progress u-absolute u-pos--l u-width--zero"></span></div>
    </div>
    <div class="u-mt--two desktop:u-float--l desktop:u-mt--zero">
      <div
        class="u-none u-nowrap u-ellipsis u-overflow--hidden u-mb--one desktop:u-block desktop:u-max-width--six-hundred desktop:u-nowrap-normal"
        data-qa="grid-item-search-title"><a href="/scenes/pure-bliss-12851"
          class="u-fs--tw u-fw--bold u-wh hover:u-lw u-transition--base u-lh--base u-pr--one desktop:u-fs--ei"><u>Pure</u>
          Bliss; Lesbian blondes teen slim petite curvy real tits shaved pussy - SexBabesVR | SexLikeReal</a></div>
      <div class="c-grid-item-footer u-mb--two desktop:u-mb--four" data-qa="grid-item-search-footer">
        <div class="u-relative">
          <div class="u-mt--half ">
            <div class="u-flex u-flex-justify--between u-align-i--center u-relative u-z--one">
              <a href="/scenes/pure-bliss-12851"
                class="c-grid-item-footer-title u-ellipsis u-nowrap u-fw--bold u-lh--base u-fs--fo u-wh u-block hover:u-lw u-transition--base u-overflow--hidden desktop:u-none"
                data-qa="grid-item-search-link-title"><u>Pure</u> Bliss; Lesbian blondes teen slim petite curvy real
                tits shaved pussy - SexBabesVR | SexLikeReal</a>
            </div>
            <div class="u-mb--one u-mt--one u-fw--medium u-fs--fo u-lh--base u-nowrap u-ellipsis">
              <a href="/studios/sexbabesvr" class="u-dw hover:u-wh u-transition--base u-inline-block u-align-y--m"
                data-qa="grid-item-search-link-studio">SexBabesVR</a>
              <div class="u-inline-block u-align-y--m u-dw">&nbsp;•&nbsp;&nbsp;3 years ago</div>
            </div>
            <div class="u-fw--medium u-height--twn-si u-overflow--hidden">
              <div class="u-inline-block u-align-y--m u-lh--one u-dw u-fs--fo">
                <button type="button" class="c-like o-btn c-like--text-btn o-btn--small2 u-px--zero   js-c-like3 "
                  data-qa="like-btn" data-entity-type="favorite" data-entity-id="12851" data-project-id="1"
                  data-message-added="'Pure Bliss' was added to your favorites"
                  data-message-removed="'Pure Bliss' was removed from your favorites"
                  data-icon-name-default="heart-outlined" data-icon-name-active="heart">
                  <!-- icon heart-outlined -->
                  <span class="o-icon c-like-icon o-icon--small u-mr--one js-c-like-icon">
                    <svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                      <use href="#icon-heart-outlined"></use>
                    </svg>
                  </span>
                  <span class="c-like-text u-align-y--m js-c-like-counter js-c-like-counter--k" data-text="1.4k"></span>
                </button>
                <button
                  class="c-playlist-watch-later-trigger c-playlist-watch-later-trigger--btn o-btn o-btn--small2 o-btn--text o-btn--squared js-m-tooltips js-c-playlist-save-to-watch-later"
                  data-qa="grid-item-save-to-watch-later" data-project-id="1" data-scene-id="12851">
                  <!-- icon bookmark-outlined -->
                  <span class="o-icon o-icon--small c-playlist-watch-later-trigger-icon">
                    <svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                      <use href="#icon-bookmark-outlined"></use>
                    </svg>
                  </span>
                  <!-- icon bookmark -->
                  <span class="o-icon o-icon--small c-playlist-watch-later-trigger-icon--active">
                    <svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
                      <use href="#icon-bookmark"></use>
                    </svg>
                  </span>
                  <div
                    class="c-playlist-watch-later-trigger-text m-tooltips-content u-radius--two u-bgc--wh u-fs--tw u-fw--medium u-py--one u-px--two u-dg u-nowrap u-transform-none u-z--three"
                    data-popper-placement="top" data-text-save="Save to watch later"
                    data-text-remove="Remove from watch later">
                    <div class="m-tooltips-arrow" data-popper-arrow="">
                      <div class="u-bgc--wh"></div>
                    </div>
                  </div>
                </button> </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>
</article>

The scene shows up as gray in heresphere but plays when clicked. This makes sense.

tosh6072 commented 1 year ago

SexBabesVR is not scraped from SLR; it is scraped from the SexBabesVR site itself, so SexBabesVR scenes downloaded from SLR will have to be matched manually. If you look in Scene Details/Edit Scene (pencil icon), there is a tab listing the filenames that will match automatically.

mark-hahn commented 1 year ago

Ah, that makes sense but it is a real pita. Of the first 80 scenes downloaded 20 didn't match. Aggregators like SLR have a zillion scenes from other sites and manually matching all of them is a lot of work. This is a flaw in the system. I have two possible solutions that I would like to be considered feature requests.

1) Scan aggregators like SLR in addition to the original sites. 2) Use intelligent matching that looks for close matches between file names and scene names. Logic for close matching is readily available.

I hope I'm not coming off as a jerk. I make the suggestions as friendly advice.

BTW, I'm a retired programmer. Do you think pull requests in this area would be welcome? I have already used close-matching code in an app I wrote to match downloaded videos for movies and TV to file names.

toshski commented 1 year ago

I'm a retired programmer as well (unless something interesting comes along). As of 6 months ago, I had never used xbvr and had never developed using any of the technology platforms it is built on. So yes, I think PRs would absolutely be welcome; I think I'm up to about 50 now across all sorts of areas. New people also bring new ideas and perspectives, and there is certainly a variety in how people want to use the package.

I have done something along the lines of matching for myself. I match all scenes from SLR and VRPorn to xbvr scenes that are scraped from their original site. I would say my automated matching is about 98-99%. One of my first PRs changed the Search so that you can target specific words at a specific field, i.e., site, cast, title or synopsis, and that is the basis of my matching. I essentially build a huge search query with words from the other source, targeting those fields in xbvr. For example, instead of just searching "Some scene title-An Actor", I search "title:Some title:scene title:title cast:an cast:actor". However, even with what I think is a very high match success rate, I'm not sure I'll ever submit it. That 1 or 2% would translate to 1 or 2 posts a week on Discord asking why it didn't match. And that's just noise I don't really want to deal with.
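
A minimal sketch of how such a field-targeted query could be assembled. The `title:` and `cast:` prefixes come from the example above; the function name `build_field_query` and the whitespace-based word splitting are assumptions for illustration, not the code actually used.

```python
def build_field_query(title: str, cast: list[str]) -> str:
    """Prefix every word with the search field it should be matched against.

    Hypothetical helper: words are split on whitespace and left unchanged.
    """
    terms = [f"title:{word}" for word in title.split()]
    for name in cast:
        terms.extend(f"cast:{word}" for word in name.split())
    return " ".join(terms)


if __name__ == "__main__":
    print(build_field_query("Some scene title", ["An Actor"]))
```

Splitting each name into separate `cast:` terms means a performer credited under a slightly different name on the other site can still contribute partial hits.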

I don't think you would ever get to 100%. There are just so many exceptions, e.g.

Don't get me wrong, it's a good idea. My main concern would be people realizing and accepting that it's not 100%, and not creating an extra support requirement. Ideally, a crowd-sourced approach to correcting mismatches would work, but XBVR is all hosted locally, with no remote/web-based service where you could facilitate that. I would probably have tackled a solution if I had time to also build a UI to allow people to sort out their own mismatches, but I suspect that's as much work as the actual matching process.

I don't mean to seem like I'm rubbishing the idea or even saying don't do it. Matching scenes between sites has huge benefits for other things as well; after all, I have built something for myself. I'm just highlighting some other considerations to think about.

mark-hahn commented 1 year ago

Wow. I had no idea things were such a mess. My pollyanna idea was that an aggregator scene was just a pointer to the original studio's scene. I had no idea it was manually copied. That eliminates my idea 2 of smart matching.

That leaves idea 1, which is to just scrape the aggregator like SLR and leave the original studio out of the equation. I just looked at the HTML I posted above and I noticed that the file name is not in there. The link is to /scenes/pure-bliss-12851, not SLR_SexBabesVR_Pure Bliss_original_12851_LR_180. But one could match by the number 12851, which would be exact matching. That would work, but only in SLR. Separate algorithms would need to be written for each aggregator site.
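
Matching on the number could look roughly like the sketch below. Both regular expressions are assumptions generalized from the only examples in this thread (a filename of the form `SLR_<studio>_<title>_<quality>_<id>_LR_180.mp4` and a link of the form `/scenes/<slug>-<id>`); real SLR filenames may well vary, and the function names are made up for illustration.

```python
import re
from typing import Optional

# Assumed layout: SLR_<anything>_<id>_<projection tag>_<degrees>.<ext>
SLR_FILE_ID = re.compile(r"^SLR_.+_(\d+)_[A-Z]+_\d+\.\w+$")
# Assumed layout: .../scenes/<slug>-<id>
SLR_URL_ID = re.compile(r"/scenes/[^/]+-(\d+)/?$")


def id_from_filename(filename: str) -> Optional[str]:
    """Return the numeric scene id embedded in an SLR-style filename, if any."""
    found = SLR_FILE_ID.match(filename)
    return found.group(1) if found else None


def id_from_url(url: str) -> Optional[str]:
    """Return the numeric scene id at the end of an SLR scene link, if any."""
    found = SLR_URL_ID.search(url)
    return found.group(1) if found else None


def same_scene(filename: str, url: str) -> bool:
    """True only when both ids are present and identical."""
    file_id = id_from_filename(filename)
    return file_id is not None and file_id == id_from_url(url)


if __name__ == "__main__":
    name = "SLR_SexBabesVR_Pure Bliss_original_12851_LR_180.mp4"
    print(id_from_filename(name), id_from_url("/scenes/pure-bliss-12851"))
    print(same_scene(name, "/scenes/pure-bliss-12851"))
```

Because the comparison is on the id alone, the title and studio spelling in the filename do not matter, which is what makes this an exact match rather than a fuzzy one.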

Idea 1 would be easier and more reliable than 2. But would the benefits outweigh the effort? I might code it for SLR and CzechVR because those are the only two I care about. Some other retired programmer could do the others :-) If a pull request is ignored it wouldn't be a big deal.

toshski commented 1 year ago

My 2 cents: if you have the time, give it a try. If it all works out, you have a PR that benefits others. If not, you may still have something more specific that meets your own personal needs, and you will have learned how to make changes to xbvr, which could be applied to other PRs.

mark-hahn commented 1 year ago

OK, things are worse than I thought. When I try to match, which I haven't tried before, there is no match. I rescanned the files and rescraped. I tested two files. Any idea what could cause this?

vt-idiot commented 1 year ago

I'm not sure what the problem is? Are you just clicking the "Match" button and leaving the actual search box that comes up as is? You might want to change the search query.

I took a SexBabesVR file with SexBabesVR's filename, SexBabesVR_Purely Temptatious_2700_493_LR_180.mp4 and renamed it to what I imagine SLR would call it, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4 based on what you've mentioned.

image

It was the first option. Repeating the match search with the search entry as just the scene title increases the match "score"

image

mark-hahn commented 1 year ago

I took a SexBabesVR file with SexBabesVR's filename, SexBabesVR_Purely Temptatious_2700_493_LR_180.mp4 and renamed it to what I imagine SLR would call it, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4

How did you know to add 29264? Did you somehow cheat? :-)

With the full file name I got five pages with no match. When I reduced the search to just "bliss" I got 2 pages of results down from 5, and neither page had bliss in the title. This is backwards to most searches which show more results as you reduce the search term. Eventually reducing it to empty should show all scenes. When I put in "pure bliss" I got the original 5 pages. This is baffling to me as it doesn't make sense. Could search be broken?

I guess I can watch scenes just based on the filenames with no other information. Bummer.

mark-hahn commented 1 year ago

I just had a thought (they are rare nowadays). I originally scanned the files and then tried to match. I realized later that I had never scraped SexBabesVR. After scraping it I rescanned the files. The match was still not found. Is it possible that scraping after file scanning creates a permanent mismatch? In other words the filenames are only matched to earlier scrapes?

toshski commented 1 year ago

The default download file was called, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4.

If I need to manually match SLR files, as you would with SexBabesVR, I click Match. At this point don't get hung up on the name of the file. The search term when you click Match is pre-populated with the filename, but that's just a convenience to save some typing. Other than that, the filename is no longer relevant. Change the search term to find the scene you want. I usually get rid of everything from the filename except site and title, i.e. I would search "SexBabesVR Purely Temptatious". That's usually enough to match; if not, I may add the Actor's name or prefix words from the title with title:. When you hit Match, it's not about making the filename match. At this point auto matching based on filename has already failed; it's now just about finding the scene you want. Assuming you find it, as in the list in vt's example, you click Assign and xbvr will link that scene to your file, even though the filename wasn't in its list of valid filenames. It will add the filename to the list of valid filenames for that scene in your database.

If you have the scene scraped and it's not coming up in the search results, then you may have a search index problem and should rebuild the index in Options/Cache/Search Index/Reset.

mark-hahn commented 1 year ago

if not I may add the Actors name

How can I know that if I've never seen the scene?

a search index problem and should rebuild them in Options/Cache/Search Index/Reset

I just clicked all buttons in that tab, rescanned, and rescraped. Now the search is totally broken. No results show up no matter what I put in the search box. Oh well ...

toshski commented 1 year ago

How can I know that if I've never seen the scene?

Do you mean how can you know the Actor? SLR lists them on the scene page where you downloaded it.

Now the search is totally broken. No results show up no matter what I put in the search box

It's probably still rebuilding the search indexes. They get locked while they are updating. A large system can take a couple of hours.

mark-hahn commented 1 year ago

I waited 3 hrs with no luck. Now I've restarted the docker container and scanned and scraped again. Still no joy in Mudville. I'm guessing the next step would be to destroy the container, reinstall it, rescan, and rescrape. I've spent enough time on this for now and will live with just filenames. I don't know what I could have done wrong to cause this.

vt-idiot commented 1 year ago

How did you know to add 29264? Did you somehow cheat? :-)

No lol, I just found the page for the scene on SLR, used the SLR/SBVR filename you previously mentioned, and the number I saw in the SLR URL: https://www.sexlikereal.com/scenes/purely-temptatious-29264

My file is actually from POVR, so I'd already renamed it to match SexBabesVR's own filenames to just have it match automatically as a test. It was originally named sex-babes-vr-purely-temptatious-katy-rose-180_180x180_3dh_LR.mp4 but since the actress name is in there, the scene itself was still the first result on the match screen.

@mark-hahn , I tried a test with the scene and filename you're having trouble with. The video itself was "filler", ignore the bizarre resolution, file size, etc., I just grabbed the nearest/smallest .mp4 file I had to use as a stand in since I don't have the scene itself downloaded.

image

The actual SexBabesVR filename matched on its own, as expected.

Where did you find what that should be called?

From within XBVR itself. "Vika P" doesn't have too many scenes anyways, so it wasn't hard to find, but do note that SBVR actually has her credited as "Aislin". I already had an "aka" in here though.

image

I removed some of the existing filenames (there were 6-7 more) just to make sure it'd show up in the screenshot, but if a scene is still giving you a hard time, you could always just click that blue "Add item+" button, and paste in whatever filename SLR is using, and then "Save Scene Details". It's a bit more roundabout than using the "Match" function, but the next time you scan the drive the file is on, it will just automatically match. That's all clicking "Match" technically does - but in one step. It adds a new filename to that list, and then associates the file with the scene.
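
To make the relationship between "Add item+", "Match" and automatic matching concrete, here is a toy model of the behaviour described above. It is not xbvr's code or database schema; the `Scene` class and the `auto_match` and `assign` functions are invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scene:
    title: str
    # Filenames that are allowed to match this scene automatically.
    filenames: set[str] = field(default_factory=set)


def auto_match(filename: str, scenes: list[Scene]) -> Optional[Scene]:
    """What a scan is described as doing: exact lookup in each scene's list."""
    for scene in scenes:
        if filename in scene.filenames:
            return scene
    return None


def assign(filename: str, scene: Scene) -> Scene:
    """What "Match" then "Assign" is described as doing: remember the
    filename on the chosen scene so later scans match it automatically."""
    scene.filenames.add(filename)
    return scene


if __name__ == "__main__":
    scene = Scene("Pure Bliss")
    slr_name = "SLR_SexBabesVR_Pure Bliss_original_12851_LR_180.mp4"
    print(auto_match(slr_name, [scene]))   # None: filename not known yet
    assign(slr_name, scene)
    print(auto_match(slr_name, [scene]) is scene)
```

In this model, pasting the filename via "Add item+" and clicking "Match" end in the same state; the only difference is whether the file is linked immediately or on the next scan.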


As for my test with the "SLR filename"

image

The correct video is there in the list, it's the 2nd result.


Forgive me if you've already mentioned it, but you do already have the SexBabesVR scraper enabled, right?

image Screenshot doctored to avoid making it 6000 pixels tall.

I have enabled and scraped SexBabesVR multiple times.

Oh, indeed you have. Well, does the scene show up on the "Any" or "Not Downloaded" tabs if you filter by actress "Aislin" (or "Tiffany Tatum") and studio "SexBabesVR"?


Last but not least, although I think @toshski already mentioned it, you can always try resetting the Search Index

image

It will take some time to re-build it, and I'm probably mistaken, but I think it only triggers after you run a scraper (any scraper).


PS. If the scene is missing, then it's not scraping properly, but either way, importing this content bundle will plop the scene in your library, with the SLR filename already added to it, and the AKA for Aislin. xbvr-content-bundle-mark-hahn.json.txt Rename it, remove the .txt and you can import it like this: image

mark-hahn commented 1 year ago

Thanks. I will investigate the actions you suggest if/when search is working again.

DUZszyi commented 1 year ago

@mark-hahn I ran into something similar. In my download collection, built up over years, I have many files whose filenames I have changed.

In some cases using a pattern like the CzechVR ones prefixed with cvr and CzechVR Fetish ones with cvrf. In other cases I prefixed a number representing a date. Some others I added tags in the filenames. So matching is a bit of a mess.

While being in a very pragmatic mood, I created this python script https://gist.github.com/DUZszyi/1f67e46d9ffda5ebec0e0db04d69b9a1

This opens the sqlite db and just pokes in the innards of xbvr, and it seems to work for me! There are three "matchers"

  1. based on substring. Either the target file starts with the stem of a known file or the stem of a known file is a substring in target. This matches the cases where I added stuff.
  2. special case for my cvr prefixing. might be caught by rule 1 too. I made rule 1 later :)
  3. generic Levenshtein distance. if less than 25% of the characters in target need insertions, deletions, or substitutions to get to a known file
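
For readers who do not want to open the gist, rules 1 and 3 can be sketched as follows. This is an independent illustration written from the description above, not the gist's code: the function names are made up, the comparison is case-sensitive, and the distance is divided by the length of the target filename, all of which are assumptions.

```python
from pathlib import PurePath


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def substring_match(target: str, known: str) -> bool:
    """Rule 1: the stem of a known filename appears inside the target.
    This also covers the "target starts with the stem" case."""
    stem = PurePath(known).stem
    return bool(stem) and stem in target


def fuzzy_match(target: str, known: str, max_ratio: float = 0.25) -> bool:
    """Rule 3: fewer than max_ratio of the target's characters need editing."""
    if not target:
        return False
    return levenshtein(target, known) / len(target) < max_ratio


if __name__ == "__main__":
    target = "warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4"
    known = "warm-pie-for-daddy-files-oculus5k_180_LR.mp4"
    print(substring_match(target, known), fuzzy_match(target, known))
```

A ratio threshold is lenient for long filenames that share a long technical suffix such as `_5k_180_180x180_3dh_LR.mp4`, so two different titles from the same studio can fall under 25%. Comparing only the title portion, or lowering the threshold, would be one way to reduce that.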

Example of a match:

INFO:root:updating file 9171 to point to scene 27130 warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4 ~= warm-pie-for-daddy-files-oculus5k_180_LR.mp4
INFO:root:updating scene 27130 adding file to known files: warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4

warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4 is the file I have downloaded from virtualtaboo. It got matched to the scene with pk 27130 because its Levenshtein distance to warm-pie-for-daddy-files-oculus5k_180_LR.mp4 (one of the files already known to that scene) is within the threshold. Apparently that 35286 ID is no longer there, or at least the scraper didn't get it like that (scraped today).

Example of a false positive:

INFO:root:updating file 9200 to point to scene 9810 BabeVR_Into_the_Azul_5k_180_180x180_3dh_LR.mp4 ~= babevr_under_the_tree_5k_180_180x180_3dh_LR.mp4
INFO:root:updating scene 9810 adding file to known files: BabeVR_Into_the_Azul_5k_180_180x180_3dh_LR.mp4

Matching within Levenshtein distance threshold didn't work out for this one.

Not sure if this is of use to you or anyone else, but I thought it would be worth sharing. At the least it could be some inspiration for an actual built-in fuzzy matcher.

I might update the gist later.

mark-hahn commented 1 year ago

I'd like to try it but I'm not sure how to run it inside the docker version I'm running. I need to study up on docker and see if I can ssh into a docker image.
