Open mark-hahn opened 1 year ago
because... SLRSexBabes is NOT the studio. The studio in your example is SexBabesVR.
I had mistyped SexBabesVR as SLRSexBabes in my original post. I closed this issue because I thought I'd let this drop until I did more homework. I've done that and I still have the problem. Here are details on the file/scene not matching ...
Filename: SLR_SexBabesVR_Pure Bliss_original_12851_LR_180.mp4
Studio name: SexBabesVR
I have enabled and scraped SexBabesVR multiple times. I've rescanned the files. I just cannot get any match. This is the HTML for the entry of the scene in the SLR website. I assume this is what you scrape ...
<article
class="c-grid-item c-grid-item--search u-block u-relative u-px--two u-relative u-float--l u-width--full js-c-grid-item"
data-qa="grid-item-search">
<div class="u-clear u-py--two desktop:u-py--four u-block">
<div class="u-width--full u-mr--two desktop:u-width--two-hnd-ei-five desktop:u-mr--four desktop:u-float--l">
<div class="u-absolute u-z--one u-pos--t u-pos--l u-mt--four u-ml--four desktop:u-mt--six js-c-grid-badges"
data-qa="grid-item-search-badge">
</div>
<div class="c-grid-ratio u-ratio u-ratio--grid u-ratio--top u-bgc--bl" data-qa="grid-item-search-thumbs">
<a href="/scenes/pure-bliss-12851" class="u-block u-absolute u-stretch" data-qa="grid-item-search-thumb-link">
<img class="u-ratio-item u-transition--video ls-is-cached lazyloaded"
src=""
data-srcset="https://cdn-vr.sexlikereal.com/images/12851/vr-porn-Pure-Bliss-cover-app.jpg"
data-videosrc="https://cdn-vr.sexlikereal.com/preview/14x1/12851_300p.mp4" data-videotype="video/mp4"
alt="<u>Pure</u> Bliss; Lesbian blondes teen slim petite curvy real tits shaved pussy - SexBabesVR | SexLikeReal"
data-qa="grid-item-search-thumb-img"
srcset="https://cdn-vr.sexlikereal.com/images/12851/vr-porn-Pure-Bliss-cover-app.jpg"><video loop=""
playsinline="" muted=""
class="u-absolute u-stretch u-block u-c u-transition--video u-z--zero u-height--full u-width--full u-scale--video">
<source src="https://cdn-vr.sexlikereal.com/preview/14x1/12851_300p.mp4" type="video/mp4"></video>
</a>
<span
class="c-grid-video--availible u-block u-absolute u-pos--l u-pos--t u-disabled u-z--neg u-visibility--hidden"></span><span
class="c-grid-video--progress u-absolute u-pos--l u-width--zero"></span></div>
</div>
<div class="u-mt--two desktop:u-float--l desktop:u-mt--zero">
<div
class="u-none u-nowrap u-ellipsis u-overflow--hidden u-mb--one desktop:u-block desktop:u-max-width--six-hundred desktop:u-nowrap-normal"
data-qa="grid-item-search-title"><a href="/scenes/pure-bliss-12851"
class="u-fs--tw u-fw--bold u-wh hover:u-lw u-transition--base u-lh--base u-pr--one desktop:u-fs--ei"><u>Pure</u>
Bliss; Lesbian blondes teen slim petite curvy real tits shaved pussy - SexBabesVR | SexLikeReal</a></div>
<div class="c-grid-item-footer u-mb--two desktop:u-mb--four" data-qa="grid-item-search-footer">
<div class="u-relative">
<div class="u-mt--half ">
<div class="u-flex u-flex-justify--between u-align-i--center u-relative u-z--one">
<a href="/scenes/pure-bliss-12851"
class="c-grid-item-footer-title u-ellipsis u-nowrap u-fw--bold u-lh--base u-fs--fo u-wh u-block hover:u-lw u-transition--base u-overflow--hidden desktop:u-none"
data-qa="grid-item-search-link-title"><u>Pure</u> Bliss; Lesbian blondes teen slim petite curvy real
tits shaved pussy - SexBabesVR | SexLikeReal</a>
</div>
<div class="u-mb--one u-mt--one u-fw--medium u-fs--fo u-lh--base u-nowrap u-ellipsis">
<a href="/studios/sexbabesvr" class="u-dw hover:u-wh u-transition--base u-inline-block u-align-y--m"
data-qa="grid-item-search-link-studio">SexBabesVR</a>
<div class="u-inline-block u-align-y--m u-dw"> • 3 years ago</div>
</div>
<div class="u-fw--medium u-height--twn-si u-overflow--hidden">
<div class="u-inline-block u-align-y--m u-lh--one u-dw u-fs--fo">
<button type="button" class="c-like o-btn c-like--text-btn o-btn--small2 u-px--zero js-c-like3 "
data-qa="like-btn" data-entity-type="favorite" data-entity-id="12851" data-project-id="1"
data-message-added="'Pure Bliss' was added to your favorites"
data-message-removed="'Pure Bliss' was removed from your favorites"
data-icon-name-default="heart-outlined" data-icon-name-active="heart">
<!-- icon heart-outlined -->
<span class="o-icon c-like-icon o-icon--small u-mr--one js-c-like-icon">
<svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<use href="#icon-heart-outlined"></use>
</svg>
</span>
<span class="c-like-text u-align-y--m js-c-like-counter js-c-like-counter--k" data-text="1.4k"></span>
</button>
<button
class="c-playlist-watch-later-trigger c-playlist-watch-later-trigger--btn o-btn o-btn--small2 o-btn--text o-btn--squared js-m-tooltips js-c-playlist-save-to-watch-later"
data-qa="grid-item-save-to-watch-later" data-project-id="1" data-scene-id="12851">
<!-- icon bookmark-outlined -->
<span class="o-icon o-icon--small c-playlist-watch-later-trigger-icon">
<svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<use href="#icon-bookmark-outlined"></use>
</svg>
</span>
<!-- icon bookmark -->
<span class="o-icon o-icon--small c-playlist-watch-later-trigger-icon--active">
<svg class="o-icon o-icon-svg" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
<use href="#icon-bookmark"></use>
</svg>
</span>
<div
class="c-playlist-watch-later-trigger-text m-tooltips-content u-radius--two u-bgc--wh u-fs--tw u-fw--medium u-py--one u-px--two u-dg u-nowrap u-transform-none u-z--three"
data-popper-placement="top" data-text-save="Save to watch later"
data-text-remove="Remove from watch later">
<div class="m-tooltips-arrow" data-popper-arrow="">
<div class="u-bgc--wh"></div>
</div>
</div>
</button> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</article>
The scene shows up as gray in heresphere but plays when clicked. This makes sense.
SexBabesVR is not scraped from SLR; it is scraped from SexBabesVR itself, so SexBabesVR scenes downloaded from SLR will have to be matched manually. If you look in the Scene Details/Edit Scene (pencil icon) there is a tab for the Filenames that will match automatically.
Ah, that makes sense but it is a real pita. Of the first 80 scenes downloaded 20 didn't match. Aggregators like SLR have a zillion scenes from other sites and manually matching all of them is a lot of work. This is a flaw in the system. I have two possible solutions that I would like to be considered feature requests.
1) Scan aggregators like SLR in addition to the original sites. 2) Use intelligent matching that looks for close matches between file names and scene names. Logic for close matching is readily available.
I hope I'm not coming off as a jerk. I make the suggestions as friendly advice.
BTW, I'm a retired programmer. Do you think pull requests in this area would be welcome? I have already used close-matching code in an app I wrote to match downloaded videos for movies and TV to file names.
I'm a retired programmer as well (unless something interesting comes along). As of 6 months ago, I had never used xbvr and had never developed using any of the technology platforms it is built on. So I think PRs would absolutely be welcome; I think I'm up to about 50 now across all sorts of areas. New people also bring new ideas and perspectives, and there is certainly a variety in how people want to use the package.
I have done something along the lines of matching for myself. I match all scenes from SLR and VRPorn to xbvr scenes that are scraped from their original site. I would say my automated matching is about 98-99%. One of my first PRs was to change the Search to allow you to target specific words at a specific field, i.e., site, cast, title or synopsis, which is the basis of my matching. I essentially build a huge search query with words from the other source, targeting those fields in xbvr, e.g., instead of just searching "Some scene title-An Actor" I search "title:Some title:scene title:title cast:an cast:actor". However, even with what I think is a very high match success rate, I'm not sure I'll ever submit it. That 1 or 2% would translate to 1 or 2 posts a week on Discord asking why it didn't match. And that's just noise I don't really want to deal with.
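For anyone wanting to experiment with the field-targeted query idea described above, here is a minimal Python sketch. It is not the code from that PR or from xbvr; the function name `build_field_query` is made up for illustration, and it only assumes the `title:` and `cast:` prefixes quoted in the example. Whether words should be lowercased first is not clear from the example, so this sketch leaves them as given.

```python
def build_field_query(title: str, cast: list[str]) -> str:
    """Prefix every word with the field it should be searched in.

    Hypothetical helper: mirrors the "title:word cast:word" shape quoted
    in the thread, nothing more.
    """
    # One "title:" term per word of the scene title.
    terms = [f"title:{word}" for word in title.split()]
    # One "cast:" term per word of each performer name.
    for name in cast:
        terms.extend(f"cast:{word}" for word in name.split())
    return " ".join(terms)


if __name__ == "__main__":
    print(build_field_query("Some scene title", ["An Actor"]))
```

The resulting string would be pasted into the search box (or sent to whatever search endpoint is in use) in place of the raw "title-actor" text.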
I don't think you would ever get to 100%. There are just so many exceptions, e.g.
Don't get me wrong, it's a good idea. My main concern would be people realizing and accepting it's not 100%, and not creating an extra support requirement. Ideally, a crowd-sourced approach to correcting mismatches would work, but XBVR is all hosted locally with no remote/web-based service where you could facilitate that. I would probably have tackled a solution if I had time to also build a UI to allow people to sort out their own mismatches, but I suspect that's as much work as the actual matching process.
I don't mean to seem like I'm rubbishing the idea or even saying don't do it. Matching scenes between sites has huge benefits for other things as well; after all, I have built something for myself. Just highlighting some other considerations to think about.
Wow. I had no idea things were such a mess. My pollyanna idea was that an aggregator scene was just a pointer to the original studio's scene. I had no idea it was manually copied. That eliminates my idea 2 of smart matching.
That leaves idea 1, which is to just scrape the aggregator like SLR and leave the original studio out of the equation. I just looked at the HTML I posted above and I noticed that the file name is not in there. The link is to /scenes/pure-bliss-12851 not SLR_SexBabesVR_Pure Bliss_original_12851_LR_180. But one could match by the number 12851, which would be exact matching. That would work, but only in SLR. Separate algorithms would need to be written for each aggregator site.
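The ID-based matching described above can be sketched in Python. This is only an illustration, not anything xbvr does today: the helper names are invented, and the filename layout (`SLR_<studio>_<title>_<quality>_<id>_LR_180.mp4`) and URL layout (`/scenes/<slug>-<id>`) are assumptions drawn solely from the examples quoted in this thread.

```python
import re
from typing import Optional

# Assumption: the SLR scene id is the numeric field right before "_LR_<degrees>.<ext>".
FILENAME_ID = re.compile(r"_(\d+)_LR_\d+\.[A-Za-z0-9]+$")
# Assumption: SLR scene URLs end in "-<scene id>".
URL_ID = re.compile(r"-(\d+)/?$")


def id_from_filename(filename: str) -> Optional[str]:
    """Return the scene id embedded in an SLR download name, or None."""
    if not filename.startswith("SLR_"):
        # Studio-native names also carry numbers, but not SLR ids.
        return None
    match = FILENAME_ID.search(filename)
    return match.group(1) if match else None


def id_from_url(url: str) -> Optional[str]:
    """Return the trailing numeric id of an SLR scene URL, or None."""
    match = URL_ID.search(url)
    return match.group(1) if match else None


def same_scene(filename: str, url: str) -> bool:
    """True when the file and the scene page carry the same id."""
    file_id = id_from_filename(filename)
    return file_id is not None and file_id == id_from_url(url)


if __name__ == "__main__":
    print(same_scene("SLR_SexBabesVR_Pure Bliss_original_12851_LR_180.mp4",
                     "/scenes/pure-bliss-12851"))
```

Files that use a different suffix than `_LR_<number>` would need their own pattern, which is the per-aggregator work mentioned above.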
Idea 1 would be easier and more reliable than 2. But would the benefits outweigh the effort? I might code it for SLR and CzechVR because those are the only two I care about. Some other retired programmer could do the others :-) If a pull request is ignored it wouldn't be a big deal.
My 2 cents: if you have the time, give it a try. If it all works out, you have a PR that benefits others. If not, you may still have something more specific that meets your own personal needs, and you still learn how to make changes to xbvr, which could be applied to other PRs.
OK, things are worse than I thought. When I try to match, which I haven't tried before, there is no match. I rescanned the files and rescraped. I tested two files. Any idea what could cause this?
I'm not sure what the problem is? Are you just clicking the "Match" button and leaving the actual search box that comes up as is? You might want to change the search query.
I took a SexBabesVR file with SexBabesVR's filename, SexBabesVR_Purely Temptatious_2700_493_LR_180.mp4, and renamed it to what I imagine SLR would call it, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4, based on what you've mentioned.
It was the first option. Repeating the match search with the search entry as just the scene title increases the match "score"
I took a SexBabesVR file with SexBabesVR's filename, SexBabesVR_Purely Temptatious_2700_493_LR_180.mp4 and renamed it to what I imagine SLR would call it, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4
How did you know to add 29264? Did you somehow cheat? :-)
With the full file name I got five pages with no match. When I reduced the search to just "bliss" I got 2 pages of results down from 5, and neither page had bliss in the title. This is backwards to most searches which show more results as you reduce the search term. Eventually reducing it to empty should show all scenes. When I put in "pure bliss" I got the original 5 pages. This is baffling to me as it doesn't make sense. Could search be broken?
I guess I can watch scenes just based on the filenames with no other information. Bummer.
I just had a thought (they are rare nowadays). I originally scanned the files and then tried to match. I realized later that I had never scraped SexBabesVR. After scraping it I rescanned the files. The match was still not found. Is it possible that scraping after file scanning creates a permanent mismatch? In other words the filenames are only matched to earlier scrapes?
The default download file was called, SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4.
If you need to manually match SLR files, as you would with SexBabesVR, you click Match. At this point don't get hung up on the name of the file. The search term when you click Match is pre-populated with the filename, but that's just a convenience to save some typing. Other than that, the filename is no longer relevant. Change the search term to find the scene you want. I usually get rid of everything from the filename except site and title, i.e. I would search "SexBabesVR Purely Temptatious". That's usually enough to match; if not, I may add the actor's name or prefix words from the title with title:. When you hit Match it's not about making the filename match. At this point auto matching based on filename has already failed; it's now just about finding the scene you want. Assuming you find it, as in the list in vt's example, you click Assign and xbvr will link that scene to your file, even though the filename wasn't in its list of valid filenames. It will add the filename to the list of valid filenames for that scene in your database.
If you have the scene scraped and it's not coming up in the search results, then you may have a search index problem and should rebuild the index in Options/Cache/Search Index/Reset.
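As a rough illustration of trimming a filename down to "site and title" before searching, here is a small Python sketch. It is not part of xbvr; the function name is hypothetical, and it assumes the underscore-separated SLR layout seen in this thread (`SLR_<studio>_<title>_...`), so a title that itself contains underscores would be cut short.

```python
from pathlib import Path


def search_term_from_slr_filename(filename: str) -> str:
    """Keep only the studio and title fields of an SLR-style filename.

    Assumption: fields are separated by "_" and appear in the order
    SLR, studio, title, then quality/id/projection details.
    """
    parts = Path(filename).stem.split("_")
    if parts and parts[0] == "SLR":
        parts = parts[1:]  # drop the aggregator prefix
    # First remaining field is taken as the studio, second as the title.
    return " ".join(parts[:2])


if __name__ == "__main__":
    print(search_term_from_slr_filename(
        "SLR_SexBabesVR_Purely Temptatious_2700p_29264_LR_180.mp4"))
```

The output is meant to be typed into the Match dialog's search box in place of the pre-populated filename.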
if not I may add the Actors name
How can I know that if I've never seen the scene?
a search index problem and should rebuild them in Options/Cache/Search Index/Reset
I just clicked all buttons in that tab, rescanned, and rescraped. Now the search is totally broken. No results show up no matter what I put in the search box. Oh well ...
How can I know that if I've never seen the scene?
Do you mean how can you know the actor? SLR lists them on the scene page where you downloaded it.
Now the search is totally broken. No results show up no matter what I put in the search box
It's probably still rebuilding the search indexes. They get locked when they are updating. A large system can take a couple of hours
I waited 3 hrs with no luck. Now I've restarted the docker container and scanned and scraped again. Still no joy in mudville. I'm guessing the next step would be to destroy the container, reinstall it, rescan, and rescrape. I've spent enough time on this for now and will live with just filenames. I don't know what I could have done wrong to cause this.
How did you know to add 29264? Did you somehow cheat? :-)
No lol, I just found the page for the scene on SLR, used the SLR/SBVR filename you previously mentioned, and the number I saw in the SLR URL: https://www.sexlikereal.com/scenes/purely-temptatious-29264
My file is actually from POVR, so I'd already renamed it to match SexBabesVR's own filenames to just have it match automatically as a test. It was originally named sex-babes-vr-purely-temptatious-katy-rose-180_180x180_3dh_LR.mp4 but since the actress name is in there, the scene itself was still the first result on the match screen.
@mark-hahn , I tried a test with the scene and filename you're having trouble with. The video itself was "filler", ignore the bizarre resolution, file size, etc., I just grabbed the nearest/smallest .mp4 file I had to use as a stand in since I don't have the scene itself downloaded.
The actual SexBabesVR filename matched on its own, as expected.
Where did you find what that should be called?
From within XBVR itself. "Vika P" doesn't have too many scenes anyways, so it wasn't hard to find, but do note that SBVR actually has her credited as "Aislin". I already had an "aka" in here though.
I removed some of the existing filenames (there were 6-7 more) just to make sure it'd show up in the screenshot, but if a scene is still giving you a hard time, you could always just click that blue "Add item+" button, and paste in whatever filename SLR is using, and then "Save Scene Details". It's a bit more roundabout than using the "Match" function, but the next time you scan the drive the file is on, it will just automatically match. That's all clicking "Match" technically does - but in one step. It adds a new filename to that list, and then associates the file with the scene.
As for my test with the "SLR filename"
The correct video is there in the list, it's the 2nd result.
Forgive me if you've already mentioned it, but you do already have the SexBabesVR scraper enabled, right?
Screenshot doctored to avoid making it 6000 pixels tall.
I have enabled and scraped SexBabesVR multiple times.
Oh, indeed you have. Well, does the scene show up on the "Any" or "Not Downloaded" tabs if you filter by actress "Aislin" (or "Tiffany Tatum") and studio "SexBabesVR"?
Last but not least, although I think @toshski already mentioned it, you can always try resetting the Search Index
It will take some time to re-build it, and I'm probably mistaken, but I think it only triggers after you run a scraper (any scraper).
PS. If the scene is missing, then it's not scraping properly, but either way, importing this content bundle will plop the scene in your library, with the SLR filename already added to it, and the AKA for Aislin.
xbvr-content-bundle-mark-hahn.json.txt
Rename it, remove the .txt and you can import it like this:
Thanks. I will investigate the actions you suggest if/when search is working again.
@mark-hahn I ran into something similar. In my download collection, built up over years, I have many files whose filenames I have changed.
In some cases using a pattern like the CzechVR ones prefixed with cvr and CzechVR Fetish ones with cvrf. In other cases I prefixed a number representing a date. Some others I added tags in the filenames. So matching is a bit of a mess.
While being in a very pragmatic mood, I created this python script https://gist.github.com/DUZszyi/1f67e46d9ffda5ebec0e0db04d69b9a1
This opens the sqlite db and just pokes in the innards of xbvr, and it seems to work for me! There are three "matchers"
Example of a match:
INFO:root:updating file 9171 to point to scene 27130 warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4 ~= warm-pie-for-daddy-files-oculus5k_180_LR.mp4
INFO:root:updating scene 27130 adding file to known files: warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4
warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4 is the file I have downloaded from virtualtaboo. It got matched to the scene with pk 27130 because its Levenshtein distance to warm-pie-for-daddy-files-oculus5k_180_LR.mp4 (one of the files already known to that scene) is within the threshold. Apparently that 35286 ID is no longer there, or the scraper at least didn't get it like that (scraped today).
Example of a false positive:
INFO:root:updating file 9200 to point to scene 9810 BabeVR_Into_the_Azul_5k_180_180x180_3dh_LR.mp4 ~= babevr_under_the_tree_5k_180_180x180_3dh_LR.mp4
INFO:root:updating scene 9810 adding file to known files: BabeVR_Into_the_Azul_5k_180_180x180_3dh_LR.mp4
Matching within Levenshtein distance threshold didn't work out for this one.
Not sure if this is of use to you or anyone else, but I thought it would be worth sharing. At the least it could be some inspiration for an actual built-in fuzzy matcher.
I might update the gist later.
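To make the two generic rules concrete (substring of a known file's stem, and Levenshtein distance under 25% of the target's length), here is a self-contained Python sketch. It is not the code from the gist and does not touch the xbvr database; the function names are invented for illustration, and comparing case-insensitively is an assumption.

```python
from pathlib import Path


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute
            ))
        previous = current
    return previous[-1]


def substring_match(target: str, known: str) -> bool:
    """True when the stem of a known filename appears inside the target."""
    return Path(known).stem.casefold() in target.casefold()


def fuzzy_match(target: str, known: str, threshold: float = 0.25) -> bool:
    """True when fewer than `threshold` of the target's characters need edits."""
    if not target:
        return False
    distance = levenshtein(target.casefold(), known.casefold())
    return distance / len(target) < threshold


if __name__ == "__main__":
    print(fuzzy_match("warm-pie-for-daddy-files-35286-oculus5k_180_LR.mp4",
                      "warm-pie-for-daddy-files-oculus5k_180_LR.mp4"))
    # The false positive from the log passes the same test, because the
    # two names share their prefix and their long resolution suffix.
    print(fuzzy_match("BabeVR_Into_the_Azul_5k_180_180x180_3dh_LR.mp4",
                      "babevr_under_the_tree_5k_180_180x180_3dh_LR.mp4"))
```

One way to cut down on false positives like the second pair would be to strip the shared resolution/projection suffix before comparing, so the threshold applies to the title part only; that is a suggestion, not something the script is known to do.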
I'd like to try it but I'm not sure how to run it inside the docker version I'm running. I need to study up on docker and see if I can ssh into a docker image.
On Sun, Feb 5, 2023 at 5:01 PM DUZszyi @.***> wrote:
@mark-hahn https://github.com/mark-hahn I ran into something similar. In my download collection of years I have many files which I have changed the filenames of.
In some cases using a pattern like the CzechVR ones prefixed with cvr and CzechVR Fetish ones with cvrc. In other cases I prefixed a number representing a date. Some others I added tags in the filenames. So matching is a bit of a mess.
While being in a very pragmatic mood, I created this python script https://gist.github.com/DUZszyi/1f67e46d9ffda5ebec0e0db04d69b9a1
This opens the sqlite db and just pokes in the innards of xbvr, and it seems to work for me! There are three "matchers"
- based on substring. Either the target file starts with the stem of a known file or the stem of a known file is a substring in target. This matches the cases where I added stuff.
- special case for my cvr prefixing. might be caught by rule 1 too. I made rule 1 later :)
- generic Levenshtein distance. if less than 25% of the characters in target need insertions, deletions, or substitutions to get to a known file
Not sure if this of use to you or anyone else, but I thought it would be worth sharing. at the least it could be some inspiration for an actual built-in fuzzy matcher.
I might update the gist later.
<For some bizarre reason this original post was erased. It was about a problem matching files to scenes. The problem is detailed again in the 3rd post below>