javinizer / Javinizer

(NSFW) Organize your local Japanese Adult Video (JAV) library
MIT License
599 stars, 62 forks

Scraping on DMM fails when videos are removed from r18 #261

Closed: seeyabye closed 3 years ago

seeyabye commented 3 years ago

Expected Behavior

A valid URL should be returned from dmmja even though video URLs are missing from r18.

Current Behavior

The current behavior invalidates dmmja: when the video URL is missing from r18, no dmmja URL is returned.

Steps to Reproduce (for bugs)

Additional Notes

The current implementation of Get-DmmUrl relies on searching r18 and constructing the dmm equivalent of the video URL. In my opinion, this isn't a very good solution, as dmm and r18 are now coupled. What happens if r18 goes down? The dmm search will fail as well.

In this particular situation, ABW-047 was removed from r18, but it still exists on dmmja.

I think a fallback search on dmmja should still happen.
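
To illustrate the decoupling proposed above, here is a minimal sketch of a fallback chain. It is not Javinizer's code (Javinizer is written in PowerShell); `resolve_dmm_url` and the lookup callables are hypothetical names, and each lookup is assumed to take a movie ID and return a URL or `None`.

```python
from typing import Callable, Iterable, Optional

# Hypothetical signature: a lookup takes a movie ID and returns a URL or None.
UrlLookup = Callable[[str], Optional[str]]


def resolve_dmm_url(movie_id: str, lookups: Iterable[UrlLookup]) -> Optional[str]:
    """Return the first URL any lookup produces, trying them in order."""
    for lookup in lookups:
        try:
            url = lookup(movie_id)
        except Exception:
            # A failing source (for example r18 being down) must not
            # prevent the next source from being tried.
            url = None
        if url:
            return url
    return None
```

With the r18-based lookup first and a direct dmmja search second, a movie that is missing from r18 would still be resolved by the second lookup.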

jvlflame commented 3 years ago

I added a fallback for a native dmm url match, though it's mostly using a hardcoded URL and the movie contentId. I tried to test with your example ABW-047 but I wasn't able to get any matches on either R18 or DMM. Do you have another movie to test on?

seeyabye commented 3 years ago

the exact code name on dmm is 118abw047

I think you can try any movies under the maker Prestige (プレステージ), as I noticed they are no longer listed as an official maker on dmm, but their movies are still available for some reason.

Here are a couple of other movies: ABW-049 (118abw049), SGA-148 (118sga148)
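
The three examples in this comment follow one visible pattern: lowercase the label, drop the hyphen, and prepend a numeric prefix (118 here). A minimal sketch of that mapping follows. The function name `to_content_id` is hypothetical, and the prefix is passed in by the caller because nothing in this thread says how it could be derived from the movie ID; other makers may use different rules.

```python
import re


def to_content_id(movie_id: str, prefix: str = "") -> str:
    """Map an ID such as 'ABW-047' to a content ID such as '118abw047'.

    Assumption: the content ID is prefix + lowercase label + digits,
    which matches only the examples given in this thread.
    """
    match = re.fullmatch(r"([A-Za-z]+)-?(\d+)", movie_id.strip())
    if match is None:
        raise ValueError(f"unrecognized movie ID: {movie_id!r}")
    label, number = match.groups()
    return f"{prefix}{label.lower()}{number}"
```

For example, `to_content_id("SGA-148", prefix="118")` gives `118sga148`.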

I think hardcoding the links on dmm would not be the best idea, but we should fall back to searching on dmm whenever r18 URLs are not returned.

jvlflame commented 3 years ago

Thanks for those CID values. Did some testing on the URLs I have used traditionally for both the EN/JA versions of dmm:

En = "https://www.dmm.co.jp/en/mono/dvd/-/detail/=/cid=118sga148"
Ja = "https://www.dmm.co.jp/digital/videoa/-/detail/=/cid=118sga148"

From what I can see, the Ja version (with /digital/videoa/...) is returning not found, while the En version works (with /mono/dvd/...).
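
For reference, the two URL patterns quoted above can be written as templates keyed by language. This is only a sketch: `DMM_DETAIL_URLS` and `detail_url` are hypothetical names, and the templates are copied from the two URLs in this comment rather than from any documented DMM scheme.

```python
# Templates taken verbatim from the two URLs quoted in this comment.
DMM_DETAIL_URLS = {
    "en": "https://www.dmm.co.jp/en/mono/dvd/-/detail/=/cid={cid}",
    "ja": "https://www.dmm.co.jp/digital/videoa/-/detail/=/cid={cid}",
}


def detail_url(cid: str, language: str) -> str:
    """Build a detail-page URL for a content ID; language is 'en' or 'ja'."""
    return DMM_DETAIL_URLS[language].format(cid=cid)
```

Note that the two templates differ in more than language: the En one points at the DVD section (`/mono/dvd/`) and the Ja one at the digital section (`/digital/videoa/`).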

Unless I'm misunderstanding how their search engine works, I'm unable to get any matches for the contentId values using the DVD search. This is in contrast to the digital video search which works as expected, though it misses the point as the Prestige videos don't show up in the digital section. Were you able to use their search engine to find Prestige videos?

The DVD search url that I'm using is:

https://www.dmm.co.jp/mono/dvd/-/search/=/searchstr=[...]

seeyabye commented 3 years ago

Thanks for checking it out.

Unfortunately, I'm still not sure how to access dmm.co.jp/en so I can't test this out (even with a VPN, I'm still getting redirected back to the Japanese version).

On the Japanese version of DMM, the URL that you just provided,

https://www.dmm.co.jp/mono/dvd/-/search/=/searchstr=[...]

works perfectly fine, and it returns a result linking to a Prestige DVD page.
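
A search URL of this shape could be built as in the sketch below. The name `dvd_search_url` is hypothetical, the search term is left to the caller because the thread elides it, and percent-encoding the term with `urllib.parse.quote` is an assumption about what the site accepts.

```python
from urllib.parse import quote

# Base pattern taken from the URL quoted above; the search term is a parameter.
DVD_SEARCH_URL = "https://www.dmm.co.jp/mono/dvd/-/search/=/searchstr={term}"


def dvd_search_url(term: str) -> str:
    """Return the DVD search URL for a term, percent-encoding unsafe characters."""
    return DVD_SEARCH_URL.format(term=quote(term, safe=""))
```

As the rest of this thread shows, the same URL can return different results depending on the region the request comes from.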

> Unless I'm misunderstanding how their search engine works, I'm unable to get any matches for the contentId values using the DVD search. This is in contrast to the digital video search which works as expected, though it misses the point as the Prestige videos don't show up in the digital section. Were you able to use their search engine to find Prestige videos?

Does that mean that CIDs are not shown to you when you browse DVD type work? I could see CID on both digital and DVD pages.

jvlflame commented 3 years ago

Here's my result from searching すべて (All): [image]

And my result from DVD: [image]

Trying ABW in すべて and DVD yields no results either: [image]

seeyabye commented 3 years ago

Thanks for sharing the screenshots. This is really interesting. I suspect the VPN isn't fully effective and DMM has some way of determining our actual locations. I'm not sure, but here in Japan I get results without problems on both すべて and DVD. Here are the screenshots:

すべて: [Screen Shot 2021-05-24 at 13 15 16]

DVD: [Screen Shot 2021-05-24 at 13 16 50]

Edit: I've now tested with a VPN, and it seems that in the US, Prestige series are not searchable at all, while here in Japan, results are returned.

jvlflame commented 3 years ago

> Edit: I've tested with a VPN now and it seems like in US, prestige series are not searchable at all, but here in Japan, results are returned.

Just tested on my VPN as well. Same results.

That does make sense, since they removed the entries on R18, which is the international version. I'll update the search method and test with the VPN.

jvlflame commented 3 years ago

@seeyabye If you have a chance, could you test the matching with the changes on the dev branch? My initial tests passed, but since this change mostly affects your use case (you can natively see Prestige videos), I'd like some additional confirmation.

seeyabye commented 3 years ago

@jvlflame thanks for the fallback. I've tested it and it seems good, but I found some edge cases that fail.

I've made a PR to fix this issue: https://github.com/jvlflame/Javinizer/pull/262