javinizer / Javinizer

(NSFW) Organize your local Japanese Adult Video (JAV) library
MIT License

DMM english scraper source #67

Closed zuko7177 closed 4 years ago

zuko7177 commented 4 years ago

Hi. When scraping JavLibrary, I'm not getting results even though the movie exists. For example, this movie is listed at http://www.javlibrary.com/en/vl_searchbyid.php?keyword=GVG-943 but Javinizer gives:

DEBUG: [2020-09-05T15:36:13][Get-JavlibraryDataObject] Function started
DEBUG: [2020-09-05T15:36:13][Get-JavlibraryUrl] Function started
DEBUG: [2020-09-05T15:36:13][Get-JavlibraryUrl] Performing [GET] on Uri [http://www.javlibrary.com/en/vl_searchbyid.php?keyword=GVG-943] with Session: [Microsoft.PowerShell.Commands.WebRequestSession] and UserAgent: []
VERBOSE: [2020-09-05T15:36:13][Get-JavlibraryUrl] Search [GVG-943] not matched on JAVLibrary
DEBUG: [2020-09-05T15:36:13][Get-JavlibraryUrl] Function ended
DEBUG: [2020-09-05T15:36:13][Get-JavlibraryDataObject] JAVLibrary data object:
DEBUG:
DEBUG: [2020-09-05T15:36:13][Get-JavlibraryDataObject] Function ended

Am I missing a setup step? Thanks.

jvlflame commented 4 years ago

What command are you running?

jvlflame commented 4 years ago

Also, if you're running v1.7.3, installation documentation is currently located under a different branch.

zuko7177 commented 4 years ago

I upgraded to 2.0.0 alpha 4 and JavLibrary is working now.

However, Dmm is not translating to EN. Is the translation working? (I understand it's alpha...) I updated jvSettings.json with "sort.metadata.nfo.translate": 1, "sort.metadata.nfo.translate.language": "en", and ran Javinizer without parameters.

jvlflame commented 4 years ago

The translation is only intended for the description that is pulled from DMM. If you want English-language metadata, you would probably pull from R18 or other scrapers. I'll make that clearer in the setting name and in the settings documentation once I put it together.

zuko7177 commented 4 years ago

Translation only for Dmm description makes sense. However, my description for Dmm is not getting translated. Is there a setting I can check besides these? "sort.metadata.nfo.translate": 1, "sort.metadata.nfo.translate.language": "en",
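For anyone double-checking the same two settings, here is a minimal sketch of reading them back from a jvSettings.json-style document. Only the two key names come from this thread; the function name, the accepted "enabled" values, and the messages are assumptions for illustration, not Javinizer's own logic.

```python
import json

# The two keys quoted in this thread; everything else here is illustrative.
TRANSLATE_KEY = "sort.metadata.nfo.translate"
LANGUAGE_KEY = "sort.metadata.nfo.translate.language"


def translation_status(settings_text):
    """Summarize the translation settings in a jvSettings.json-style document."""
    settings = json.loads(settings_text)
    enabled = settings.get(TRANSLATE_KEY)
    language = settings.get(LANGUAGE_KEY)
    # Assumption: 1 (or true) means enabled, as in the snippet quoted above.
    if enabled not in (1, True):
        return "translation disabled"
    if not language:
        return "translation enabled but no target language set"
    return f"translation enabled, target language: {language}"


example = '{"sort.metadata.nfo.translate": 1, "sort.metadata.nfo.translate.language": "en"}'
print(translation_status(example))  # translation enabled, target language: en
```

If this reports the settings as enabled and the description still comes back untranslated, the cause is probably not the settings file.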

jvlflame commented 4 years ago

Ah, actually just realized there's currently a bug with it in 2.0.0-alpha4. It's currently fixed on my dev branch but I haven't pushed the next alpha release yet.

I'll probably put out the next alpha build in a few hours.

zuko7177 commented 4 years ago

Thank you so much for your time.

An alternative approach without using a translation service would be to scrape the EN version of DMM.

You would have to do a second request for the EN version (note the /en in URL):
https://www.dmm.co.jp/en/mono/dvd/-/detail/=/cid=13gvg852/?i3_ref=search&i3_ord=1

Include the following cookies

Then go after the English description.
Perhaps a future feature to consider.
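A rough sketch of that second request, using only the Python standard library. It assumes the Japanese page lives at the same path without the /en prefix, which is an inference from the URL above. The cookie name and value below are placeholders, since the actual cookies are not listed in this thread. The request is built but not sent.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def to_english_url(url):
    """Insert the /en path prefix seen in the example URL, unless already present."""
    parts = urlsplit(url)
    path = parts.path
    if not (path == "/en" or path.startswith("/en/")):
        path = "/en" + path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def build_request(url, cookies):
    """Build (but do not send) a GET request for the EN page with the given cookies."""
    request = Request(to_english_url(url))
    if cookies:
        cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
        request.add_header("Cookie", cookie_header)
    return request


# Assumed Japanese URL: the example URL from this thread minus /en and the query string.
ja_url = "https://www.dmm.co.jp/mono/dvd/-/detail/=/cid=13gvg852/"
# Placeholder cookie: the real names and values are not given in this thread.
req = build_request(ja_url, {"example_cookie": "example_value"})
print(req.full_url)
print(req.get_header("Cookie"))
```

Passing the request to `urllib.request.urlopen` would perform the fetch, and the English description would then be parsed out of the returned HTML.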

I recently submitted a PR for JAVMovieScraper to do this. Not sure if it will get approved :) https://github.com/DoctorD1501/JAVMovieScraper/pull/332

jvlflame commented 4 years ago

Oh, that's a good idea. Whenever I previously tried to access the English version of the DMM site, I got an error page saying "Not available in your region."

Is it new that they've opened it up to different regions? Regardless, I'll definitely look into adding that.

zuko7177 commented 4 years ago

Hmm... perhaps it is region locked. I'm US based and it works for me. I did not consider it might be locked for some regions.

jvlflame commented 4 years ago

I'm also US based. Last time I tried to access the English page was probably ~1+ yrs ago so who knows.

zuko7177 commented 4 years ago

I can confirm translation for Dmm plot is working as expected in alpha5. Thanks.

jvlflame commented 4 years ago

Converting this ticket to track work on an English DMM scraper source.

zuko7177 commented 4 years ago

I've noticed that the Google translations are slightly better. Perhaps make it an option to use either Google or DMM English?

jvlflame commented 4 years ago

I'm splitting them into two scrapers so you'll be able to select which one you use. -dmm -dmmja

It'll be a good alternative since the googletrans API has rate limits, so if you're scraping thousands of movies at a time it'll start throwing errors.
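For anyone who still wants machine translation on a large batch, one common way to cope with rate limits is retrying with exponential backoff. The sketch below is a generic illustration, not how Javinizer handles it. `RateLimitError` and `flaky_translate` are hypothetical stand-ins, not part of googletrans.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error a translation client might raise when throttled."""


def translate_with_backoff(translate, text, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call translate(text), waiting base_delay * 2**attempt after each throttled attempt."""
    for attempt in range(retries):
        try:
            return translate(text)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)


# Stand-in translator that is throttled twice before succeeding.
calls = {"count": 0}


def flaky_translate(text):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimitError("throttled")
    return text.upper()


delays = []  # record the waits instead of actually sleeping
result = translate_with_backoff(flaky_translate, "plot", sleep=delays.append)
print(result, delays)  # PLOT [1.0, 2.0]
```

Backoff only spreads requests out over time. With thousands of movies, a source that is already in English avoids the limit entirely.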

zuko7177 commented 4 years ago

Ah, so dmm will be the English version of the website, and dmmja with English translation enabled will be how it's working today?

jvlflame commented 4 years ago

"dmmja with English translation enabled will be how it's working today?"

Yep.