javinizer / Javinizer

(NSFW) Organize your local Japanese Adult Video (JAV) library
MIT License

Fixed label, maker and series for dmmJa #294

Closed: seeyabye closed this 3 years ago

seeyabye commented 3 years ago

Fixes the scraping issue for the maker, label, and series fields on DMM Ja.

jvlflame commented 3 years ago

Thanks, I was going to look at this soon too, pertaining to #293. Can you also check the description field?

It seems like some extra HTML is getting tacked on now as well.

jvlflame commented 3 years ago

I think adding a simple regex-based HTML tag remover might work, but I haven't tested it thoroughly.
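A minimal sketch of the regex-based tag stripping idea, written in Python for illustration rather than in Javinizer's own code. The function name `strip_html` and the sample markup are hypothetical; the pattern `<[^>]+>` is a heuristic that does not understand comments, scripts, or a `>` inside attribute values, which is part of why it needs careful testing.

```python
import html
import re

# Heuristic: match anything from "<" to the next ">", i.e. an HTML tag
# (opening, closing, or self-closing). Not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace (hypothetical helper)."""
    # Turn <br> into a newline first so adjacent lines don't run together.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    text = TAG_RE.sub("", text)
    # Decode entities such as &amp; after the tags are gone.
    text = html.unescape(text)
    # Collapse runs of spaces/tabs on each line and drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    # Hypothetical fragment, not real DMM markup.
    print(strip_html("<p>First line.<br>Second &amp; third.</p>"))
```

This only cleans up whatever text the existing scraper already captured; it does nothing about a description that was cut off before reaching this step.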

seeyabye commented 3 years ago

That's one way, but I think the description's regex doesn't work for this particular series. It's currently extracting only one line of the full description shown on DMM's page.
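The thread doesn't show the exact regex the scraper uses, so this is only a generic Python illustration of one common cause of this symptom, assuming a hypothetical `desc` div: by default `.` does not match a newline, so a pattern written against single-line markup misses or truncates a multi-line block. Adding `re.DOTALL` fixes that, but a greedy `.*` then runs to the last `</div>` on the page and can sweep in unrelated content such as ads.

```python
import re

# Hypothetical page fragment: the description spans several lines and is
# followed by an unrelated ad block.
PAGE = """<div class="desc">
First line of the description.
Second line of the description.
</div>
<div class="ads">Sale banner</div>"""

# Without DOTALL, "." stops at "\n", so this cannot span the line breaks.
single_line = re.compile(r'<div class="desc">(.*)</div>')

# With DOTALL and a greedy ".*", the match runs to the LAST </div>,
# pulling the ad block into the capture.
greedy = re.compile(r'<div class="desc">(.*)</div>', re.DOTALL)

# With DOTALL and a lazy ".*?", the match stops at the FIRST </div>.
lazy = re.compile(r'<div class="desc">(.*?)</div>', re.DOTALL)

if __name__ == "__main__":
    print(single_line.search(PAGE))  # None
    print(lazy.search(PAGE).group(1).strip())
```

Even the lazy version breaks as soon as the description itself contains a nested `<div>`, which is the usual argument for switching to a real HTML parser.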

jvlflame commented 3 years ago

Ah, whoops, I didn't realize it was cutting off the entire description.

jvlflame commented 3 years ago

Yeah, I took a look at a few more DMM pages, and I'm not sure there's a reliable way to tell descriptions that span multiple lines apart from ones that are just a bunch of ads.

seeyabye commented 3 years ago

I don't have a good solution for this section right now either. The best approach would be an HTML parser, so that we can reliably extract the div itself. If I come up with a solution, it will probably go in a separate pull request.
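A rough sketch of the parser-based approach, using Python's standard-library `html.parser.HTMLParser` rather than anything Javinizer actually ships. The class name `DivTextExtractor`, the helper `extract_div_text`, and the `desc` class in the example are all hypothetical; DMM's real markup would need to be checked to pick the right selector. Unlike a regex, the parser tracks nested `<div>` depth, so the capture ends at the target div's own closing tag.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text inside the first <div> whose class list contains target_class (hypothetical)."""

    def __init__(self, target_class: str) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.target_class = target_class
        self.depth = 0      # > 0 while inside the target div; counts nested divs
        self.done = False   # True once the target div has closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            if tag == "div":
                self.depth += 1          # nested div inside the target
            elif tag == "br":
                self.parts.append("\n")  # keep line breaks from <br>
            return
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def text(self) -> str:
        # Collapse whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def extract_div_text(page: str, target_class: str) -> str:
    parser = DivTextExtractor(target_class)
    parser.feed(page)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    # Hypothetical markup, not a real DMM page.
    page = ('<div class="ads">Sale</div>'
            '<div class="desc">Line one.<br>Line <b>two</b> &amp; more.'
            '<div class="inner"> Nested.</div></div><p>Footer</p>')
    print(extract_div_text(page, "desc"))
```

This solves the "where does the div end" problem, but not the one raised above about telling a real description from a page that is mostly ads; that would still need some page-specific heuristic.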