seeyabye closed this 3 years ago
Thanks, I was going to look at this soon too, since it relates to #293. Can you also check the description field?
It seems some extra HTML is getting tacked onto it now as well.
I think adding a simple regex-based HTML remover might work, but I haven't tested it thoroughly.
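For illustration, a regex-based remover along those lines might look like the sketch below. It is written in Python purely as an example and is not the project's actual code; the function name `strip_html` and the tag pattern are assumptions. It replaces anything that looks like a tag with a space, decodes entities, and collapses whitespace.

```python
import html
import re

# Hypothetical pattern: anything between "<" and ">" is treated as a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tag-like markup from a scraped field (illustrative helper)."""
    # Replace tags with a space so "a<br>b" does not collapse into "ab".
    no_tags = TAG_RE.sub(" ", text)
    # Decode entities such as &amp; into their literal characters.
    unescaped = html.unescape(no_tags)
    # Collapse runs of whitespace left behind by the removed tags.
    return re.sub(r"\s+", " ", unescaped).strip()


if __name__ == "__main__":
    print(strip_html('Line one<br>Line two <a href="x">ad</a>'))
```

A pattern like this cannot tell real description text from ad markup and flattens line breaks, so it would only tidy the field rather than fix the extraction itself.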
That's one way, but I think the description's regex doesn't work for this particular series. It's currently only extracting one line from the entire description shown on DMM's page.
Ah, whoops, I didn't realize it was cutting off the entire description.
Yeah, I took a look at a few more DMM pages, and I'm not sure there's a way to reliably tell descriptions that span multiple lines apart from descriptions that are just a bunch of ads.
I don't have a good solution for this section right now either. The best way would be to use an HTML parser so that we can reliably extract the div. If I come up with a solution, it will probably be done in a separate pull request.
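As a rough sketch of the parser idea, the Python example below uses the standard library's `html.parser` to collect the text of a single div, keeping `<br>` tags as line breaks. Everything specific here is an assumption for illustration: the class name `description`, the helper names, and the sample markup are hypothetical and not taken from DMM's pages or from this project.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first div carrying a given class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=True)
        self.target_class = target_class
        self.depth = 0   # greater than 0 while inside the target div
        self.parts = []  # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # track nested divs
            elif tag == "br":
                self.parts.append("\n")  # keep line breaks
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_description(page_html, target_class="description"):
    # "description" is a placeholder class name, not DMM's real markup.
    parser = DivTextExtractor(target_class)
    parser.feed(page_html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        '<div class="ad">Buy now</div>'
        '<div class="description">First line<br>Second line<br/>Third line</div>'
        "<p>footer</p>"
    )
    print(extract_description(sample))
```

Selecting the div by structure instead of by pattern keeps multi-line descriptions intact, though it would still need a rule for pages where the div holds only ads.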
Fixes scraping issue for maker, label and series on DMM Ja.