hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/

Problems with pixiv and tags #57

Closed KriPet closed 9 years ago

KriPet commented 9 years ago

It seems that when you download a pixiv gallery by artist id, the tags are read from the artist's "Illustration tags" box, which lists the most common tags across all of that artist's works, not the tags for each specific image.

This results in all the images getting the same tags.

I've been looking at the code, but you know how looking at someone else's code is. Since it seems you are the only contributor, I'd love to help, so I'll look a bit more, but I'll be busy for the next few weeks while I finish my thesis.

Specifically, the problem seems to be with line 1282 in ClientDownloading.py, which should look at class_ = 'work-tags', not class_ = 'tagCloud'.
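To illustrate why the container class matters, here is a minimal sketch using only the standard library. The markup is invented for illustration and is not pixiv's real HTML, and `TagCollector` and `extract_tags` are hypothetical names. The `class_` keyword in the line above suggests the actual code uses BeautifulSoup, which this sketch does not.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; the real pixiv page will differ.
SAMPLE_HTML = """
<ul class="tagCloud">
  <li><a>landscape</a></li>
  <li><a>original</a></li>
</ul>
<ul class="work-tags">
  <li><a>sunset</a></li>
  <li><a>ocean</a></li>
</ul>
"""


class TagCollector(HTMLParser):
    """Collect the text of <a> elements inside a container with a given class.

    Limitation: void elements such as <img> inside the container would
    throw off the depth counter; the sample markup has none.
    """

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth = 0        # 0 means we are outside the container
        self.in_link = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.depth += 1
            if tag == 'a':
                self.in_link = True
        else:
            classes = (dict(attrs).get('class') or '').split()
            if self.container_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1
            if tag == 'a':
                self.in_link = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.tags.append(data.strip())


def extract_tags(html, container_class):
    """Return the link texts found under the container with this class."""
    parser = TagCollector(container_class)
    parser.feed(html)
    parser.close()
    return parser.tags


per_artist = extract_tags(SAMPLE_HTML, 'tagCloud')    # same for every image
per_image = extract_tags(SAMPLE_HTML, 'work-tags')    # specific to this work
```

With the 'tagCloud' container every image from the artist would receive the same list, while 'work-tags' yields the tags of the individual work.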

Sorta unrelated to this issue, but would you be interested in porting hydrus to python 3? I'm not sure if all dependencies are available for py3, but it might simplify some unicode stuff. I can look into it and it will serve as a way to familiarize myself with the codebase.

hydrusnetwork commented 9 years ago

Thank you for this. I'm not a pixiv user, so I didn't realise that was the wrong box! I will look at it this week.

I don't know much about python 3, so I don't know if it is something I want. As I understand it, it runs something like 10% slower, and they still haven't dropped the Global Interpreter Lock, when slow speed and single-core multi-threading are my two big annoyances with py2 already. Module support is also a concern. Do you know what the difference is between 2 and 3, beyond better unicode?

I'm a complete sperg who just gets stressed working in teams, so I work by myself. If you want to fork my code and do whatever you like, then please do, but I don't do pull requests. This github repo is really just a mirror of my offline coding process. Having said that, if you would still like to contribute, creating a simple python module I can import to fill a hole in the program would be great! You could write a comprehensive pixiv parser, perhaps, if that's your interest. Once your thesis is done, let me know if this is something you want to spend time on, and we can talk more about an API.

KriPet commented 9 years ago

There really isn't that much difference between 2 and 3, including speed. 3 has mostly caught up.

I mentioned unicode because I got some encoding errors. I'll make another issue report if I can nail it down.
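For context on the unicode point, here is a small Python 3 sketch of the text/bytes separation. The tag value and the helper name `mix_text_and_bytes` are made up for illustration and are not related to the encoding errors mentioned above, whose cause is not known here.

```python
# Illustrative tag text; any non-ASCII string would do.
tag_text = '風景'

# In Python 3, str is Unicode text and bytes is raw data.
# Converting between them is always explicit.
encoded = tag_text.encode('utf-8')
decoded = encoded.decode('utf-8')


def mix_text_and_bytes(text, raw):
    """Return the name of the error raised when str and bytes are joined."""
    try:
        text + raw
    except TypeError as exc:
        return type(exc).__name__
    return None


# Python 3 refuses to mix the two types, whatever the contents are.
error_name = mix_text_and_bytes(tag_text, encoded)
```

Python 2 would instead attempt an implicit ASCII decode of the byte string, so the same mistake only surfaces as a UnicodeDecodeError once non-ASCII data appears.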

For now, though, you can close this issue.