kent-lee / deviantart-scraper

personal project for downloading artworks from DeviantArt
66 stars 14 forks

Challenge: Download user favorites (and create recommendations) #2

Open DonaldTsang opened 5 years ago

DonaldTsang commented 5 years ago
  1. Given a user account, or a user's collection, download all of the images within the favorites/collection
    • For user favorites https://www.deviantart.com/<user>/favourites/
    • For user collections https://www.deviantart.com/<user>/favourites/<collectionID>
    • Create an API that outputs a JSON list of favorite URLs with the artist name/ID for both of these types, plus a marker stating which favorites page or collection each URL comes from
  2. All downloaded images should retain their metadata, including:
    • Name of the piece and tags (ease of search)
    • Description (good for finding collaborations and descriptions)
    • Artist name/ID (will be important for point no.3)
    • The metadata should be stored as a JSON file alongside each image, one JSON file per image
    • The artist name/ID should be easily accessible through the filename or some other means
  3. Given a user account, or a set of multiple user accounts, recommend a list of artists based on one of these criteria:
    • Quantity based (the users' favs/collections contain more than X pieces of art from artist Y)
    • Percentage based (the users' favs/collections contain more than X% of art from artist Y)
    • Any other applicable strategy
    • Create an API that outputs a JSON list of recommended artists based on the data, along with a marker listing the input user or list of users
  4. Given a user account, or a set of multiple user accounts, find as many artists as possible, and draw a network diagram of user favorites
    • there will be two variables, X and N (X in point no.3, N for depth)
    • the system will find all artists that are N artists "away" from the main search group
    • e.g. if A likes artist B and B likes artist C, then B is 1-away and C is 2-away
    • (extra credit) Use Matplotlib + NetworkX to achieve the result
    • (extra credit) cluster different artists in the network into sub-groups
    • Create an API that outputs a list of all nodes (users/artists/collections, each containing a list of all its art pieces) and connections (favorites, each of which has an artist, a faver, and a list of the art pieces that were liked)
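
Points 3 and 4 could be sketched roughly as below. This is only a minimal illustration, not code from this repository: it assumes the favorites have already been scraped into `(artist, artwork_url)` pairs and that a `likes` mapping (user to the artists they favorited) is available. The names `recommend_artists`, `artists_within`, `min_count` and `min_share` are hypothetical.

```python
from collections import Counter, deque


def recommend_artists(favorites, min_count=None, min_share=None):
    """Pick artists that exceed a count threshold and/or a share threshold.

    favorites: iterable of (artist, artwork_url) pairs (assumed input shape).
    min_count: keep artists with MORE than this many favorited pieces.
    min_share: keep artists with MORE than this fraction (0..1) of all favorites.
    """
    counts = Counter(artist for artist, _ in favorites)
    total = sum(counts.values())
    picked = []
    for artist, n in counts.most_common():
        if min_count is not None and n <= min_count:
            continue
        if min_share is not None and n / total <= min_share:
            continue
        picked.append({"artist": artist, "count": n, "share": n / total})
    return picked


def artists_within(likes, seeds, depth):
    """Breadth-first search over 'likes' edges, up to `depth` steps from the seeds.

    likes: dict mapping a user to the artists they favorited (assumed input shape).
    Returns a dict mapping each reached account to its distance from the seeds.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        if distance[user] == depth:
            continue  # do not expand beyond the requested depth
        for artist in likes.get(user, ()):
            if artist not in distance:
                distance[artist] = distance[user] + 1
                queue.append(artist)
    return distance


# The example from point 4: A likes B, B likes C.
print(artists_within({"A": ["B"], "B": ["C"]}, ["A"], 2))
# → {'A': 0, 'B': 1, 'C': 2}
```

The distance dictionary could then be fed into NetworkX as nodes and edges for the diagram; that part is left out here.
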
kent-lee commented 5 years ago

@DonaldTsang

First of all, thank you for suggesting these interesting challenges! I have looked into them already and implemented some code for (1) and (2). However, I found that my program was producing inconsistent results. For example, the regex for certain elements (e.g. the csrf validation token) sometimes fails to match.

Upon further investigation, I realized that the problems were caused by the new UI changes that have been rolling out recently. For more details, please have a look at the readme of this repository.

Due to the above issue, I have decided to pause development of this project until the new UI is stable and released publicly. In the meantime, I will look into pixiv_scraper. Thank you again for these challenges; I will certainly return to them when the new UI is available.

DonaldTsang commented 5 years ago

@Kent-Lee thanks for understanding the issue, and I respect that you paused dA until the UI is "safe". Do you think such a feature would be useful for art discovery? And do you think APIs are useful?

Also, on a side note: Do you have a Discord account? If not, Matrix/IRC handle?

DonaldTsang commented 5 years ago

@Kent-Lee please check the DMs in Discord, there are some hints as to how the UI changes can be fixed.

kent-lee commented 5 years ago

@DonaldTsang sorry for the long wait; I have been busy with other stuff lately. Anyways, the program is now updated for the new UI, so it should work like before. I will look into your suggestions soon (if there are no other major things happening in real life, that is). Thank you.

DonaldTsang commented 5 years ago

Don't worry, hope that you are doing well in school/work. The program is starting to take form. Also be aware that the Pixiv scraper's results should be similar to dA's for ease of cross-checking.

DonaldTsang commented 4 years ago

Even better, you can now scrape people's "watching" list from the new "Eclipse"/dark mode UI at https://www.deviantart.com/<username>/about#watching (e.g. https://www.deviantart.com/tonibabelony/about#watching). Repeatedly click the <button> within a <div> inside a <div> inside <div id="watching">, then get all the <a> inside <span> inside the three nested <div>s that come before it in the same <div id="watching">. From that we can pull some tricks from "Twitter Following Graphs" (who follows who on Twitter) and rank people based on what they liked (link prediction and community detection).
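
The link-collecting half of that could look something like the sketch below. It only parses HTML that has already been fully loaded (clicking the button needs a real browser or an automation tool, which is not shown), and the sample markup is an assumption reconstructed from the description above, not checked against the live site. `WatchingLinks` and `watching_links` are hypothetical names.

```python
from html.parser import HTMLParser


class WatchingLinks(HTMLParser):
    """Collect href values of <a> tags nested anywhere inside <div id="watching">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while we are inside the watching container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested <div> inside the container
            elif attrs.get("id") == "watching":
                self.div_depth = 1  # entering the container itself
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def watching_links(html):
    """Return all link targets found inside the watching container."""
    parser = WatchingLinks()
    parser.feed(html)
    return parser.links


# Assumed page structure, for illustration only.
sample = """
<div id="watching">
  <div><div><div><span><a href="https://www.deviantart.com/example-artist">example-artist</a></span></div></div></div>
  <div><div><button>Load more</button></div></div>
</div>
<a href="https://www.deviantart.com/not-watched">outside the container</a>
"""
print(watching_links(sample))
# → ['https://www.deviantart.com/example-artist']
```

The resulting list of profile URLs would be the "who watches whom" edges for the following-graph idea.
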