We could try scraping data from the App Store using itunes-app-scraper, which requires no authentication. That might make it a better option than the official API, but I'm not sure how up to date the scraped data is. See the related ChatGPT dialogue.
We would probably get better data from the official APIs, but they might come with some limitations, right? We will also need a lot of data, since we have many song mood descriptor classes. To prevent bias, we likely need a large number of artist-song pairs (or artist-song-playlist triples) so that the data varies enough across genres, and probably across some other predictors as well; we should analyze that further.
I have finished cleaning the small-scale data, and it now contains around 500k unique artist-song pairs. One question: what proportion of newer to older data should we use? A song's released_year probably determines patterns in the data, since release years reflect global trends.
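To make the newer-versus-older question concrete, here is a minimal sketch of how we could measure the share of records per decade before deciding on a proportion. The record layout (dicts with a released_year key), the helper name decade_shares, and the sample rows are all assumptions for illustration, not our actual schema or data.

```python
from collections import Counter


def decade_shares(records):
    """Return {decade: share of records}, skipping rows without a released_year."""
    decades = Counter(
        (r["released_year"] // 10) * 10
        for r in records
        if r.get("released_year") is not None
    )
    total = sum(decades.values())
    return {decade: count / total for decade, count in sorted(decades.items())}


# Made-up rows, for illustration only.
sample = [
    {"artist": "A", "song": "s1", "released_year": 1995},
    {"artist": "B", "song": "s2", "released_year": 2012},
    {"artist": "C", "song": "s3", "released_year": 2018},
    {"artist": "D", "song": "s4", "released_year": 2021},
    {"artist": "E", "song": "s5", "released_year": None},
]

shares = decade_shares(sample)
print(shares)
```

If one decade dominates the real data, we could cap or down-sample it before training.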
Given these considerations, maybe we should collect more data up front to prevent target class imbalance, or bias in the data in general.
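A quick way to see whether we actually have an imbalance problem is to count the examples per mood class and compare the largest class to the smallest. This is only a sketch: the helper name imbalance_report and the mood labels below are hypothetical placeholders, not our real descriptor classes.

```python
from collections import Counter


def imbalance_report(labels):
    """Count examples per class and report the largest-to-smallest ratio."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return {"counts": dict(counts), "ratio": largest / smallest}


# Placeholder labels, for illustration only.
labels = ["happy"] * 6 + ["sad"] * 2 + ["calm"] * 3

report = imbalance_report(labels)
print(report)
```

A ratio close to 1 means the classes are roughly balanced; a large ratio suggests collecting more data for the rare classes first.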
There are some large SQL dumps and tar.gz archives available online. I'm thinking about processing these. What do you think?
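For the tar.gz archives, we would not need to unpack everything to disk first. Below is a rough sketch that reads text files line by line straight out of an archive using Python's standard tarfile module. The helper name iter_tar_lines, the ".csv" suffix, and the in-memory demo archive are assumptions; I have not checked what the real dumps contain.

```python
import io
import tarfile


def iter_tar_lines(fileobj, suffix=".csv"):
    """Yield (member name, text line) pairs without extracting the archive to disk."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            extracted = archive.extractfile(member)
            for raw in extracted:
                yield member.name, raw.decode("utf-8", errors="replace").rstrip("\r\n")


# Build a tiny in-memory archive so the sketch can be run as-is.
payload = b"artist,song\nA,s1\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo("pairs.csv")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

rows = list(iter_tar_lines(buffer))
print(rows)
```

For a real download we would pass an opened file, e.g. open(path, "rb"), instead of the in-memory buffer.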