weilu closed this 6 years ago
Thanks. Please open a PR and I will merge it into the repo.
In case you need it, there is also a script to download all the files, but its URL is outdated too: https://github.com/topogram/weiboscope-data/blob/master/dl_raw_data.sh#L9
My approach to using the corpus was to index it into Elasticsearch and then use queries to build sub-corpora that I could process more easily. Not sure if that will be helpful for your research.
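A rough sketch of that index-then-query idea, in case it helps. It only builds the request bodies (the NDJSON payload for the Elasticsearch `_bulk` endpoint and a query body for `_search`) and does not talk to a cluster. The column names `mid`, `created_at` and `text`, the index name `weiboscope`, and the helper names are assumptions for illustration, not necessarily what the repo's scripts use; the date range filter also assumes `created_at` is mapped as a date field.

```python
import csv
import io
import json


def bulk_lines(rows, index_name="weiboscope"):
    """Yield NDJSON lines for the _bulk API: one action line, then one document line."""
    for row in rows:
        # Assumption: each row has a unique "mid" column usable as the document id.
        yield json.dumps({"index": {"_index": index_name, "_id": row["mid"]}})
        yield json.dumps(row, ensure_ascii=False)


def subcorpus_query(keyword, start, end):
    """Build a _search body: full-text match on "text", restricted to a date range."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": keyword}}],
                "filter": [{"range": {"created_at": {"gte": start, "lte": end}}}],
            }
        }
    }


# Tiny in-memory CSV standing in for one of the raw data files (made-up rows).
sample = io.StringIO(
    "mid,created_at,text\n"
    "1,2012-01-01,hello\n"
    "2,2012-01-02,world\n"
)
rows = list(csv.DictReader(sample))

# The _bulk body must end with a trailing newline.
payload = "\n".join(bulk_lines(rows)) + "\n"
query = subcorpus_query("hello", "2012-01-01", "2012-01-31")
```

`payload` would be POSTed to `/_bulk` with the `application/x-ndjson` content type, and `query` sent to `/weiboscope/_search`; the hits (or a scroll over them) then form the sub-corpus.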
I like your approach. I was looking for the full data set, including recent years. I thought this repo was published by the original data owner, but I was mistaken. I'll ask the owner at HKU about data sharing.
If I remember correctly, @chainsawriot did work on Weiboscope at HKU recently.
Thanks @clemsos for the tip!
@chainsawriot let me know if you know anything about the data sharing policy around this dataset. I'm affiliated with NUS (National University of Singapore).
Did. Yes, did. And I am no longer working at HKU.
And yes, when I was still working there, I moved all the 2012 data to the HKU Scholars Hub. One can use that dataset by citing a paper (refer to the README.txt). Please observe the Creative Commons Attribution-NonCommercial license. If you need data beyond that, please contact Dr King-wa Fu at HKU.
It appears that the link to the data zip no longer works. I looked around and found this: https://hub.hku.hk/cris/dataset/dataset107483 Would you like me to send a PR to update the readme?
Also, I'm interested in using the full dataset for economic research purposes. Is it possible to add some instructions to the readme on how scholars can request access?