topogram / weiboscope-data

Download, extract and index Weiboscope data

Data not available #1

Closed weilu closed 6 years ago

weilu commented 6 years ago

It appears that the link to the data zip no longer works. I looked around and found this: https://hub.hku.hk/cris/dataset/dataset107483 Would you like me to send a PR to update the readme?

Also, I'm interested in using the full dataset for economic research purposes. Is it possible to add some instructions to the readme on how scholars can request access?

clemsos commented 6 years ago

Thanks. Please open a PR and I will merge it into the repo.

In case you need it, there is also a script to download all the files, but its URL is also outdated: https://github.com/topogram/weiboscope-data/blob/master/dl_raw_data.sh#L9

clemsos commented 6 years ago

My approach to using the corpus was to index it into Elasticsearch, then use queries to build sub-corpora that I could process more easily. Not sure if that will be helpful for your research?
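For anyone who wants to try the index-then-query approach, here is a minimal sketch of the two payloads involved, using only the Python standard library. It is not the code from this repo. The index name `weiboscope`, the field names `mid`, `text` and `created_at`, and the helper names `bulk_body` and `subcorpus_query` are all assumptions made for illustration, so adapt them to the actual columns and mapping you use.

```python
import json


def bulk_body(rows, index_name):
    """Serialize rows as newline-delimited JSON for the Elasticsearch _bulk endpoint.

    Each document takes two lines: an action line, then the document itself.
    The "mid" key used as the document id is an assumed field name.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row["mid"]}}))
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk format requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


def subcorpus_query(keyword, start, end):
    """Build a query DSL body: full-text match on a keyword, filtered by date range."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": keyword}}],
                "filter": [{"range": {"created_at": {"gte": start, "lt": end}}}],
            }
        }
    }


# Made-up rows, only to show the shape of the payloads.
rows = [
    {"mid": "m1", "text": "example post one", "created_at": "2012-01-03T10:00:00"},
    {"mid": "m2", "text": "example post two", "created_at": "2012-02-11T08:30:00"},
]

body = bulk_body(rows, "weiboscope")
query = subcorpus_query("example", "2012-01-01", "2012-02-01")

print(body)
print(json.dumps(query, indent=2))
```

The first payload would be sent to the `_bulk` endpoint and the second to the index's `_search` endpoint, with whatever HTTP client you prefer. The date range assumes `created_at` is mapped as a date field.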

weilu commented 6 years ago

I like your approach. I was looking for the full dataset, including recent years. I thought this repo was published by the original data owner, but I was mistaken. I'll ask the owner at HKU about data sharing.

clemsos commented 6 years ago

If I remember correctly, @chainsawriot was working on Weiboscope at HKU recently.

weilu commented 6 years ago

Thanks @clemsos for the tip!

@chainsawriot let me know if you know anything about the data sharing policy around this dataset. I'm affiliated with NUS (National University of Singapore).

chainsawriot commented 6 years ago

Did. Yes, did. And I no longer work at HKU.

And yes, when I was still working there, I moved all the 2012 data to the HKU Scholars Hub. One can use that dataset by citing a paper (refer to the README.txt). Please observe the Creative Commons Attribution-NonCommercial license. If you need data beyond that, please contact Dr King-wa Fu at HKU.