KelvinBull closed this issue 4 years ago
I'm also wondering about this. They probably just used the tag names from the .xml files ('category', 'rating', 'realname', etc.) as column names in the .csv, but it would be nice to get confirmation.
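To make that guess concrete, here is a minimal Python sketch of what "tag names as column names" could look like. It is not the repository's conversion script and has not been checked against the real dataset. The tag names 'category', 'rating' and 'realname' come from the comment above; the `<items>`/`<item>` wrappers, the `<text>` tag, the sample values and the helper name `xml_items_to_csv` are all assumptions made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: only the tag names category/rating/realname come from
# the thread; the wrappers, the <text> tag and all values are invented.
SAMPLE_XML = """<items>
  <item>
    <category>books</category>
    <rating>5.0</rating>
    <realname>Alice</realname>
    <text>Great read.</text>
  </item>
  <item>
    <category>dvd</category>
    <rating>2.0</rating>
    <text>Not worth it.</text>
  </item>
</items>"""


def xml_items_to_csv(xml_text, item_tag="item"):
    """Turn each <item> into one CSV row, using child tag names as columns."""
    root = ET.fromstring(xml_text)
    rows = []
    columns = []  # column order = order in which tags are first seen
    for item in root.iter(item_tag):
        row = {}
        for child in item:
            row[child.tag] = (child.text or "").strip()
            if child.tag not in columns:
                columns.append(child.tag)
        rows.append(row)

    buffer = io.StringIO()
    # restval fills in an empty cell when an item lacks one of the tags
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = xml_items_to_csv(SAMPLE_XML)
print(csv_text)
```

With this made-up input, the header row is built from the tag names, and an item that lacks a tag simply gets an empty cell in that column. The actual files produced by the project's script may use different columns or ordering.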
@KelvinBull I know it's an old issue, but here's the solution: to get the data in .csv format just run
python prepare_cls.py https://storage.googleapis.com/ulmfit/cls1
as suggested in this issue: https://github.com/n-waves/multifit/issues/32#issuecomment-464773677
Hey @KelvinBull @blazejdolicki, sorry for the delay. The code that parses the original dataset into the csv files was not merged into master. It lives at https://github.com/n-waves/multifit/blob/datasets/prepare_cls.sh. Let me know if you have any questions.
Hi, I am trying to reproduce your nice script, but I don't know how to set up the format of the input data, namely the final . To make this clear, could you give me an example? For instance, the dataset cls-acl10-unprocessed actually consists of .xml files, so will it be processed into .csv files? What will the final .csv file look like? A short snippet would be enough. Thank you in advance.