Closed paul-tqh-nguyen closed 4 years ago
https://github.com/paul-tqh-nguyen/reuters_topic_labelling/commit/1811859f139fada4e8b544ff56ac04cdaf9cb3a0 This commit gets all of the raw data into a tabular format (not very well normalized) and writes all of the topic data to a separate CSV, with one column per topic so that one-hot vectors are easy to create.
This completes the first pass.
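For anyone reading along, here is a minimal sketch of the one-column-per-topic layout described above. It is not the code from the commit: the function names, the `article_id` column, and the sample topics are all hypothetical and only illustrate the idea that each row ends up being a ready-made one-hot (really multi-hot) vector.

```python
import csv
import io


def topics_to_one_hot_rows(articles):
    """Turn (article_id, topics) pairs into a header and 0/1 rows.

    `articles` is a list of (article_id, list_of_topic_names) tuples.
    The column name "article_id" is an assumption for this sketch.
    """
    # Collect every topic seen anywhere; sort for a stable column order.
    all_topics = sorted({t for _, topics in articles for t in topics})
    header = ["article_id"] + all_topics
    rows = []
    for article_id, topics in articles:
        topic_set = set(topics)
        # 1 if the article carries the topic, 0 otherwise.
        rows.append([article_id] + [1 if t in topic_set else 0 for t in all_topics])
    return header, rows


def write_topics_csv(articles, file_obj):
    """Write the one-column-per-topic table to an open text file object."""
    header, rows = topics_to_one_hot_rows(articles)
    writer = csv.writer(file_obj)
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical sample data: an article may have several topics or none.
sample = [("1", ["grain", "wheat"]), ("2", []), ("3", ["earn"])]
header, rows = topics_to_one_hot_rows(sample)

buffer = io.StringIO()
write_topics_csv(sample, buffer)
csv_lines = buffer.getvalue().splitlines()
```

With this layout, the label vector for an article is just its row minus the ID column, so no separate encoding step is needed at training time.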
We already have a first pass at our data processing utilities written (patched in via https://github.com/paul-tqh-nguyen/reuters_topic_labelling/issues/1).
Let's preprocess the data and get the result committed so that we can track the design decisions we make with the data.