paul-tqh-nguyen / reuters_topic_labelling

Deep learning to automatically label Reuters articles with the relevant topics.

Do first pass at pre-processing data #5

Closed paul-tqh-nguyen closed 4 years ago

paul-tqh-nguyen commented 4 years ago

We already have a first pass at our data processing utilities written (patched in via https://github.com/paul-tqh-nguyen/reuters_topic_labelling/issues/1).

Let's preprocess the data and commit the result so that we can track the design decisions we make about the data.

paul-tqh-nguyen commented 4 years ago

https://github.com/paul-tqh-nguyen/reuters_topic_labelling/commit/1811859f139fada4e8b544ff56ac04cdaf9cb3a0 This commit converts all of the raw data into a tabular format (not yet well normalized) and writes all of the topic data to a separate CSV, with one column per topic to make one-hot vector creation easy.
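
For anyone following along, here is a rough sketch of the "one column per topic" layout described above. It is not the code from the commit: the sample records, the field names (`article_id`, `text`, `topics`), and the helpers `build_topic_rows` and `to_csv_text` are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sample records; the real data comes from the Reuters corpus.
articles = [
    {"article_id": 1, "text": "Cocoa prices rose sharply.", "topics": ["cocoa"]},
    {"article_id": 2, "text": "Wheat and corn exports fell.", "topics": ["wheat", "corn"]},
    {"article_id": 3, "text": "No topic assigned.", "topics": []},
]


def build_topic_rows(articles):
    """Return (header, rows) with one 0/1 column per topic (hypothetical helper)."""
    # Collect every topic seen in the data and fix a stable column order.
    all_topics = sorted({topic for a in articles for topic in a["topics"]})
    header = ["article_id"] + all_topics
    rows = []
    for a in articles:
        present = set(a["topics"])
        # 1 if the article carries the topic, 0 otherwise.
        rows.append([a["article_id"]] + [1 if t in present else 0 for t in all_topics])
    return header, rows


def to_csv_text(header, rows):
    """Serialize the header and rows to CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header, rows = build_topic_rows(articles)
topics_csv = to_csv_text(header, rows)
print(topics_csv)
```

With this layout, each row (minus the `article_id` column) can be read directly as a multi-hot label vector for an article, which is the property the separate topics CSV is meant to provide.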

This completes the first pass.