hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

One-hot format for output data #14

Closed emanjavacas closed 7 years ago

emanjavacas commented 7 years ago

It seems that Keras needs the targets in one-hot format in order to compute the loss, whereas PyTorch requires integer format. It makes sense to default to the PyTorch format and have the Keras implementation call to_categorical on it, rather than having the PyTorch side undo the one-hot encoding (which would mean twice as much computation).
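To make the two target formats concrete, here is a minimal pure-Python sketch. It is not pandora code: the helper names to_one_hot and from_one_hot are made up for illustration, and Keras' to_categorical does the equivalent of to_one_hot on arrays.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows (the format Keras-style losses expect)."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


def from_one_hot(rows):
    """Recover integer class indices from one-hot rows (the extra step the PyTorch side would need)."""
    return [row.index(1) for row in rows]


labels = [2, 0, 1]               # integer format: one number per target
one_hot = to_one_hot(labels, 4)  # one-hot format: num_classes numbers per target
```

Keeping the integer list as the default means only the Keras path pays for the conversion, and each target is stored as a single value instead of num_classes values.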

mikekestemont commented 7 years ago

I agree (integer targets also need much less memory). Feel free to make the necessary changes in the pytorch branch.



emanjavacas commented 7 years ago

Done as per 007db3f1ffd82dfe5690a3ccd5afa7cefb22da41. To keep it simple, I added a categorical argument to the Preprocessor constructor that specifies whether the targets should be converted to one-hot (binary) format or left as integers.
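For readers who have not looked at the commit, a constructor flag of this kind could look roughly like the sketch below. Only the class name Preprocessor and the argument name categorical come from this thread; the default value, the fit/transform methods, and the label_index attribute are assumptions for illustration and may not match the actual pandora code.

```python
class Preprocessor:
    """Hypothetical stand-in for pandora's Preprocessor; only the flag name is from the thread."""

    def __init__(self, categorical=False):
        # categorical=False: emit integer targets (PyTorch-style).
        # categorical=True: emit one-hot targets (Keras-style).
        # The default shown here is an assumption.
        self.categorical = categorical
        self.label_index = {}

    def fit(self, tags):
        # Assign each distinct tag an integer index, in sorted order.
        for tag in sorted(set(tags)):
            self.label_index[tag] = len(self.label_index)
        return self

    def transform(self, tags):
        ints = [self.label_index[tag] for tag in tags]
        if not self.categorical:
            return ints
        num_classes = len(self.label_index)
        # Expand each index into a one-hot row of length num_classes.
        return [[1 if i == idx else 0 for i in range(num_classes)] for idx in ints]
```

With this shape, a PyTorch model would build Preprocessor() and a Keras model Preprocessor(categorical=True), so neither backend has to convert targets produced for the other.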