Closed msbrown closed 5 years ago
great, thanks!
Hey! This is wonderful, but I'm not sure we can include it due to copyright? We discussed this a bit in #9. Or am I mistaken and Harry Potter is fair game? Is it on Project Gutenberg?
Oops, that's true! We can still keep the model but not the source text, no?
I think this is a grey area and a super interesting question! Can we publish a model trained on text not in the public domain? I think for the ml5 project we probably should err on the conservative side and not include any models trained on text we don't have the rights to? This isn't a legal opinion by any means of course and doesn't preclude independent projects making use of other models!
Ok, sounds good. We can do a cleanup with https://github.com/ml5js/ml5-data-and-models/issues/30 and only keep models that were trained on text we have the rights to.
All of the books are listed on archive.org in their opensource collection (in case that helps): https://archive.org/details/welcometohogwarts & https://archive.org/details/opensource They were listed with a no-copyright tag: https://creativecommons.org/publicdomain/zero/1.0/
Another consideration on copyright: Should the policy be across the board? If so, then that may have implications for images used in Styletransfer (and/or sourcing).
Yes, I was thinking this as well. I am not sure how best to approach this, but yes, I believe that any datasets we use for training (images, text, etc.) should have an appropriate license. This likely affects the pix2pix models in particular. Let's think about this and discuss at our next meeting?
Sounds good!
Adding data with text for all the Harry Potter books (cleaned up from Project Gutenberg) and a JKRowling model trained on the Harry Potter text. The LSTM model was trained with the following parameters: --rnn_size 512 --num_layers 2 --seq_length 128 --batch_size 64 --dropout 0.25
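For anyone trying to reproduce the run, here is a minimal sketch of how the flags above could be captured in Python. The flag names and values are taken from this description; the parser itself, the function names, and the help text are hypothetical and are not the actual training script. The help strings reflect the usual reading of these flags for a character-level LSTM, which is an assumption here.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the flag names and default values come from
    # the PR description. Everything else is illustrative.
    parser = argparse.ArgumentParser(description="LSTM training flags (illustrative)")
    parser.add_argument("--rnn_size", type=int, default=512,
                        help="hidden units per LSTM layer (assumed meaning)")
    parser.add_argument("--num_layers", type=int, default=2,
                        help="number of stacked LSTM layers (assumed meaning)")
    parser.add_argument("--seq_length", type=int, default=128,
                        help="characters per training sequence (assumed meaning)")
    parser.add_argument("--batch_size", type=int, default=64,
                        help="sequences per batch (assumed meaning)")
    parser.add_argument("--dropout", type=float, default=0.25,
                        help="dropout value as given in the PR")
    return parser


def chars_per_batch(args):
    # With character-level sequences, one batch covers
    # batch_size * seq_length characters of the training text.
    return args.batch_size * args.seq_length


# Parse an empty argument list so the defaults above are used.
args = build_parser().parse_args([])
print(chars_per_batch(args))  # 64 * 128 = 8192
```

Under that reading, each training step would see 8192 characters of text. Passing different values, for example `["--batch_size", "32"]`, overrides the defaults.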