mozilla / common-voice-global-sprint

Repository for the microsite for the Common Voice Global Sprint participation
MIT License

Make it very clear that CommonCrawl/OpenSubtitles contain copyrighted material #13

Status: Open. ftyers opened this issue 6 years ago.

ftyers commented 6 years ago

At the moment, the linked contributing guide suggests CommonCrawl and OpenSubtitles as good places to find text, while stating that Wikimedia sites can't be used.

This is somewhat misleading, because Wikimedia vets content for copyright to make sure that everything is legally reusable, whereas CommonCrawl and OpenSubtitles do not. One solution would be to add something like:

"WARNING: Both CommonCrawl and OpenSubtitles contain substantial amounts of copyrighted material alongside some material that may be in the public domain. Check thoroughly that a sentence is in the public domain before submitting it. If you cannot verify that the sentence is in the public domain (e.g. by showing that it was written by an author who died more than 70 years ago), do not add it."

Djfe commented 6 years ago

I wouldn't recommend OpenSubtitles at all. Aren't the subtitles for most current movies copyrighted, since they reproduce what the people in the movie say? (That dialogue is part of the copyrighted script, which is not public.) https://thenextweb.com/insights/2017/09/15/fan-made-subittles-copyright/