thewilkybarkid opened this issue 5 years ago.
Is it ok to do this given the history of SCIGen?
This was mentioned at the planning team meeting and there does not seem to be a problem with using this tool. Best to check with Mark Patterson.
Great that some SCIGen-generated papers have even been accepted. Do these need to be generated on a regular basis because you are adding new elements? Is there a danger someone mistakes them for real papers?
> Do these need to be generated on a regular basis because you are adding new elements?
Probably, so it'd be good to automate where possible. (The Pattern Library would be hard-coded, which is fine, but the unstable/demo environment could generate its content on demand or on deploy.)
> Is there a danger someone mistakes them for real papers?
We might want to put a status indicator bar on demo environments. https://demo--journal.elifesciences.org/ doesn't state anywhere what it is...
Regarding the license: the Dada Engine's license seems to be permissive in the style of the MIT license, as it specifically states that modifications need not follow its terms, i.e. it is not copyleft:
Modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply.
I would suggest this is okay to use in an MIT-licensed repository, and that we can redistribute a modified version there too.
Using GPL code inside MIT projects is trickier, and it depends on whether we're redistributing the code, i.e. whether we could instead extend it by including it as a library at runtime. Here's the best information I've found on the topic: https://opensource.stackexchange.com/questions/6062/using-gpl-library-with-mit-licensed-code
We’d have to convert SCIGen significantly (i.e. rewrite https://github.com/strib/scigen/blob/master/scirules.in in the style of https://github.com/orenmazor/Dada-Engine/blob/master/scripts/pomo.pb).
Thinking about it, it might be simpler to start with just the base Dada postmodernism grammar (example: http://www.elsewhere.org/journal/pomo/). It’s good enough for now (i.e. simple text) and we can expand on it (e.g. adding scientific text and figures).
Played around over the weekend, and I think the pomo grammar is harder to get started with, as it covers fewer things. Spoke with @BlueReZZ about the SCIGen license and we'll try contacting the SCIGen authors to see if it'd be OK to redistribute under MIT (it'll be a major modification, and ultimately an extension, of their grammar file).
Problem / Motivation
For development, testing and demoing purposes we need a range of content available to use (covering all the possibilities). 'Lorem ipsum' is easy to generate, but hard to use and read (pseudo-content is far more useful). For eLife we hand-crafted articles based on real work (e.g. https://demo--journal.elifesciences.org/, https://ui-patterns.elifesciences.org/?p=pages-article--research). For Libero we have so far been doing the same thing (e.g. http://unstable.libero.pub/articles/article1, http://unstable--pattern-library.libero.pub/?p=pages-research-article-en). This makes it hard to create content quickly while achieving both quality and quantity. Using real information is potentially problematic too (e.g. we need to be careful about referring to real people).
Proposed solution
Have a way to generate JATS+Libero content that appears real, but is in fact generated, and use this everywhere.
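As a rough illustration of the "generate JATS" half of this, here is a minimal sketch that wraps already-generated plain text in a JATS-style article skeleton using Python's standard `xml.etree.ElementTree`. The function name `build_jats`, the single "Introduction" section and the choice of elements are assumptions for illustration only; Libero-specific markup is not covered, and the output has not been validated against the JATS DTD.

```python
import xml.etree.ElementTree as ET


def build_jats(title, abstract, paragraphs, article_id):
    """Hypothetical helper: wrap generated strings in a minimal JATS-like skeleton.

    Uses standard JATS element names (article, front, article-meta, body, sec, p)
    but is NOT validated against the JATS DTD and omits Libero extensions.
    """
    article = ET.Element("article", {"article-type": "research-article"})

    # <front> holds the metadata: identifier, title and abstract.
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    ET.SubElement(meta, "article-id", {"pub-id-type": "publisher-id"}).text = article_id
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    abstract_el = ET.SubElement(meta, "abstract")
    ET.SubElement(abstract_el, "p").text = abstract

    # <body> holds the generated main text as one illustrative section.
    body = ET.SubElement(article, "body")
    sec = ET.SubElement(body, "sec")
    ET.SubElement(sec, "title").text = "Introduction"
    for text in paragraphs:
        ET.SubElement(sec, "p").text = text

    # ElementTree escapes characters such as & and < in text content.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(build_jats("Towards Scalable Archetypes", "An abstract.", ["First.", "Second."], "demo-1"))
```

Because the text is inserted via ElementTree rather than string concatenation, whatever a generator produces (including `&` or `<`) ends up as well-formed XML.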
The Dada Engine, despite its age, provides a flexible way of generating content. The grammar in SCIGen is comprehensive (but produces computer science content). After some experimentation, it's not too difficult to rewrite the SCIGen grammar in the pb format.
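To show the underlying idea (recursive expansion of a random context-free grammar), here is a minimal Python sketch. It is not the Dada Engine's pb syntax or SCIGen's rule format: the toy `GRAMMAR`, the `{NAME}` placeholder convention and the `expand`/`generate` helpers are all hypothetical. Seeding the random generator makes the output reproducible, which could help if demo content is regenerated on deploy.

```python
import random
import re

# Hypothetical toy grammar: each non-terminal maps to alternative templates,
# and {NAME} inside a template refers to another non-terminal. This mimics the
# idea behind SCIGen/Dada Engine rule files, NOT their actual syntax.
GRAMMAR = {
    "TITLE": ["Towards {ADJ_T} {NOUN_T}", "Deconstructing {NOUN_T}", "A Case for {ADJ_T} {NOUN_T}"],
    "ADJ_T": ["Scalable", "Probabilistic", "Decentralized"],
    "NOUN_T": ["Archetypes", "Epistemologies", "Algorithms"],
    "ABSTRACT": ["{SENTENCE} {SENTENCE}", "{SENTENCE} {SENTENCE} {SENTENCE}"],
    "SENTENCE": ["We argue that {NOUN} are {ADJ}.", "Our evaluation shows that {ADJ} {NOUN} are {ADJ}."],
    "ADJ": ["scalable", "probabilistic", "decentralized"],
    "NOUN": ["archetypes", "epistemologies", "algorithms"],
}

SYMBOL = re.compile(r"\{([A-Z_]+)\}")


def expand(grammar, symbol, rng, depth=0, max_depth=25):
    """Recursively expand `symbol` by picking one alternative at random."""
    if depth > max_depth:
        raise RecursionError(f"grammar nested deeper than {max_depth} at {symbol!r}")
    template = rng.choice(grammar[symbol])
    # Replace every {NAME} reference with its own random expansion.
    return SYMBOL.sub(lambda m: expand(grammar, m.group(1), rng, depth + 1, max_depth), template)


def generate(symbol, seed, grammar=GRAMMAR):
    """Seeded wrapper: the same seed always yields the same text."""
    return expand(grammar, symbol, random.Random(seed))


if __name__ == "__main__":
    print(generate("TITLE", seed=1))
    print(generate("ABSTRACT", seed=1))
```

The depth limit guards against grammars that recurse without ever reaching terminal text; a real grammar ported from SCIGen would be far larger but could be expanded the same way.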
Clarification needed and assumptions