Open holtzermann17 opened 11 years ago
My first comment is that the detailed roadmap is great! This preview seems ready to be converted into its own set of issues in an issue tracker, with work commencing whenever we're ready for that. Indeed, some indication of the progress made along the roadmap would help contributors get motivated about getting involved.
This is also quite reminiscent of the Seed Projects that we've written about in the Free Technology Guild project, see this page. The FTG seed projects use a slightly different but almost analogous template. Again, I think the fact that you've already broken the roadmap down to detailed do-able steps is a big advantage, and I'd suggest that other Seed Projects use this as a model.
To conclude: I would see the PM Previews series as being parallel to the FTG's incubator function. If we can make the other high-level summaries I advanced in #34 into similarly-detailed outlines, I think we'll have a very nice map for ourselves and any others who would like to join.
Thanks very much for contributing the model seed project @rspuzio!
Some issues directly related to books: https://github.com/KWARC/planetary/issues/340, https://github.com/KWARC/planetary/issues/332, https://github.com/KWARC/planetary/issues/336, https://github.com/KWARC/planetary/issues/341
One issue more related to collections: https://github.com/KWARC/planetary/issues/216
General improvements related to Git integration and a build system would probably be useful here: https://github.com/KWARC/planetary/issues/68, https://github.com/KWARC/planetary/issues/67
Then there are a bunch of OCR- and proofreading-related issues that we need to outline (some of that may also be relevant to Planetary, but other bits should go elsewhere).
Here is an example of a preview for the book project.
Summary
Lifecycle stage: 1 --proto--[ 2 ] ->- 3 --evolving-- 4 ->- 5 --complete-- 6 ->- 7 --mature-- 8
Sources: (a) Retrodigitization, (b) importing content from other CC-By-SA or more liberally licensed sources, (c) re-using internal content.
In more detail:
We have done a lot of background research on this. In order to get things moving and progress to Stage 3, we need some hands-on-the-keyboard time (mathematics background helpful throughout, programming experience helpful for b and c). By default, this will evolve slowly as we assemble new courses and make further experiments with NNexus. However, an influx of volunteer time (or funding) could make things progress more rapidly.
MOCK-UP/DEMO
Put in some artistic impression of what the books section might look like.
DETAILED DESCRIPTION
The purpose of the book project is to make mathematical books in the public domain accessible to the general public in the form of a collaborative digital library. To accomplish this goal, we plan to design and build a system composed of three interoperating components.
The first subsystem is a retrodigitization toolchain. When complete, this system will allow one to start with a physical book on a library shelf, scan it into a computer, and then subject the result to a series of processing steps which produce a TeX representation of the book's contents. While the software for this already exists and has been tested, there is room for improvement; by introducing image preprocessing, clustering, and postprocessing, one should be able to significantly improve the accuracy of the process. Given that proofreading and correcting errors is labor-intensive, the labor saved by improving the OCR process justifies the effort.
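To make the clustering idea a little more concrete, here is a minimal sketch of how scanned character images might be grouped by similarity, so that look-alike glyphs can be identified (or corrected) together. Everything in it is an assumption made for illustration: the representation of a glyph as a flattened binary bitmap, the Hamming distance, the greedy threshold rule, and the names `hamming` and `cluster_glyphs` are hypothetical and are not taken from the existing toolchain.

```python
from typing import List, Sequence, Tuple

# Assumption: one scanned character, flattened to a tuple of 0/1 pixels.
Glyph = Tuple[int, ...]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels in which two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_glyphs(glyphs: Sequence[Glyph], max_distance: int) -> List[List[int]]:
    """Greedy clustering (illustrative only).

    Each glyph joins the first cluster whose representative (its first
    member) is within max_distance pixels; otherwise it starts a new
    cluster. Returns clusters as lists of indices into `glyphs`.
    """
    clusters: List[List[int]] = []
    for index, glyph in enumerate(glyphs):
        for members in clusters:
            if hamming(glyphs[members[0]], glyph) <= max_distance:
                members.append(index)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([index])
    return clusters


# Toy 3x3 bitmaps: two copies of a vertical bar and two of a horizontal
# bar, where the second copy of each has one noisy pixel.
glyphs = [
    (0, 1, 0, 0, 1, 0, 0, 1, 0),  # vertical bar
    (0, 0, 0, 1, 1, 1, 0, 0, 0),  # horizontal bar
    (0, 1, 0, 0, 1, 0, 0, 1, 1),  # vertical bar, one noisy pixel
    (0, 0, 0, 1, 1, 1, 0, 0, 1),  # horizontal bar, one noisy pixel
]
clusters = cluster_glyphs(glyphs, max_distance=1)
print(clusters)  # [[0, 2], [1, 3]]
```

In a setup like this, a label assigned to one member of a cluster could be propagated to the rest, which is one way clustering might reduce proofreading labor. A real implementation would likely need normalized glyph sizes and a more robust distance measure.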
Since, even with these improvements, this process is not 100% accurate, we need the next component, which is an editorial workflow. Based upon the CBPP approach which has been in use for the last decade to produce the PM encyclopaedia and inspired by predecessors such as the St. Pachomius Library and Project Gutenberg's Distributed Proofreaders, this system will coordinate the proofreading of mathematical works by members of the PM community. To participate, a member would start at a page which lists the various works which have been processed but not yet proofread. Upon picking a work, the member would be assigned a page. To work on the page, there would be a webpage which displays the original text, the computer output from the OCR suite, and the rendering of that output. The proofreader's job is to ensure that the rendered output agrees with the original text and, if not, to edit the output as appropriate. Once this is done, an editor will double-check the result and, once all pages have been satisfactorily edited, the system will collect the results and collate them into a hypertext edition.
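The page lifecycle described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the status names, the `Page` and `Work` classes, and their methods are hypothetical, and the actual facility would presumably live in Drupal rather than in standalone Python.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    UNPROOFED = "unproofed"   # OCR output exists, nobody has picked it up
    ASSIGNED = "assigned"     # a member is proofreading the page
    PROOFREAD = "proofread"   # waiting for an editor's double-check
    APPROVED = "approved"     # editor is satisfied


@dataclass
class Page:
    number: int
    tex: str                              # OCR output, later the corrected text
    status: Status = Status.UNPROOFED
    proofreader: Optional[str] = None


@dataclass
class Work:
    title: str
    pages: List[Page] = field(default_factory=list)

    def _page(self, number: int) -> Page:
        return next(p for p in self.pages if p.number == number)

    def assign_page(self, member: str) -> Optional[Page]:
        """Hand the member the first page that has not been picked up."""
        for page in self.pages:
            if page.status is Status.UNPROOFED:
                page.status, page.proofreader = Status.ASSIGNED, member
                return page
        return None

    def submit(self, number: int, corrected_tex: str) -> None:
        """Proofreader submits the corrected TeX for an assigned page."""
        page = self._page(number)
        if page.status is not Status.ASSIGNED:
            raise ValueError("page is not assigned to a proofreader")
        page.tex, page.status = corrected_tex, Status.PROOFREAD

    def approve(self, number: int) -> None:
        """Editor signs off on a proofread page."""
        page = self._page(number)
        if page.status is not Status.PROOFREAD:
            raise ValueError("page has not been proofread yet")
        page.status = Status.APPROVED

    def collate(self) -> Optional[str]:
        """Join all pages once every one is approved; otherwise None."""
        if all(p.status is Status.APPROVED for p in self.pages):
            return "\n".join(p.tex for p in sorted(self.pages, key=lambda p: p.number))
        return None


# Usage: a two-page work moves through the whole workflow.
work = Work("Demo book", [Page(1, r"$x^2$ + y"), Page(2, r"Thcorem 1")])
first = work.assign_page("alice")
work.submit(first.number, r"$x^2 + y$")
work.approve(first.number)
partial = work.collate()                 # still None: page 2 is pending
second = work.assign_page("bob")
work.submit(second.number, r"Theorem 1")
work.approve(second.number)
edition = work.collate()
```

Making the editor's approval a separate state keeps the double-check explicit: a page cannot reach the collated edition straight from a proofreader's submission.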
The third and final component is a reading room which makes the results available to the reading public. To locate books, there will be a catalogue, search facility, and recommender. Once one has located a book, one can read it in several forms. The primary form is hypertext enhanced with links to the encyclopaedia, cross links to other books, notes, reviews, problem solutions, and the like. There will also be files of the book available for downloading and viewing on an e-book reader or printing out. In line with the philosophy of library as a social space, there will be plenty of opportunities for readers to interact with the text and each other by making notes, reviewing books, and participating in discussions.
In addition to these three components, there will also be an area for supporting the project and the PlanetMath organization by sponsoring books and purchasing hard copies.
ROADMAP
HOW TO HELP
If you're a philanthropist, your donations will help move the research and development process along.
If you're a Drupal dude, you can help implement the proofreading facilities and reading room.
If you're a script kiddie, you can help us build our toolchain.
If you're into statistics, you can help us with identifying characters by clustering.
If you're a proofreader, you can help us prepare the first few texts.
ACKNOWLEDGEMENTS
Thank people who have helped with the initial steps in the roadmap.