nzhiltsov / mocassin

Automatically exported from code.google.com/p/mocassin

Reorganize the indexing process workflow #70

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The indexing workflow should be executed as follows:
1. Take an arXiv id as input.
2. Download the paper metadata and the LaTeX source via the arXiv API.
3. Patch the LaTeX source with additional package entries (pdfsync, framed, etc.).
4. Produce the arXMLiv (XML-based) representation using the 'latexml' tools.
5. Add the XML document to a GATE storage and process it with the GATE machinery.
6. Extract structural elements etc.
7. PDF compilation: run pdflatex to get the main PDF document; get page numbers for structural elements using pdfsync; for each structural element, patch the LaTeX source file with 'shaded' entries and generate the highlighted PDF document (pdflatex).
8. Generate RDF metadata for the structural elements and populate the RDF store.
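
For illustration only, the two source-patching steps (3 and 7) could look roughly like the sketch below. This is not the project's code: the function names `patch_preamble` and `shade_lines` are hypothetical, and it assumes a structural element is identified by a range of source lines, which the workflow above does not specify. Note also that the 'shaded' environment from the framed package expects a color named `shadecolor` to be defined, which this sketch does not handle.

```python
import re

# Packages named in step 3 of the workflow.
EXTRA_PACKAGES = ("pdfsync", "framed")


def patch_preamble(latex_source, packages=EXTRA_PACKAGES):
    """Insert missing \\usepackage lines right before \\begin{document}."""
    marker = r"\begin{document}"
    head, sep, tail = latex_source.partition(marker)
    if not sep:
        raise ValueError("no \\begin{document} found")
    extra = ""
    for name in packages:
        # Skip packages the preamble already loads (with or without options).
        already = re.search(
            r"\\usepackage(\[[^\]]*\])?\{" + re.escape(name) + r"\}", head
        )
        if not already:
            extra += f"\\usepackage{{{name}}}\n"
    return head + extra + sep + tail


def shade_lines(latex_source, first_line, last_line):
    """Wrap source lines first_line..last_line (1-based, inclusive) in 'shaded'.

    Assumption: a structural element maps to a contiguous range of lines.
    """
    lines = latex_source.split("\n")
    if not (1 <= first_line <= last_line <= len(lines)):
        raise ValueError("line range out of bounds")
    return "\n".join(
        lines[: first_line - 1]
        + [r"\begin{shaded}"]
        + lines[first_line - 1 : last_line]
        + [r"\end{shaded}"]
        + lines[last_line:]
    )
```

The patched source from `patch_preamble` would then go to pdflatex for the main PDF, and each `shade_lines` result to a separate pdflatex run for the highlighted PDF of one element.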

Original issue reported on code.google.com by nikita.z...@gmail.com on 18 Jul 2011 at 1:51

GoogleCodeExporter commented 9 years ago

Original comment by nikita.z...@gmail.com on 26 Jul 2011 at 11:31