Open rossmounce opened 9 years ago
PDF2Text from PDFBox is pure Java, so it is reliable to run. All the *.exe tools have the drawback that they run as forked processes, which can cause problems such as buffer overruns. They are also generally more work to run.
Also, when we come to ship the software, either the installer has to install all these tools as well, or we have to resort to JNI, which has given us problems in the past.
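If an external tool such as pdftotext is called anyway, one common form of the "buffer" trouble is a deadlock. When the parent never reads the child's standard output, the child can block once the operating-system pipe buffer fills, and `waitFor()` then never returns. The `java.lang.Process` documentation warns about exactly this. Below is a minimal stdlib-only sketch. The class and method names are illustrative, not part of norma. It merges stderr into stdout and drains the output before waiting for the process to exit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration; not part of norma.
public class ExternalTool {

    /**
     * Runs an external command (for example Poppler's pdftotext) and returns
     * its combined stdout/stderr decoded as UTF-8.
     * Throws IllegalStateException if the command exits with a non-zero code.
     */
    public static String run(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so there is only one pipe to drain.
        builder.redirectErrorStream(true);
        try {
            Process process = builder.start();
            // Close the child's stdin; the tools of interest do not read it.
            process.getOutputStream().close();
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            // Drain the output BEFORE waiting for exit: if nobody reads it,
            // the child can block once the OS pipe buffer is full and
            // waitFor() would never return.
            try (InputStream in = process.getInputStream()) {
                in.transferTo(output);
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Exit code " + exitCode + " from " + command);
            }
            return output.toString(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before giving up.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running " + command, e);
        }
    }
}
```

A call might look like `ExternalTool.run(List.of("pdftotext", "-layout", "article.pdf", "-"))`, where `article.pdf` is a placeholder path. Passing `-` as the output file makes pdftotext write to stdout. This only addresses the pipe handling. It does not address the installation and portability concerns above.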
PDF2Text of any variety should only be used for words, not for sentences. (What happened to the OPEN ACCESS box under Poppler?) And how much slower is slower?
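One way to put a number on "slower" is to time both extractors on the same PDF several times and compare medians. JVM warm-up and disk caching make a single run misleading. The following is a small sketch with an illustrative class name. Any extraction call, wrapped in a `Runnable`, could be passed to it.

```java
import java.util.Arrays;

// Hypothetical timing helper for illustration.
public class Timing {

    /** Runs the task `runs` times and returns the median wall-clock time in milliseconds. */
    public static double medianMillis(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        // The median is less sensitive than the mean to a slow first (warm-up) run.
        Arrays.sort(nanos);
        double median = (runs % 2 == 1)
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2.0;
        return median / 1_000_000.0;
    }
}
```

Timing the pdftotext call and the norma transform on the same article, each with, say, five runs, would give a like-for-like comparison. Note that the pdftotext figure would include process start-up cost.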
On Wed, Jul 8, 2015 at 4:54 PM, Ross Mounce notifications@github.com wrote:
Output from pdf2text transform (from norma) is reasonable but not brilliant. It's also quite slow tbh. Given pdftotext (Poppler) is already pre-installed in the workshop VM, perhaps better just to call it?
e.g. for this PeerJ article PDF: https://peerj.com/articles/900/
Compare output from pdftotext (Poppler):
ABSTRACT
Submitted 10 December 2014 Accepted 30 March 2015 Published 16 April 2015 Corresponding authors Matthew B. Hufford,mhufford@iastate.edu Jeffrey Ross-Ibarra,rossibarra@ucdavis.edu Academic editor Todd Vision Additional Information and Declarations can be found on page 16
The teosinte branched1(tb1) gene is a major QTL controlling branching differences between maize and its wild progenitor, teosinte. The insertion of a transposable element (Hopscotch) upstream of tb1 is known to enhance the gene’s expression, causing reduced tillering in maize. Observations of the maize tb1 allele in teosinte and estimates of an insertion age of the Hopscotch that predates domestication led us to investigate its prevalence and potential role in teosinte. We assessed the prevalence of the Hopscotch element across an Americas-wide sample of 837 maize and teosinte individuals using a co-dominant PCR assay. Additionally, we calculated population genetic summaries using sequence data from a subset of individuals from four teosinte populations and collected phenotypic data using seed from a single teosinte population where Hopscotch was found segregating at high frequency. Genotyping results indicate the Hopscotch element is found in a number of teosinte populations and linkage disequilibrium near tb1 does not support recent introgression from maize. Population genetic signatures are consistent with selection on the tb1 locus, revealing a potential ecological role, but a greenhouse experiment does not detect a strong association between the Hopscotch and tillering in teosinte. Our findings
to output from pdf2text (norma --transform), which unfortunately muddles the order of lines:
ABSTRACT The teosinte branched1(tb1) gene is a major QTL controlling branching differences between maize and its wild progenitor, teosinte. The insertion of a transposable element (Hopscotch) upstream of tb1 is known to enhance the gene’s expression, causing reduced tillering in maize. Observations of the maize tb1 allele in teosinte and estimates of an insertion age of the Hopscotch that predates domestication led us to investigate its prevalence and potential role in teosinte. We assessed the prevalence of the Hopscotch element across an Americas-wide sample of 837 maize and teosinte individuals using a co-dominant PCR assay. Additionally, we calculated population genetic summaries using sequence data from a subset of individuals from four teosinte populations and collected phenotypic data using seed from a single teosinte population where Hopscotch was found segregating at high frequency. Genotyping results indicate the Hopscotch element is found in a number of teosinte populations and linkage disequilibrium near tb1 does not support recent introgression from Submitted 10 December 2014 maize. Population genetic signatures are consistent with selection on the tb1 locus, Accepted 30 March 2015 revealing a potential ecological role, but a greenhouse experiment does not detect Published 16 April 2015 a strong association between the Hopscotch and tillering in teosinte. Our findings Corresponding authors Matthew B. Hu ord, suggest the role of Hopscotch differs between maize and teosinte. Future work shouldffmhufford@iastate.edu assess tb1 expression levels in teosinte with and without the Hopscotch and more Jeffrey Ross-Ibarra, comprehensively phenotype teosinte to assess the ecological significance of therossibarra@ucdavis.edu Hopscotch insertion and, more broadly, the tb1 locus in teosinte. 
Academic editor Todd Vision Subjects Agricultural Science, Ecology, Evolutionary Studies, Genetics Additional Information and Keywords Transposable element, Domestication, Teosinte, Teosinte branched1, Maize Declarations can be found on page 16 DOI 10.7717/peerj.900 INTRODUCTION Copyright Domesticated crops and their wild progenitors provide an excellent system in which 2015 Vann et al. to study adaptation and genomic changes associated with human-mediated selection Distributed under (Ross-Ibarra, Morrell & Gaut, 2007). Plant domestication usually involves a suite of Creative Commons CC-BY 4.0 phenotypic changes such as loss of seed shattering and increased fruit or grain size, which OPEN ACCESS are commonly referred to as the ‘domestication syndrome’ (Olsen & Wendel, 2013), and
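In the norma output above, sidebar fields such as "Submitted 10 December 2014" and "Corresponding authors" land mid-sentence in the abstract. This resembles what happens when text fragments are ordered purely by vertical position across the whole page. Fragments from the sidebar and from the main column that share a baseline then get interleaved. We have not confirmed that this is what norma does. PDFBox's `PDFTextStripper` does have a `setSortByPosition` option, but sorting by position alone would not separate columns. The following is a stdlib-only sketch of a column-aware ordering. The `Fragment` record and the fixed `columnSplitX` threshold are assumptions made for illustration. Real layouts would need the split detected rather than hard-coded.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of column-aware reading order.
public class ColumnOrder {

    /** A text fragment with the position of its top-left corner (hypothetical type). */
    public record Fragment(double x, double y, String text) {}

    /**
     * Orders fragments column by column: everything left of columnSplitX first
     * (top to bottom), then everything to the right (top to bottom).
     * Assumes y grows downward, as in typical text-extraction page coordinates.
     */
    public static List<String> readingOrder(List<Fragment> fragments, double columnSplitX) {
        Comparator<Fragment> byColumnThenY = Comparator
                .comparingInt((Fragment f) -> f.x() < columnSplitX ? 0 : 1)
                .thenComparingDouble(Fragment::y)
                .thenComparingDouble(Fragment::x);
        return fragments.stream()
                .sorted(byColumnThenY)
                .map(Fragment::text)
                .collect(Collectors.toList());
    }
}
```

With a sidebar fragment and a main-column fragment on the same line, a plain top-to-bottom sort alternates between them. Sorting by column first keeps each column contiguous.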
— Reply to this email directly or view it on GitHub https://github.com/petermr/norma/issues/15.
Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069