I think I see. Comparing the stream with the structure of the page, the pair ["/F2 11 Tf".."Tf\n"]
delimits the region between the URL and the T&C message:
>>> doc.getobj(19).get_data()
'q\n\nq\nBT\n36 806 Td\nET\nQ\nq\n0 0 0 RG\n/P <</MCID 0>> BDC\nq\n0 0 0 RG\n/Figure <</MCID 0>> BDC\nq 220 0 0 91 50 671 cm /img0 Do Q\nQ\nEMC\nBT\n1 0 0 1 50 643 Tm\n/F1 12 Tf\n()Tj\nET\n0.5 w\n50 633 m\n562 633 l\nS\nBT\n1 0 0 1 50 610 Tm\n/F2 11 Tf\n(\x000\x00H\x00P\x00R\x00L\x00U\x00\x0f\x00\x03\x006\x00R\x00F\x00L\x00D\x00O\x00\x03\x00+\x00L\x00V\x00W\x00R\x00U\x00\\\\\x00\x03\x00D\x00Q\x00G\x00\x03\x00&\x00R\x00P\x00P\x00L\x00W\x00P\x00H\x00Q\x00W\x00\x1d\x00\x03\x00\\(\x00U\x00L\x00F\x00\x03\x00+\x00R\x00E\x00V\x00E\x00D\x00Z\x00P\x00\\n\x00V\x00\x03\x00\x05\x00,\x00Q\x00W\x00H\x00U\x00H\x00V\x00W\x00L\x00Q\x00J\x00\x03\x007\x00L\x00P\x00H\x00V\x00\x05)Tj\nET\nBT\n1 0 0 1 50 597 Tm\n/F2 11 Tf\n(\x00$\x00X\x00W\x00K\x00R\x00U\x00\x0b\x00V\x00\\f\x00\x1d\x00\x03)Tj\n(\x00-\x00D\x00P\x00H\x00V\x00\x03\x00\\(\x00\x11\x00\x03\x00&\x00U\x00R\x00Q\x00L\x00Q)Tj\nET\nBT\n1 0 0 1 49 584 Tm\n/F2 11 Tf\n(\x005\x00H\x00Y\x00L\x00H\x00Z\x00H\x00G\x00\x03\x00Z\x00R\x00U\x00N\x00\x0b\x00V\x00\\f\x00\x1d)Tj\nET\nBT\n1 0 0 1 50 571 Tm\n/F2 11 Tf\n(\x006\x00R\x00X\x00U\x00F\x00H\x00\x1d\x00\x03)Tj\n1 0 0.21256 1 91.76 571 Tm\n(\x00-\x00R\x00X\x00U\x00Q\x00D\x00O\x00\x03\x00R\x00I\x00\x03\x006\x00R\x00F\x00L\x00D\x00O\x00\x03\x00+\x00L\x00V\x00W\x00R\x00U\x00\\\\\x00\x0f\x00\x03)Tj\n1 0 0 1 236.67 571 Tm\n(\x009\x00R\x00O\x00\x11\x00\x03\x00\x16\x00\x1a\x00\x0f\x00\x03\x001\x00R\x00\x11\x00\x03\x00\x14\x00\x0f\x00\x03\x006\x00S\x00H\x00F\x00L\x00D\x00O\x00\x03\x00,\x00V\x00V\x00X\x00H\x00\x03\x00\x0b\x00$\x00X\x00W\x00X\x00P\x00Q\x00\x0f\x00\x03\x00\x15\x00\x13\x00\x13\x00\x16\x00\\f\x00\x0f\x00\x03\x00S\x00S\x00\x11\x00\x03\x00\x15\x00\x14\x00\x1c\x00\x10\x00\x15\x00\x16\x00\x14)Tj\n-186.67 0 Td\nET\n0 0 1 RG\n0.73333 w\n126.14 554.33 m\n233.07 554.33 l\nS\n0 G\n1 w\nBT\n1 0 0 1 50 558 Tm\n/F2 11 Tf\n(\x003\x00X\x00E\x00O\x00L\x00V\x00K\x00H\x00G\x00\x03\x00E\x00\\\\\x00\x1d\x00\x03)Tj\n/F3 11 Tf\n0 0 1 rg\n(Oxford University Press)Tj\n0 g\nET\n0 0 1 RG\n0.73333 w\n115.27 
541.33 m\n275.39 541.33 l\nS\n0 G\n1 w\n1 1 1 rg\n275.39 542.61 5.5 9.9 re\nf\n0 g\nBT\n1 0 0 1 50 545 Tm\n/F2 11 Tf\n(\x006\x00W\x00D\x00E\x00O\x00H\x00\x03\x008\x005\x00/\x00\x1d\x00\x03)Tj\n/F3 11 Tf\n0 0 1 rg\n(http://www.jstor.org/stable/3790325)Tj\n0 g\n1 1 1 rg\n( .)Tj\n0 g\nET\nBT\n1 0 0 1 50 529 Tm\n/F2 11 Tf\n(\x00$\x00F\x00F\x00H\x00V\x00V\x00H\x00G\x00\x1d\x00\x03\x00\x13\x00\x14\x00\x12\x00\x13\x00\x15\x00\x12\x00\x15\x00\x13\x00\x14\x00\x16\x00\x03\x00\x14\x00\x1b\x00\x1d\x00\x18\x00\x15)Tj\nET\n0.5 w\n50 519 m\n562 519 l\nS\n1 1 1 rg\n471.03 494.61 5.5 9.9 re\nf\n0 g\n0 0 1 RG\n0.66667 w\n50 481.67 m\n270.28 481.67 l\nS\n0 G\n1 w\n1 1 1 rg\n50 458.61 5.5 9.9 re\nf\n0 g\nBT\n1 0 0 1 50 497 Tm\n/F3 10 Tf\n(Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at)Tj\n/F3 11 Tf\n1 1 1 rg\n( .)Tj\n0 g\n1 0 0 1 50 485 Tm\n/F3 10 Tf\n0 0 1 rg\n(http://www.jstor.org/page/info/about/policies/terms.jsp)Tj\n0 g\n1 0 0 1 50 473 Tm\n()Tj\n1 0 0 1 50 461 Tm\n/F3 11 Tf\n1 1 1 rg\n( .)Tj\n0 g\nET\n1 1 1 rg\n50 398.61 5.5 9.9 re\nf\n0 g\nBT\n1 0 0 1 50 449 Tm\n/F3 10 Tf\n(JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of)Tj\n1 0 0 1 50 437 Tm\n(content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms)Tj\n1 0 0 1 50 425 Tm\n(of scholarship. 
For more information about JSTOR, please contact support@jstor.org.)Tj\n1 0 0 1 50 413 Tm\n()Tj\n1 0 0 1 50 401 Tm\n/F3 11 Tf\n1 1 1 rg\n( .)Tj\n0 g\nET\nq\n0 0 0 RG\n/Figure <</MCID 0>> BDC\nq 60 0 0 65.59 50 50 cm /img1 Do Q\nQ\nEMC\nBT\n1 0 0 1 115 105 Tm\n/F4 10 Tf\n(Oxford University Press)Tj\n/F3 10 Tf\n( is collaborating with JSTOR to digitize, preserve and extend access to )Tj\n/F4 10 Tf\n(Journal of)Tj\n1 0 0 1 115 95 Tm\n(Social History.)Tj\nET\nBT\n0 Tr\n/F3 10 Tf\n1 0 0 1 50 40 Tm\n(http://www.jstor.org )Tj\nET\nQ\nEMC\n\n Q\nq\nq\n1 1 1 rg\n0 -36 595 36 re\nf\nQ\nq\n2 J\n0 G\nQ\n0 0 1 RG\n0.53333 w\n278.05 -26.67 m\n374.72 -26.67 l\nS\n0 G\n1 w\nBT\n1 0 0 1 0 -8 Tm\n/Xi0 8 Tf\n()Tj\n1 0 0 1 203.39 -16 Tm\n(This content downloaded on Fri, 1 Feb 2013 18:52:40 PM)Tj\n1 0 0 1 220.28 -24 Tm\n(All use subject to )Tj\n0 0 1 rg\n(JSTOR Terms and Conditions)Tj\n0 g\nET\nQ\n'
>>>
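A note on the dump above: the strings drawn with /F2 don't look like plain text. Each character appears to be a 2-byte big-endian code, and in this particular stream the codes seem to sit 29 below the ASCII value of the character they draw. The sketch below decodes them on that assumption. The function names and the `GLYPH_OFFSET` constant are made up for illustration, and the offset is only something observed in this one dump; the general route would be to read the font's /ToUnicode CMap rather than hard-code a shift.

```python
import re

# Assumption: a fixed shift of 29 between glyph code and ASCII, inferred from
# this dump only. It is not a property of PDF in general.
GLYPH_OFFSET = 29

# Single-character backslash escapes allowed inside a PDF literal string.
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
            b"(": b"(", b")": b")", b"\\": b"\\"}


def unescape_pdf_literal(raw: bytes) -> bytes:
    """Undo backslash escapes in the bytes found between ( and ) in a stream.

    Handles the named escapes and \\ddd octal escapes; backslash-newline
    line continuations are not handled in this sketch.
    """
    def repl(match):
        token = match.group(1)
        if token in _ESCAPES:
            return _ESCAPES[token]
        return bytes([int(token, 8) & 0xFF])  # octal escape
    return re.sub(rb"\\([nrtbf()\\]|[0-7]{1,3})", repl, raw)


def decode_two_byte_string(raw: bytes, offset: int = GLYPH_OFFSET) -> str:
    """Decode a Tj operand made of 2-byte codes by shifting each code by `offset`."""
    data = unescape_pdf_literal(raw)
    # Pair the bytes up as big-endian 16-bit codes; a stray trailing byte is ignored.
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data) - 1, 2)]
    return "".join(chr(code + offset) for code in codes)


# Start of the first /F2 string in the dump, as raw stream bytes.
sample = (b"\x000\x00H\x00P\x00R\x00L\x00U\x00\x0f\x00\x03"
          b"\x006\x00R\x00F\x00L\x00D\x00O\x00\x03"
          b"\x00+\x00L\x00V\x00W\x00R\x00U\x00\\\\")
print(decode_two_byte_string(sample))  # Memoir, Social History
```

If this reading is right, a search for the visible text would have to be run against the decoded strings (or the search term encoded the same way), since the literal characters never appear in the stream for /F2. The /F3 and /F4 strings, by contrast, are stored as readable text.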
What's the procedure for identifying symbol-referenced (?) text within a document? Matching on the content text isn't working in this case:
...but delimiters seem to have been identified before, in the same example: