mrmiguez / pymods

process MODS records from Python
https://pypi.python.org/pypi/pymods
MIT License

use MODSReader on string-Objects #15

Open alexander-winkler opened 3 years ago

alexander-winkler commented 3 years ago

Hello!

I'm trying to apply MODSReader not to an XML file (as in the examples provided) but to requests.get responses. I've tried wrapping the XML string in a file-like object using io.StringIO (which would be the usual way to handle this with etree, I guess), but I'm getting a ValueError:

  File "mods_parse.py", line 6, in <module>
    MODSReader(io.StringIO(request_opac("pica.sys=j2017").text))
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 58, in __init__
    super(MODSReader, self).__init__(file_location, '{0}mods'.format(NAMESPACES['mods']), parser=mods_parser)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 27, in __init__
    self.iterator = parse(file_location, parser=parser).iter(iter_elem)
  File "/home/alex/.local/lib/python3.6/site-packages/pymods/reader.py", line 8, in parse
    return etree.parse(source, parser=parser)
  File "src/lxml/etree.pyx", line 3469, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1856, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
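
The last line of the traceback names the cause: lxml will not parse a Python str that carries an XML encoding declaration, and asks for bytes instead. A minimal sketch of the bytes route follows. It uses lxml directly rather than pymods, and a made-up one-record document stands in for the HTTP response. Whether MODSReader accepts an io.BytesIO object in place of a file path is an assumption suggested by the traceback, not something confirmed in this thread.

```python
import io

from lxml import etree

# Hypothetical stand-in for the body of an HTTP response.
# Note the encoding declaration on the first line of the document.
xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    '<titleInfo><title>Example</title></titleInfo>'
    '</mods>'
)

# Encode to bytes so lxml applies the declared encoding itself.
# With requests, response.content already holds the raw bytes,
# so no manual encode step would be needed there.
xml_bytes = xml_text.encode("utf-8")

tree = etree.parse(io.BytesIO(xml_bytes))
root_tag = tree.getroot().tag
title = tree.getroot().findtext(".//{http://www.loc.gov/mods/v3}title")
```
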

Could you suggest a way to pipe the XML string directly into the parser?

Thank you very much!

mrmiguez commented 3 years ago

Hi Alexander,

I've always meant to make the pymods parser more flexible to various inputs. My workflows have always involved local XML files, so I never got around to implementing that feature. I'm happy to hear that someone else is using pymods, so that will bump the priority up for me a bit.

My family just welcomed our first child recently, so unfortunately I don't have much time to work on this at the moment. If you're comfortable submitting a PR implementing string parsing, I'll consider it for merging. Otherwise, it might be a little bit until I'm back in the office and ready to spend time on this.

If you need a short-term solution, you can write requests.get(<your request url>).text out to a local file and pass that file to the parser. If you're working with OAI-PMH data, I've had a lot of success with Mark Phillips' pyoaiharvester. It's a Python 2 utility, but it's very helpful for getting OAI-PMH data to where you can use it.
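
A rough sketch of that short-term workaround, under a few assumptions: save_xml_to_tempfile is a hypothetical helper, not part of pymods, and the sample bytes stand in for a real response body. It writes the raw bytes (response.content in requests) rather than .text, so the file keeps the encoding its XML declaration announces.

```python
import os
import tempfile


def save_xml_to_tempfile(xml_bytes):
    """Write raw XML bytes to a temporary .xml file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a parser can open it by path afterwards.
    with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as handle:
        handle.write(xml_bytes)
        return handle.name


# Made-up sample document standing in for a response body.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<mods xmlns="http://www.loc.gov/mods/v3"/>'
)
path = save_xml_to_tempfile(sample)

# With a real response this would look roughly like (url is hypothetical):
#   path = save_xml_to_tempfile(requests.get(url).content)
#   records = MODSReader(path)

# Read the file back to confirm the bytes were written unchanged.
with open(path, "rb") as handle:
    round_trip = handle.read()

# Clean up the temporary file once parsing is done.
os.remove(path)
```

The file has to outlive the write handle so the parser can reopen it by path, which is why the sketch removes it explicitly at the end instead of relying on automatic deletion.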

Best, -MM