iDigBio / idb-backend

iDigBio server and backend code for data ingestion, media processing, record indexing, and data API.
GNU General Public License v3.0

mediaing get-media str vs bytes issue #157

Open danstoner opened 3 years ago

danstoner commented 3 years ago

2021-04-10 12:10:56.736 ERROR idb.mediaing჻ Unhandled error processing: https://herbarium.nkn.uidaho.edu/images/jpeg.php?Image=ID003067.jpg
Traceback (most recent call last):
  File "/home/dan/git/IDIGBIO/idb-backend/idigbio_ingestion/mediaing/fetcher.py", line 273, in get_media
    self.validate()
  File "/home/dan/git/IDIGBIO/idb-backend/idigbio_ingestion/mediaing/fetcher.py", line 335, in validate
    url=self.url, type=self.type, mime=self.mime)
  File "/home/dan/git/IDIGBIO/idb-backend/idb/postgres_backend/db.py", line 687, in frombuff
    return cls.fromobj(StringIO(buff), **attrs)
TypeError: initial_value must be str or None, not bytes
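
The last frame of the traceback can be reproduced on its own. This is a minimal sketch, assuming Python 3 and assuming `buff` holds the raw bytes of the downloaded media. The sample bytes and the helper `try_stringio` are made up for illustration and are not part of idb-backend.

```python
from io import StringIO

# Hypothetical stand-in for a downloaded media body (JPEG magic bytes).
buff = b"\xff\xd8\xff\xe0"


def try_stringio(data):
    """Return the TypeError message raised by StringIO(data), or None."""
    try:
        StringIO(data)
    except TypeError as exc:
        return str(exc)
    return None


# On Python 3, StringIO only accepts str (or None) as its initial value,
# so passing bytes raises the TypeError seen in the log.
error_message = try_stringio(buff)
print(error_message)
```

Passing a `str` to the same helper returns `None`, which suggests the failure depends only on the type of `buff` and not on the URL being fetched.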

danstoner commented 3 years ago

Reproduce with a command such as:

$ idigbio-ingestion -v  mediaing --prefix 'http://www.tropicos.org' get-media
2021-04-11 13:23:14.251 INFO  idb.mediaing჻ Started fetcher with User Agent string: 'iDigBio Media Ingestor (idigbio@acis.ufl.edu https://www.idigbio.org/wiki/index.php/Media_Ingestor)'
2021-04-11 13:23:14.252 INFO  idb.mediaing჻ mediaing - run once mode
2021-04-11 13:23:28.044 INFO  idb.mediaing჻ Found 88577 urls to check
2021-04-11 13:23:28.227 INFO  idb.gevent_helpers჻ Closing all connections: 1
2021-04-11 13:23:28.675 ERROR idb.mediaing჻ Unhandled error processing: http://www.tropicos.org/ImageDownload.aspx?imageid=100000003
Traceback (most recent call last):
  File "/home/dan/git/IDIGBIO/idb-backend/idigbio_ingestion/mediaing/fetcher.py", line 272, in get_media
    self.validate()
  File "/home/dan/git/IDIGBIO/idb-backend/idigbio_ingestion/mediaing/fetcher.py", line 334, in validate
    url=self.url, type=self.type, mime=self.mime)
  File "/home/dan/git/IDIGBIO/idb-backend/idb/postgres_backend/db.py", line 687, in frombuff
    return cls.fromobj(StringIO(buff), **attrs)
TypeError: initial_value must be str or None, not bytes
^CKeyboardInterrupt
2021-04-11T17:23:30Z

Aborted!

danstoner commented 3 years ago

Can't test this at the moment due to infrastructure issues.
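
One possible direction, untested against the idb-backend code: choose the in-memory wrapper based on the type of the buffer, so that bytes go to `BytesIO` instead of `StringIO`. The helper name `open_buffer` is hypothetical, and whether `fromobj` accepts a binary stream is an assumption that would need checking.

```python
from io import BytesIO, StringIO


def open_buffer(buff):
    """Wrap an in-memory buffer in a file-like object of the matching type.

    Hypothetical helper for illustration; not part of idb-backend.
    """
    if isinstance(buff, (bytes, bytearray)):
        # Binary data, such as a downloaded image, needs a binary stream.
        return BytesIO(buff)
    # Text data keeps the original StringIO behaviour.
    return StringIO(buff)


# Both kinds of input now produce a readable file-like object.
binary_stream = open_buffer(b"\xff\xd8\xff\xe0")
text_stream = open_buffer("plain text")
print(type(binary_stream).__name__, type(text_stream).__name__)
```

If this approach fits, the call at `db.py` line 687 might become something like `cls.fromobj(open_buffer(buff), **attrs)`, provided everything downstream of `fromobj` reads bytes correctly.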