iDigBio / idb-backend

iDigBio server and backend code for data ingestion, media processing, record indexing, and data API.
GNU General Public License v3.0

recordsets table gets file_harvest_date but no file_harvest_etag #8

Closed danstoner closed 8 years ago

danstoner commented 8 years ago

After running update_publisher_recordset.py, the database contains a file_harvest_date timestamp but no etag for the actual data file.

idb_api_prod=> select * from recordsets where file_link like '%smtp%';
  id  |                 uuid                 |            publisher_uuid            |                                   name                                   |               recordids                |                 eml_link                 |                  file_link                   | ingest |         first_seen         |         last_seen          |      pub_date       |     file_harvest_date      | file_harvest_etag |      eml_harvest_date      |         eml_harvest_etag         
------+--------------------------------------+--------------------------------------+--------------------------------------------------------------------------+----------------------------------------+------------------------------------------+----------------------------------------------+--------+----------------------------+----------------------------+---------------------+----------------------------+-------------------+----------------------------+----------------------------------
 4125 | 2ccbe8d3-c688-4c20-bf24-68a5ef486519 | baaa8f8c-4a78-4dcf-b207-25289a6d533d | Swedish Malaise Trap Project (SMTP) Collection Inventory - Version 27.42 | {38c1351d-9cfe-42c0-97da-02d2c8be141c} | http://www.gbif.se/ipt/eml.do?r=smtp-nrm | http://www.gbif.se/ipt/archive.do?r=smtp-nrm | t      | 2016-02-17 14:28:00.248765 | 2016-04-11 20:21:03.565575 | 2016-04-11 13:26:30 | 2016-04-11 18:23:01.531572 | NULL              | 2016-04-11 14:21:03.460929 | a19b3755595df2dd1a2d8843a9c741ef
(1 row)

update_publisher_recordset.py output:

2016-04-11 21:15:24 INFO  [idigbio] Publisher Feed: baaa8f8c-4a78-4dcf-b207-25289a6d533d http://www.gbif.se/ipt/rss.do
2016-04-11 21:15:24 INFO  [idigbio] Update Publisher id:133 baaa8f8c-4a78-4dcf-b207-25289a6d533d IPT GBIF-Sweden
2016-04-11 21:15:24 INFO  [idigbio] Update Recordset id:4125 2ccbe8d3-c688-4c20-bf24-68a5ef486519 http://www.gbif.se/ipt/archive.do?r=smtp-nrm Swedish Malaise Trap Project (SMTP) Collection Inventory - Version 27.96
...
2016-04-11 21:15:24 INFO  [idigbio] Finished processing RSS

No Harvest lines appear at the end of the output for this recordset.

danstoner commented 8 years ago

I added more logging output to the code; this may help determine what leaves the database in an inconsistent state.

2016-04-11 21:49:49 INFO  [idigbio] Harvest File 4125 Swedish Malaise Trap Project (SMTP) Collection Inventory - Version 27.42
http://www.gbif.se/ipt/archive.do?r=smtp-nrm {}
2016-04-11 21:49:52 DEBUG [idigbio] Starting Upload of '2ccbe8d3-c688-4c20-bf24-68a5ef486519'
2016-04-11 21:49:52 DEBUG [idigbio] ETAG a9550622b43e25fe679949b59fae99e1 already present in Storage. Failed recordset 2ccbe8d3-c688-4c20-bf24-68a5ef486519
2016-04-11 21:49:52 DEBUG [idigbio] Finished Upload of '2ccbe8d3-c688-4c20-bf24-68a5ef486519'

That section of code had no exception trapping. I added a basic try/except, but upload_recordset() needs stronger checks and conditions.

https://github.com/iDigBio/idb-backend/blob/bbc504dafe4f6065f9a38fd7f01fd9af006d089a/idigbio_ingestion/update_publisher_recordset.py#L354

godfoder commented 8 years ago

More logging is good, but that looks like it just skips the upload and is otherwise successful. Skipping the upload because the key already exists isn't a "failure".
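To illustrate the point that "already present" is a success path, here is a minimal sketch. It is not the idb-backend code: `InMemoryStorage` and `upload_if_missing` are hypothetical names, and treating the etag as the MD5 of the file contents is an assumption.

```python
import hashlib


class InMemoryStorage(object):
    """Hypothetical stand-in for the object store; not the idb-backend API."""

    def __init__(self):
        self._objects = {}

    def has_key(self, etag):
        return etag in self._objects

    def put(self, etag, data):
        self._objects[etag] = data


def upload_if_missing(storage, data):
    """Return the etag of `data`, uploading only when it is not stored yet."""
    # Assumption: the etag is the MD5 hex digest of the file contents.
    etag = hashlib.md5(data).hexdigest()
    if not storage.has_key(etag):
        storage.put(etag, data)
    # Either way the caller gets the etag back: a skipped upload is still success.
    return etag
```

With this shape, the caller always receives an etag it can write to the recordset row, whether or not bytes were actually transferred.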

UnwashedMeme commented 8 years ago

5a2f5b9 should fix the logic bug. I'm going to test that by triggering a harvest of 2ccbe8d3-c688-4c20-bf24-68a5ef486519, the recordset mentioned at the top of the ticket.

UnwashedMeme commented 8 years ago
# r2ccb is a DictCursor from fetching the recordset with uuid 2ccb....
In [28]: update_publisher_recordset.harvest_file(r2ccb, idbmodel)
10:29:46 INFO    idigbio             | Harvest File 4125 Swedish Malaise Trap Project (SMTP) Collection Inventory - Version 27.42
http://www.gbif.se/ipt/archive.do?r=smtp-nrm {}
10:29:48 DEBUG   idigbio             | Starting Upload of '2ccbe8d3-c688-4c20-bf24-68a5ef486519'
10:29:49 DEBUG   idigbio             | ETAG ed0e6df9e420a598cb8c6915be5db31e already present in Storage. Failed recordset 2ccbe8d3-c688-4c20-bf24-68a5ef486519

In [29]: sql = "SELECT * FROM recordsets WHERE uuid = '2ccbe8d3-c688-4c20-bf24-68a5ef486519'"

In [30]: idbmodel.fetchone(sql)
Out[30]: 
[4125L,
 '2ccbe8d3-c688-4c20-bf24-68a5ef486519',
 'baaa8f8c-4a78-4dcf-b207-25289a6d533d',
 u'Swedish Malaise Trap Project (SMTP) Collection Inventory - Version 27.42',
 [u'38c1351d-9cfe-42c0-97da-02d2c8be141c'],
 u'http://www.gbif.se/ipt/eml.do?r=smtp-nrm',
 u'http://www.gbif.se/ipt/archive.do?r=smtp-nrm',
 True,
 datetime.datetime(2016, 2, 17, 14, 28, 0, 248765),
 datetime.datetime(2016, 4, 12, 9, 21, 0, 966324),
 datetime.datetime(2016, 4, 12, 9, 6, 0, 966295),
 datetime.datetime(2016, 4, 12, 10, 29, 49, 25612),
 u'ed0e6df9e420a598cb8c6915be5db31e',
 datetime.datetime(2016, 4, 12, 9, 21, 2, 677087),
 u'4127a9147a18cf3d80b33356718434b8']

In [32]: idbmodel.fetchone("SELECT * FROM objects where etag = 'ed0e6df9e420a598cb8c6915be5db31e'")
Out[32]: 
[12794165L,
 u'datasets',
 u'ed0e6df9e420a598cb8c6915be5db31e',
 u'application/zip',
 False]

In [33]: idbmodel.fetchone("SELECT * FROM media_objects WHERE etag = 'ed0e6df9e420a598cb8c6915be5db31e'")
Out[33]: 
[15222961L,
 u'http://api.idigbio.org/v1/recordsets/2ccbe8d3-c688-4c20-bf24-68a5ef486519',
 u'ed0e6df9e420a598cb8c6915be5db31e',
 datetime.datetime(2016, 4, 12, 9, 21, 11, 374918)]

In [34]: idbmodel.fetchone("SELECT * FROM media WHERE url like 'http://api.idigbio.org/v1/recordsets/2ccbe8d3-c688-4c20-bf24-68a5ef486519'")
Out[34]: 
[15548211L,
 u'http://api.idigbio.org/v1/recordsets/2ccbe8d3-c688-4c20-bf24-68a5ef486519',
 u'datasets',
 u'application/zip',
 200,
 datetime.datetime(2016, 4, 12, 10, 29, 49, 18489),
 '872733a2-67a3-4c54-aa76-862735a5f334']

We downloaded the file, tried uploading it to Ceph, and found it was already there. The etag was returned to harvest_file, which updated the recordset row. Now everything looks kosher.
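One way to keep file_harvest_date and file_harvest_etag from drifting apart is to write both columns in a single statement, and only once an etag is known. The sketch below is an illustration under assumptions, not what 5a2f5b9 does: `record_harvest` is a hypothetical helper, and an in-memory SQLite table stands in for the real recordsets table.

```python
import datetime
import sqlite3


def record_harvest(conn, recordset_id, etag):
    """Store harvest date and etag together; skip the update if no etag is known."""
    if not etag:
        return False
    conn.execute(
        "UPDATE recordsets SET file_harvest_date = ?, file_harvest_etag = ? "
        "WHERE id = ?",
        (datetime.datetime.now().isoformat(), etag, recordset_id),
    )
    conn.commit()
    return True


# Hypothetical, reduced schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recordsets ("
    "id INTEGER PRIMARY KEY, file_harvest_date TEXT, file_harvest_etag TEXT)"
)
conn.execute("INSERT INTO recordsets (id) VALUES (4125)")

record_harvest(conn, 4125, None)  # no etag: the row is left untouched
record_harvest(conn, 4125, "ed0e6df9e420a598cb8c6915be5db31e")
```

Because the date is never written without an etag, a query for rows with a harvest date but a NULL etag should come back empty.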

UnwashedMeme commented 8 years ago
In [35]:     sql = """SELECT *
             FROM recordsets
             WHERE file_link IS NOT NULL
               AND uuid IS NOT NULL
               AND ingest=true
               AND pub_date < now()
               AND (file_harvest_date IS NOT NULL AND file_harvest_etag IS NULL)"""

In [36]: missingetags = idbmodel.fetchall(sql)

In [37]: len(missingetags)
Out[37]: 43

In [38]: min([r['file_harvest_date'] for r in missingetags])
Out[38]: datetime.datetime(2016, 4, 8, 10, 42, 38, 814133)

In [39]: max([r['file_harvest_date'] for r in missingetags])
Out[39]: datetime.datetime(2016, 4, 12, 9, 21, 9, 853452)

In [40]: [(r['uuid'],r['name']) for r in missingetags]
Out[40]: 
[('552ce2e5-b627-4d6d-b914-6b495d0a79e6', u'UAZ Mammals - Version 5'),
 ('59422682-15ba-47e1-99e2-1ef69f7bdd9a',
  u'Entomological Collections (NHRS), Swedish Museum of Natural History (NRM) - Version 26.44'),
 ('b8cbed64-5126-46bd-97aa-43627743aba7',
  u'CAS Ornithology (ORN) - Version 107'),
 ('01dfe0f4-24fe-447e-9f8f-1db7f8394b89',
  u'Lund Museum of Zoology - Insect collections (MZLU) - Version 367.42'),
 ('781fd581-7b93-471e-a025-413e4bcd8491',
  u'University of Florida Herbarium (FLAS) - Version 11.3'),
 ('1bc74afb-698f-43a7-90e6-352dba6c74da',
  u'RBdna - Rio de Janeiro Botanical Garden DNA Collection - Version 7.29'),
 ('953b0329-c3e4-4816-a038-7afbd2bb2547',
  u'RB - Rio de Janeiro Botanical Garden Herbarium Collection - Version 84.21'),
 ('5e2f4c81-8c8a-45f3-a220-851f85f86b40',
  u'VIES - Herb\xe1rio Central da Universidade Federal do Esp\xedrito Santo - Version 1.5'),
 ('d3412433-4df9-4828-89e0-73956898f749',
  u'Iowa State University Digitized Collection'),
 ('81dc7cdb-66be-4683-ae79-068a784378b1',
  u'University of Missouri Digitized Collection'),
 ('b761d317-a36e-4a05-a5f4-bd3e3963daf6',
  u'University of Wisconsin Digitized Collection'),
 ('1d14acd1-20ef-4a55-8206-f04c8a75ea3e',
  u'University of Wisconsin Oshkosh Digitized Collection'),
 ('021e2617-7532-4cef-806c-690bed32ab84', u'NRM-Fishes - Version 43.12'),
 ('833306f7-91b6-4ff7-bc16-0e406334d991',
  u'University of Minnesota Digitized Collection'),
 ('f778ecc0-8371-49d5-9ab1-9d75f0b76fad',
  u'Lund Botanical Museum (LD) - Version 362.81'),
 ('652ea450-af13-4334-96ff-3136d0188778',
  u'Palaeozooloical Collections (PZ), Swedish Museum of Natural History (NRM) - Version 26.60'),
 ('196c4f1c-53f9-480f-a012-dc0522629047',
  u'Michigan State University Digitized Collection'),
 ('c1122f57-9ab9-4552-9393-7d56b0bbe852',
  u'Ohio State University Digitized Collection'),
 ('26f7cbde-fbcb-4500-80a9-a99daa0ead9d', u'CAS Botany (BOT) - Version 135'),
 ('ded380b5-1ba2-4089-8e0c-0aa1b4140785', u'AntWeb - Version 8'),
 ('14a8f79f-eab7-48da-ad50-bda142703820',
  u'CAS Mammalogy (MAM) - Version 111'),
 ('beb74dc2-22ea-49e4-b1e3-bedb8e06e8f2',
  u'CAS Ichthyology (ICH) - Version 128'),
 ('47ac1531-5213-4848-a32d-5bb396ab9348',
  u'Purdue University Digitized Collection'),
 ('57b1a2a3-78ab-4e69-a77e-a8fd4394ee5a',
  u'University of Illinois at Urbana-Champaign Digitized Collection'),
 ('2e65e24b-b7e2-40a4-a40c-09edafc1e3f4',
  u'University of Kansas Digitized Collection'),
 ('4b05f088-74a4-44a5-a161-8b1484efc240',
  u'RBcarpo - Rio de Janeiro Botanical Garden Dry Fruits Collection - Version 8.29'),
 ('2bfc480c-e5b3-4a9b-9587-a92c22830ace',
  u'RBw - Rio de Janeiro Botanical Garden Wood Collection - Version 8.28'),
 ('fe51bced-93ce-45b2-b0c6-f7256719a07b',
  u'CVRD - Herb\xe1rio da Reserva Natural Vale - Version 1.7'),
 ('45544aa4-8762-4bf0-bfc6-890d08dc6ead',
  u'Illinois Natural History Survey Fish Collection - Version 3.0'),
 ('be34dbd9-5d54-4837-9f49-ff423eb18e8b', u'ILLS-HERP DwC-Archive'),
 ('30ab9c2a-0b54-4c04-84ca-bc7abdd90b52',
  u'Vertebrate Zoology Division - Ichthyology, Yale Peabody Museum - Version 228'),
 ('76015dea-c909-4e6d-a8e1-3bf35763571e',
  u'Vertebrate Zoology Division - Mammalogy, Yale Peabody Museum - Version 230'),
 ('06c35934-1b75-4196-838d-29d509951bf9',
  u'Tall Timbers Research Station and Land Conservancy - Version 11'),
 ('7ae4d15d-62e2-459b-842a-446f921b9d3f',
  u'Paleobotany Division, Yale Peabody Museum - Version 228'),
 ('cf60ed8a-2c79-4b85-a259-15a8e216dae4',
  u'Vertebrate Zoology Division - Herpetology, Yale Peabody Museum - Version 228'),
 ('6b5e29d3-b462-44d8-ba38-d68af5088067',
  u'Entomology Division, Yale Peabody Museum - Version 228'),
 ('8fc08919-1137-42e4-9fa5-9e64f1e5757b',
  u'Vertebrate Zoology Division - Ornithology, Yale Peabody Museum - Version 228'),
 ('b5f4526b-f4fb-4d90-8ce0-975e0cda8ff6',
  u'Invertebrate Zoology Division, Yale Peabody Museum - Version 228'),
 ('1527b668-b797-42be-94d3-0058e1393e94',
  u'Botany Division, Yale Peabody Museum - Version 229'),
 ('0220907a-0463-4ae0-8a0b-77f5e80fff40',
  u'Vertebrate Paleontology Division, Yale Peabody Museum - Version 228'),
 ('271a9ce9-c6d3-4b63-a722-cb0adc48863f',
  u'Museum of Comparative Zoology, Harvard University - Version 147'),
 ('a68df423-aae9-4f4b-8a42-a36124627a53',
  u'Essig Museum of Entomology - Version 29'),
 ('137ed4cd-5172-45a5-acdb-8e1de9a64e32',
  u'Invertebrate Paleontology Division, Yale Peabody Museum - Version 228')]

I think we just want to trigger harvest_file on each of those results, like I did for 2ccb.

UnwashedMeme commented 8 years ago
In [50]: for r in missingetags: update_publisher_recordset.harvest_file(r, idbmodel)
10:50:22 INFO    idigbio             | Harvest File 4197 UAZ Mammals - Version 5
http://ipt.vertnet.org:8080/ipt/archive.do?r=uaz_mammals {}
10:50:23 DEBUG   idigbio             | Starting Upload of '552ce2e5-b627-4d6d-b914-6b495d0a79e6'
10:50:23 DEBUG   idigbio             | ETAG 6c7ee029fab9443d684216c28c19210e already present in Storage.
10:50:23 INFO    idigbio             | Harvest File 4130 Entomological Collections (NHRS), Swedish Museum of Natural History (NRM) - Version 26.44
http://www.gbif.se/ipt/archive.do?r=nhrs-nrm {}
10:50:26 DEBUG   idigbio             | Starting Upload of '59422682-15ba-47e1-99e2-1ef69f7bdd9a'
10:50:26 DEBUG   idigbio             | ETAG e3b9dad712029429763e40bf959f5555 already present in Storage.
10:50:26 INFO    idigbio             | Harvest File 2123 CAS Ornithology (ORN) - Version 107
http://ipt.calacademy.org:8080/ipt/archive.do?r=orn {}
10:50:29 DEBUG   idigbio             | Starting Upload of 'b8cbed64-5126-46bd-97aa-43627743aba7'
10:50:29 DEBUG   idigbio             | ETAG a5999bb6eb95ca3a6860b8ef33f7ed57 already present in Storage.
10:50:29 INFO    idigbio             | Harvest File 4127 Lund Museum of Zoology - Insect collections (MZLU) - Version 367.42
http://www.gbif.se/ipt/archive.do?r=mzlu-insects {}
10:50:33 DEBUG   idigbio             | Starting Upload of '01dfe0f4-24fe-447e-9f8f-1db7f8394b89'
10:50:33 DEBUG   idigbio             | ETAG 5280519dc58d0b3bfbd6b5d6b26d12c8 already present in Storage.
10:50:33 INFO    idigbio             | Harvest File 1864 University of Florida Herbarium (FLAS) - Version 11.3
http://ipt.flmnh.ufl.edu:8080/ipt/archive.do?r=herbarium {}
10:50:34 DEBUG   idigbio             | Starting Upload of '781fd581-7b93-471e-a025-413e4bcd8491'
10:50:34 DEBUG   idigbio             | ETAG 7c291247c4d7b20d18c68345ba822e55 already present in Storage.
10:50:34 INFO    idigbio             | Harvest File 2792 RBdna - Rio de Janeiro Botanical Garden DNA Collection - Version 7.29
http://ipt.jbrj.gov.br/jbrj/archive.do?r=jbrj_dna {}
10:50:35 DEBUG   idigbio             | Starting Upload of '1bc74afb-698f-43a7-90e6-352dba6c74da'
10:50:35 DEBUG   idigbio             | ETAG 5f9c8faced0d3d2a72d92bee4686f9bd already present in Storage.
10:50:35 INFO    idigbio             | Harvest File 2788 RB - Rio de Janeiro Botanical Garden Herbarium Collection - Version 84.21
http://ipt.jbrj.gov.br/jbrj/archive.do?r=jbrj_rb {}
10:51:06 DEBUG   idigbio             | Starting Upload of '953b0329-c3e4-4816-a038-7afbd2bb2547'
10:51:07 DEBUG   idigbio             | ETAG 25c80bdd3f3ece064248bb6332ce0cd1 already present in Storage.
10:51:07 INFO    idigbio             | Harvest File 2631 VIES - Herbário Central da Universidade Federal do Espírito Santo - Version 1.5
http://ipt1.cria.org.br/ipt/archive.do?r=vies {}
10:51:11 DEBUG   idigbio             | Starting Upload of '5e2f4c81-8c8a-45f3-a220-851f85f86b40'
10:51:11 DEBUG   idigbio             | ETAG fcd98d25683628a4830ffe422964873b already present in Storage.
10:51:11 INFO    idigbio             | Harvest File 2340 Iowa State University Digitized Collection
https://invertnet.org/idigbio-feed/datasets/isui.zip {}
10:51:11 DEBUG   idigbio             | Starting Upload of 'd3412433-4df9-4828-89e0-73956898f749'
10:51:11 DEBUG   idigbio             | ETAG 31c562adbdb4b4fb5393932fd8f3366d already present in Storage.
10:51:11 INFO    idigbio             | Harvest File 2347 University of Missouri Digitized Collection
https://invertnet.org/idigbio-feed/datasets/umoc.zip {}
10:51:11 DEBUG   idigbio             | Starting Upload of '81dc7cdb-66be-4683-ae79-068a784378b1'
10:51:11 DEBUG   idigbio             | ETAG 57b2289020c4153c56074da415395d94 already present in Storage.
10:51:11 INFO    idigbio             | Harvest File 2348 University of Wisconsin Digitized Collection
https://invertnet.org/idigbio-feed/datasets/wisc.zip {}
10:51:11 DEBUG   idigbio             | Starting Upload of 'b761d317-a36e-4a05-a5f4-bd3e3963daf6'
10:51:11 DEBUG   idigbio             | ETAG f795d208627bc70e946831f2e408ac0e already present in Storage.
10:51:11 INFO    idigbio             | Harvest File 2349 University of Wisconsin Oshkosh Digitized Collection
https://invertnet.org/idigbio-feed/datasets/uwo.zip {}
10:51:11 DEBUG   idigbio             | Starting Upload of '1d14acd1-20ef-4a55-8206-f04c8a75ea3e'
10:51:12 DEBUG   idigbio             | ETAG c45d9303f5ec9e925fca2929355733de already present in Storage.
10:51:12 INFO    idigbio             | Harvest File 4133 NRM-Fishes - Version 43.12
http://www.gbif.se/ipt/archive.do?r=nrm-fishes {}
10:51:13 DEBUG   idigbio             | Starting Upload of '021e2617-7532-4cef-806c-690bed32ab84'
10:51:13 DEBUG   idigbio             | ETAG a5b3f5b4a87222c7436ecaa51f64e67a already present in Storage.
10:51:14 INFO    idigbio             | Harvest File 2346 University of Minnesota Digitized Collection
https://invertnet.org/idigbio-feed/datasets/umsp.zip {}
10:51:14 DEBUG   idigbio             | Starting Upload of '833306f7-91b6-4ff7-bc16-0e406334d991'
10:51:14 DEBUG   idigbio             | ETAG 19d17ae7a08cc1101c8564d1e4f3fe91 already present in Storage.
10:51:14 INFO    idigbio             | Harvest File 4129 Lund Botanical Museum (LD) - Version 362.81
http://www.gbif.se/ipt/archive.do?r=ld-general {}
10:51:27 DEBUG   idigbio             | Starting Upload of 'f778ecc0-8371-49d5-9ab1-9d75f0b76fad'
10:51:28 DEBUG   idigbio             | ETAG aa8bb65d145e182501472e677d943d61 already present in Storage.
10:51:28 INFO    idigbio             | Harvest File 4126 Palaeozooloical Collections (PZ), Swedish Museum of Natural History (NRM) - Version 26.60
http://www.gbif.se/ipt/archive.do?r=pz-nrm {}
10:51:29 DEBUG   idigbio             | Starting Upload of '652ea450-af13-4334-96ff-3136d0188778'
10:51:29 DEBUG   idigbio             | ETAG 07b2d1b671e51b59c2c29e9456f33ead already present in Storage.
10:51:29 INFO    idigbio             | Harvest File 2341 Michigan State University Digitized Collection
https://invertnet.org/idigbio-feed/datasets/msuc.zip {}
10:51:29 DEBUG   idigbio             | Starting Upload of '196c4f1c-53f9-480f-a012-dc0522629047'
10:51:29 DEBUG   idigbio             | ETAG 0a404c49f806c59d2bbcde192d70403e already present in Storage.
10:51:29 INFO    idigbio             | Harvest File 2342 Ohio State University Digitized Collection
https://invertnet.org/idigbio-feed/datasets/osu.zip {}
10:51:29 DEBUG   idigbio             | Starting Upload of 'c1122f57-9ab9-4552-9393-7d56b0bbe852'
10:51:29 DEBUG   idigbio             | ETAG d059f157d57fc973585e6ad642cdd0d3 already present in Storage.
10:51:29 INFO    idigbio             | Harvest File 2125 CAS Botany (BOT) - Version 135
http://ipt.calacademy.org:8080/ipt/archive.do?r=botany {}
10:52:11 DEBUG   idigbio             | Starting Upload of '26f7cbde-fbcb-4500-80a9-a99daa0ead9d'
10:52:12 DEBUG   idigbio             | ETAG 6d2a274d83c35ea47fa4134084af692a already present in Storage.
10:52:12 INFO    idigbio             | Harvest File 2128 AntWeb - Version 8
http://ipt.calacademy.org:8080/ipt/archive.do?r=antweb {}
10:52:31 DEBUG   idigbio             | Starting Upload of 'ded380b5-1ba2-4089-8e0c-0aa1b4140785'
10:52:31 DEBUG   idigbio             | ETAG 44c563b25311c0ffd17a56ac81762fdc already present in Storage.
10:52:31 INFO    idigbio             | Harvest File 2127 CAS Mammalogy (MAM) - Version 111
http://ipt.calacademy.org:8080/ipt/archive.do?r=mam {}
10:52:33 DEBUG   idigbio             | Starting Upload of '14a8f79f-eab7-48da-ad50-bda142703820'
10:52:33 DEBUG   idigbio             | ETAG a7d238b36335867e45a3a642f726b732 already present in Storage.
10:52:33 INFO    idigbio             | Harvest File 2126 CAS Ichthyology (ICH) - Version 128
http://ipt.calacademy.org:8080/ipt/archive.do?r=ich {}
10:52:48 DEBUG   idigbio             | Starting Upload of 'beb74dc2-22ea-49e4-b1e3-bedb8e06e8f2'
10:52:48 DEBUG   idigbio             | ETAG 060efe10275e0dc587a87a5ee303f56d already present in Storage.
10:52:48 INFO    idigbio             | Harvest File 2343 Purdue University Digitized Collection
https://invertnet.org/idigbio-feed/datasets/purc.zip {}
10:52:49 DEBUG   idigbio             | Starting Upload of '47ac1531-5213-4848-a32d-5bb396ab9348'
10:52:49 DEBUG   idigbio             | ETAG f6b8345bf2f277f9eee12cbcaeacfca9 already present in Storage.
10:52:49 INFO    idigbio             | Harvest File 2344 University of Illinois at Urbana-Champaign Digitized Collection
https://invertnet.org/idigbio-feed/datasets/inhs.zip {}
10:52:49 DEBUG   idigbio             | Starting Upload of '57b1a2a3-78ab-4e69-a77e-a8fd4394ee5a'
10:52:49 DEBUG   idigbio             | ETAG 32c87d2c99d800f7e17725a83bcb5db0 already present in Storage.
10:52:49 INFO    idigbio             | Harvest File 2345 University of Kansas Digitized Collection
https://invertnet.org/idigbio-feed/datasets/ku.zip {}
10:52:49 DEBUG   idigbio             | Starting Upload of '2e65e24b-b7e2-40a4-a40c-09edafc1e3f4'
10:52:49 DEBUG   idigbio             | ETAG 2527445afbbaaaa37519b70d079e3ddf already present in Storage.
10:52:49 INFO    idigbio             | Harvest File 2790 RBcarpo - Rio de Janeiro Botanical Garden Dry Fruits Collection - Version 8.29
http://ipt.jbrj.gov.br/jbrj/archive.do?r=seed_collection {}
10:52:50 DEBUG   idigbio             | Starting Upload of '4b05f088-74a4-44a5-a161-8b1484efc240'
10:52:50 DEBUG   idigbio             | ETAG 3bf9e652ad0335a6513216ea7ec5582e already present in Storage.
10:52:50 INFO    idigbio             | Harvest File 2791 RBw - Rio de Janeiro Botanical Garden Wood Collection - Version 8.28
http://ipt.jbrj.gov.br/jbrj/archive.do?r=jbrj_w {}
10:52:51 DEBUG   idigbio             | Starting Upload of '2bfc480c-e5b3-4a9b-9587-a92c22830ace'
10:52:51 DEBUG   idigbio             | ETAG c67895fd3d16ed111bc9fb3c8f0ecebf already present in Storage.
10:52:51 INFO    idigbio             | Harvest File 2633 CVRD - Herbário da Reserva Natural Vale - Version 1.7
http://ipt1.cria.org.br/ipt/archive.do?r=cvrd {}
10:52:53 DEBUG   idigbio             | Starting Upload of 'fe51bced-93ce-45b2-b0c6-f7256719a07b'
10:52:53 DEBUG   idigbio             | ETAG c7325f6af484aa69d469bedf9655927d already present in Storage.
10:52:53 INFO    idigbio             | Harvest File 2768 Illinois Natural History Survey Fish Collection - Version 3.0
http://biocoll.inhs.illinois.edu/portal/collections/datasets/dwc/INHS-FISH_DwC-A.zip {}
10:52:54 DEBUG   idigbio             | Starting Upload of '45544aa4-8762-4bf0-bfc6-890d08dc6ead'
10:52:55 DEBUG   idigbio             | ETAG 1e34f3473e236c67a9b87e869263e68e already present in Storage.
10:52:55 INFO    idigbio             | Harvest File 3291 ILLS-HERP DwC-Archive
http://biocoll.inhs.illinois.edu/portal/collections/datasets/dwc/INHS-HERP_DwC-A.zip {}
10:52:55 DEBUG   idigbio             | Starting Upload of 'be34dbd9-5d54-4837-9f49-ff423eb18e8b'
10:52:55 DEBUG   idigbio             | ETAG 972f81ce296142962fc57aa89d9dac72 already present in Storage.
10:52:55 INFO    idigbio             | Harvest File 1821 Vertebrate Zoology Division - Ichthyology, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_vz_ich {}
10:52:57 DEBUG   idigbio             | Starting Upload of '30ab9c2a-0b54-4c04-84ca-bc7abdd90b52'
10:52:57 DEBUG   idigbio             | ETAG 5df6ff5b575cbde8f5178ffa32f94f17 already present in Storage.
10:52:57 INFO    idigbio             | Harvest File 1822 Vertebrate Zoology Division - Mammalogy, Yale Peabody Museum - Version 230
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_vz_mam {}
10:52:59 DEBUG   idigbio             | Starting Upload of '76015dea-c909-4e6d-a8e1-3bf35763571e'
10:52:59 DEBUG   idigbio             | ETAG 4c5fff68a5424658e80f5e7d9be0b158 already present in Storage.
10:52:59 INFO    idigbio             | Harvest File 1870 Tall Timbers Research Station and Land Conservancy - Version 11
https://herbarium.bio.fsu.edu:8443/archive.do?r=ttrs {}
10:52:59 DEBUG   idigbio             | Starting Upload of '06c35934-1b75-4196-838d-29d509951bf9'
10:52:59 DEBUG   idigbio             | ETAG eb00c12c19e05ae7456d03fa4b86c14c already present in Storage.
10:52:59 INFO    idigbio             | Harvest File 1818 Paleobotany Division, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_pb {}
10:53:03 DEBUG   idigbio             | Starting Upload of '7ae4d15d-62e2-459b-842a-446f921b9d3f'
10:53:03 DEBUG   idigbio             | ETAG 7101cd2d239e44f04dc18ce5e6770e04 already present in Storage.
10:53:03 INFO    idigbio             | Harvest File 1820 Vertebrate Zoology Division - Herpetology, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_vz_her {}
10:53:06 DEBUG   idigbio             | Starting Upload of 'cf60ed8a-2c79-4b85-a259-15a8e216dae4'
10:53:06 DEBUG   idigbio             | ETAG 81fb351c82fa6e716c03c58cc5e18d37 already present in Storage.
10:53:06 INFO    idigbio             | Harvest File 1814 Entomology Division, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_ent {}
10:53:14 DEBUG   idigbio             | Starting Upload of '6b5e29d3-b462-44d8-ba38-d68af5088067'
10:53:15 DEBUG   idigbio             | ETAG bb75e298c865f76ec6384580fa4ff413 already present in Storage.
10:53:15 INFO    idigbio             | Harvest File 1815 Vertebrate Zoology Division - Ornithology, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_vz_orn {}
10:53:19 DEBUG   idigbio             | Starting Upload of '8fc08919-1137-42e4-9fa5-9e64f1e5757b'
10:53:20 DEBUG   idigbio             | ETAG 110d59e10bad1c1e04f7ac5e51beba53 already present in Storage.
10:53:20 INFO    idigbio             | Harvest File 1816 Invertebrate Zoology Division, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_iz {}
10:53:27 DEBUG   idigbio             | Starting Upload of 'b5f4526b-f4fb-4d90-8ce0-975e0cda8ff6'
10:53:27 DEBUG   idigbio             | ETAG 7feabdc327101a5a5d81d69187cecc99 already present in Storage.
10:53:27 INFO    idigbio             | Harvest File 1817 Botany Division, Yale Peabody Museum - Version 229
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_bot {}
10:53:31 DEBUG   idigbio             | Starting Upload of '1527b668-b797-42be-94d3-0058e1393e94'
10:53:31 DEBUG   idigbio             | ETAG b570277f03032d043d1206fe2911de9a already present in Storage.
10:53:31 INFO    idigbio             | Harvest File 1819 Vertebrate Paleontology Division, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_vp {}
10:53:34 DEBUG   idigbio             | Starting Upload of '0220907a-0463-4ae0-8a0b-77f5e80fff40'
10:53:34 DEBUG   idigbio             | ETAG 83833148b3257ee96afad19994c1cfe2 already present in Storage.
10:53:34 INFO    idigbio             | Harvest File 2159 Museum of Comparative Zoology, Harvard University - Version 147
http://digir.mcz.harvard.edu/ipt/archive.do?r=mczbase {}

10:54:53 DEBUG   idigbio             | Starting Upload of '271a9ce9-c6d3-4b63-a722-cb0adc48863f'
10:54:55 DEBUG   idigbio             | ETAG f329e092aa892815ba5666fb5d7f2e62 already present in Storage.
10:54:55 INFO    idigbio             | Harvest File 1957 Essig Museum of Entomology - Version 29
http://bnhmipt.berkeley.edu/ipt/archive.do?r=essig {}
10:54:57 DEBUG   idigbio             | Starting Upload of 'a68df423-aae9-4f4b-8a42-a36124627a53'
10:54:57 DEBUG   idigbio             | ETAG 71eaa4ff31ef4eab1325c45f208cabee already present in Storage.
10:54:57 INFO    idigbio             | Harvest File 1813 Invertebrate Paleontology Division, Yale Peabody Museum - Version 228
http://ipt.peabody.yale.edu/ipt/archive.do?r=ipt_ip {}
10:55:09 DEBUG   idigbio             | Starting Upload of '137ed4cd-5172-45a5-acdb-8e1de9a64e32'
10:55:09 DEBUG   idigbio             | ETAG 9134f3a545b4b97088f00e48ee1b6415 already present in Storage.

UnwashedMeme commented 8 years ago

Fix is in place, cron reactivated, data cleaned up.