mloesch / sickle

Sickle: OAI-PMH for Humans

Iteration with next() is very slow #54

Open Iris-Hinrichs opened 3 years ago

Iris-Hinrichs commented 3 years ago

Iteration with next() becomes very slow once the OAIItemIterator is effectively "empty" but StopIteration has not been raised yet. That final call takes several minutes.

Example:

    from sickle import Sickle

    oai_end = 'http://ws.pangaea.de/oai/provider'
    sickle = Sickle(oai_end)
    records = sickle.ListRecords(**{
        'metadataPrefix': 'oai_dc',
        'set': 'query~cHJvamVjdDpsYWJlbDpEQU0gQU5EIGV2ZW50Om1ldGhvZDpGZXJyeUJveA',
        'ignore_deleted': True,
    })
    entry = next(records)  # records contains only one entry for the time being; this may change in the future
    next(records)  # this is the call that takes several minutes before StopIteration is raised
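To put a number on the delay, each next() call can be timed separately. The following is only a minimal sketch: `timed_next` is a hypothetical helper, not part of Sickle, and the list-backed iterator stands in for the `records` object above so the snippet runs without network access.

```python
import time


def timed_next(iterator):
    """Call next() once and return (item, elapsed_seconds, exhausted).

    Hypothetical helper for this issue, not a Sickle API.
    """
    start = time.perf_counter()
    try:
        item = next(iterator)
        exhausted = False
    except StopIteration:
        # The iterator had nothing left to yield.
        item = None
        exhausted = True
    elapsed = time.perf_counter() - start
    return item, elapsed, exhausted


# Stand-in for the ListRecords iterator: exactly one entry.
records = iter(["only-record"])

for call_number in (1, 2):
    item, elapsed, exhausted = timed_next(records)
    print(f"call {call_number}: item={item!r} exhausted={exhausted} took {elapsed:.6f}s")
```

With the real iterator from the example, the second call would be the one expected to show the multi-minute duration.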

Iris-Hinrichs commented 3 years ago

Part of the problem may be that more than 300,000 deleted records published by the above-mentioned data provider are not assigned to any set. The query nevertheless matches these deleted records, and I assume that skipping them internally is what slows down the iteration.
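The assumption can be illustrated with a small simulation. This is a sketch under stated assumptions, not Sickle's actual implementation: `FakeRecord`, `SkippingIterator` and `build_pages` are hypothetical names, the page size of 100 is made up, and each "page fetch" stands in for one HTTP request to the provider. It shows that an iterator which silently skips deleted records has to walk through every remaining page before it can raise StopIteration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRecord:
    identifier: str
    deleted: bool


class SkippingIterator:
    """Hypothetical paged iterator that hides deleted records."""

    def __init__(self, pages: List[List[FakeRecord]], ignore_deleted: bool = True):
        self._pages = iter(pages)
        self._current = iter([])
        self.ignore_deleted = ignore_deleted
        self.pages_fetched = 0  # stands in for the number of HTTP requests

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            for record in self._current:
                if self.ignore_deleted and record.deleted:
                    continue  # skipped silently, the caller never sees it
                return record
            # Current page is used up: "fetch" the next one, or stop.
            page = next(self._pages, None)
            if page is None:
                raise StopIteration
            self._current = iter(page)
            self.pages_fetched += 1


def build_pages(live: int = 1, deleted: int = 1000, page_size: int = 100):
    """Build pages with the live records first, followed by deleted ones."""
    records = [FakeRecord(f"live-{i}", False) for i in range(live)]
    records += [FakeRecord(f"deleted-{i}", True) for i in range(deleted)]
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]


records = SkippingIterator(build_pages())
first = next(records)
print("first record:", first.identifier, "| pages fetched:", records.pages_fetched)
second = next(records, None)
print("second record:", second, "| pages fetched:", records.pages_fetched)
```

In this model the first call returns after a single page, while the second call must fetch all remaining pages of deleted records before reporting that nothing is left. With several hundred thousand deleted records and one request per page, a delay of minutes would be plausible.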