Closed fxcoudert closed 6 years ago
I should also state that, even before breaking, the server seems to return 3 records whose datestamps do not match the requested from_ parameter:
2018-05-17 13:10:54 Quantitative Characterization of Molecular-Stream Separation
2018-01-10 15:47:36 Melting of zeolitic imidazolate frameworks with different topologies: insight from first-principles molecular dynamics
2017-09-07 20:44:45 Facile Fabrication of Ultralow-Density Transparent Boehmite Nanofiber Cryogel Monoliths and Their Application in Volumetric Three-Dimensional Displays
Probably not related, and not as annoying as a crash, but still…
When I run your code, I get 67 results, so I can't reproduce the problem. The NoRecordsMatch error is raised when the server returns no results; this is part of the OAI-PMH protocol.
I also got the 3 records with the wrong timestamp. The server should not have returned those. These seem to be problems with the figshare API and not with this library.
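Until the server-side filtering is sorted out, the stray records can be dropped on the client. The following is only a sketch: `in_requested_range` is a hypothetical helper, not part of pyoai, and it assumes the inclusive from/until semantics of OAI-PMH and datestamps that are plain `datetime` objects, as printed in the listing above.

```python
import datetime


def in_requested_range(datestamp, from_=None, until=None):
    """Return True if datestamp lies in the inclusive [from_, until] range.

    Hypothetical helper for illustration; either bound may be None (open).
    """
    if from_ is not None and datestamp < from_:
        return False
    if until is not None and datestamp > until:
        return False
    return True


# One of the stray datestamps reported above, checked against a later from_.
f = datetime.datetime(2018, 7, 24, 14, 56, 0)
stray = datetime.datetime(2018, 5, 17, 13, 10, 54)
print(in_requested_range(stray, from_=f))  # False
```

In a loop over `client.listRecords(...)`, the check would be applied to `record[0].datestamp()` before using the record.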
I understand that NoRecordsMatch should be raised when the server returns no results. The bug here is that, sometimes, the pyoai library raises this error even though the server did return results.
In fact, from my testing it appears that NoRecordsMatch occurs when (and only when) the number of records returned is an exact multiple of ten. I thus suspect this is a pagination bug.
Using from_ and until to craft a time range for which there are exactly 10 results shows the bug:
bli /tmp $ cat a.py
#!/usr/bin/env python3
from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader
import datetime
registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)
client = Client('https://api.figshare.com/v2/oai', registry)
f = datetime.datetime.strptime('2018-07-24 14:56:00', '%Y-%m-%d %H:%M:%S')
u = datetime.datetime.strptime('2018-07-27 15:00:00', '%Y-%m-%d %H:%M:%S')
for record in client.listRecords(metadataPrefix='oai_dc', set='portal_259', from_=f, until=u):
    print(record[0].datestamp(), end=' ')
    print(record[1]['title'][0])
which gives:
bli /tmp $ ./a.py
2018-07-27 14:00:34 Boehmite Nanofiber-Reinforced Resorcinol-Formaldehyde Macroporous Monoliths for Heat/Flame Protection
2018-07-26 21:31:00 Theory of the reactant-stationary kinetics for zymogen activation coupled to an enzyme catalyzed reaction
2018-07-26 16:57:19 Facile Synthesis of a Diverse Library of Mono-3-substituted β-Cyclodextrin Analogues
2018-07-26 14:00:22 Computationally-Inspired Discovery of an Unsymmetrical Porous Organic Cage
2018-07-26 13:57:17 Unzipping Natural Products: Improved Natural Product Structure Predictions by Ensemble Modeling and Fingerprint Matching
2018-07-25 18:45:57 Air Quality in Puerto Rico in the Aftermath of Hurricane Maria: A Case Study on the Use of Lower-Cost Air Quality Monitors
2018-07-25 15:08:33 Magnetic Structure of UO2 and NpO2 by First-Principle Methods
2018-07-25 15:06:07 Tailing miniSOG: Structural Bases of the Complex Photophysics of a Flavin-Binding Singlet Oxygen Photosensitizing Protein
2018-07-25 14:31:52 On-Surface Radical Oligomerisation: A New Approach to STM Tip-Induced Reactions
2018-07-24 14:56:00 Hue Parameter Fluorescence Identification of Edible Oils with a Smartphone
Traceback (most recent call last):
  File "./a.py", line 14, in <module>
    for record in client.listRecords(metadataPrefix='oai_dc', set='portal_259', from_=f, until=u):
  File "/Users/fx/anaconda3/lib/python3.6/site-packages/oaipmh/client.py", line 365, in ResumptionListGenerator
    result, token = nextBatch(token)
  File "/Users/fx/anaconda3/lib/python3.6/site-packages/oaipmh/client.py", line 194, in nextBatch
    resumptionToken=token)
  File "/Users/fx/anaconda3/lib/python3.6/site-packages/oaipmh/client.py", line 308, in makeRequestErrorHandling
    raise getattr(error, code[0].upper() + code[1:] + 'Error')(msg)
oaipmh.error.NoRecordsMatchError: The result in an empty list.
With this from_/until specification, it should be reproducible for you. I hope you can reopen the bug.
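As a stopgap on the caller's side, the trailing error can be swallowed once at least one record has been received. This is a sketch of a workaround, not a fix for the underlying issue, and it assumes the suspicion above is right (that the error comes from a final, empty page). `tolerate_trailing_empty_page`, `FakeNoRecordsMatchError` and `fake_list_records` are hypothetical names used only so that the example runs without network access.

```python
def tolerate_trailing_empty_page(records, empty_error):
    """Yield items from records, ignoring empty_error once any item was seen.

    If the error is raised before the first item, it is re-raised, since
    that is the legitimate "no records match" case.
    """
    seen_any = False
    try:
        for record in records:
            seen_any = True
            yield record
    except empty_error:
        if not seen_any:
            raise


class FakeNoRecordsMatchError(Exception):
    """Stand-in for oaipmh.error.NoRecordsMatchError (illustration only)."""


def fake_list_records():
    # Mimics the behaviour in the traceback: ten records, then the error.
    for i in range(10):
        yield i
    raise FakeNoRecordsMatchError("The result in an empty list.")


collected = list(
    tolerate_trailing_empty_page(fake_list_records(), FakeNoRecordsMatchError)
)
print(len(collected))  # 10
```

With the real library, the wrapper would be given the `client.listRecords(...)` generator and `oaipmh.error.NoRecordsMatchError`, the exception class named in the traceback.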
This very simple code is requesting records from a figshare set:
After finding several records, the code throws an exception with the following error:
If I remove the from_ parameter from the listRecords call, it all works fine.