pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

Enable all kwds in ScopusSearch() #264

Closed altuncu closed 1 month ago

altuncu commented 1 year ago

Hi,

I want to limit the maximum number of search results with the count parameter, but it does not seem to work. An example is provided below:

from pybliometrics.scopus import ScopusSearch

search_results = ScopusSearch('TITLE-ABS-KEY (blockchain AND gdpr)', count=1)
docs = search_results.get_eids()
print(docs)

The output:

['2-s2.0-85112649530', '2-s2.0-85106304015', '2-s2.0-85100802267', '2-s2.0-85127208166', '2-s2.0-85126367577', '2-s2.0-85126286802', '2-s2.0-85125185235', '2-s2.0-85125184058', '2-s2.0-85123374832', '2-s2.0-85115275849', '2-s2.0-85115240665', '2-s2.0-85113512526', '2-s2.0-85085893701', '2-s2.0-85124808547', '2-s2.0-85122391371', '2-s2.0-85120087978', '2-s2.0-85117707403', '2-s2.0-85119286468', '2-s2.0-85114475341', '2-s2.0-85103791115', '2-s2.0-85118123737', '2-s2.0-85114052648', '2-s2.0-85111912274', '2-s2.0-85114850985', '2-s2.0-85109962669', '2-s2.0-85107238203', '2-s2.0-85108912732', '2-s2.0-85108902298', '2-s2.0-85108152416', '2-s2.0-85115687551', '2-s2.0-85110409013', '2-s2.0-85101882580', '2-s2.0-85107820029', '2-s2.0-85107748090', '2-s2.0-85106658100', '2-s2.0-85113176044', '2-s2.0-85112869245', '2-s2.0-85100433895', '2-s2.0-85099496929', '2-s2.0-85107330416', '2-s2.0-85120980719', '2-s2.0-85104993820', '2-s2.0-85106183316', '2-s2.0-85098702075', '2-s2.0-85089969079', '2-s2.0-85104713583', '2-s2.0-85102518691', '2-s2.0-85102518444', '2-s2.0-85126146671', '2-s2.0-85125866068', '2-s2.0-85125669165', '2-s2.0-85125623891', '2-s2.0-85124799974', '2-s2.0-85124631097', '2-s2.0-85124338529', '2-s2.0-85123716370', '2-s2.0-85123296553', '2-s2.0-85123164862', '2-s2.0-85122520805', '2-s2.0-85121863529', '2-s2.0-85120530686', '2-s2.0-85118124429', '2-s2.0-85118105962', '2-s2.0-85115698960', '2-s2.0-85115058178', '2-s2.0-85114927715', '2-s2.0-85114739333', '2-s2.0-85112715979', '2-s2.0-85112230409', '2-s2.0-85111994968', '2-s2.0-85111363533', '2-s2.0-85111314237', '2-s2.0-85108353955', '2-s2.0-85106434122', '2-s2.0-85103780322', '2-s2.0-85103693629', '2-s2.0-85103687318', '2-s2.0-85102594851', '2-s2.0-85101098892', '2-s2.0-85098799893', '2-s2.0-85096499318', '2-s2.0-85096465514', '2-s2.0-85074482290', '2-s2.0-85096537200', '2-s2.0-85084000874', '2-s2.0-85113395751', '2-s2.0-85101828193', '2-s2.0-85098655797', '2-s2.0-85097526253', '2-s2.0-85102622685', 
'2-s2.0-85096733537', '2-s2.0-85098712633', '2-s2.0-85100648836', '2-s2.0-85096130423', '2-s2.0-85102011477', '2-s2.0-85099776377', '2-s2.0-85099238460', '2-s2.0-85099210955', '2-s2.0-85099186149', '2-s2.0-85087202613', '2-s2.0-85078172528', '2-s2.0-85102008242', '2-s2.0-85100163348', '2-s2.0-85099557418', '2-s2.0-85098878992', '2-s2.0-85103694353', '2-s2.0-85094957917', '2-s2.0-85097230684', '2-s2.0-85096232988', '2-s2.0-85089434428', '2-s2.0-85085097324', '2-s2.0-85097644903', '2-s2.0-85095869164', '2-s2.0-85095766933', '2-s2.0-85095732502', '2-s2.0-85093851048', '2-s2.0-85083178988', '2-s2.0-85086937488', '2-s2.0-85086886296', '2-s2.0-85110512731', '2-s2.0-85095565532', '2-s2.0-85089834901', '2-s2.0-85073998306', '2-s2.0-85125842197', '2-s2.0-85091476290', '2-s2.0-85088592395', '2-s2.0-85087917681', '2-s2.0-85083031008', '2-s2.0-85087000078', '2-s2.0-85085579435', '2-s2.0-85085556662', '2-s2.0-85111553614', '2-s2.0-85079487166', '2-s2.0-85081083867', '2-s2.0-85079702624', '2-s2.0-85109848971', '2-s2.0-85105546279', '2-s2.0-85101834286', '2-s2.0-85101829840', '2-s2.0-85100220618', '2-s2.0-85099255047', '2-s2.0-85097409901', '2-s2.0-85097331329', '2-s2.0-85097132210', '2-s2.0-85094132120', '2-s2.0-85094098253', '2-s2.0-85090511625', '2-s2.0-85089476036', '2-s2.0-85088752389', '2-s2.0-85088751558', '2-s2.0-85088536738', '2-s2.0-85088008956', '2-s2.0-85087762454', '2-s2.0-85085573649', '2-s2.0-85084859679', '2-s2.0-85083078176', '2-s2.0-85082170898', '2-s2.0-85082116467', '2-s2.0-85081591163', '2-s2.0-85081561132', '2-s2.0-85079575698', '2-s2.0-85079227043', '2-s2.0-85079226315', '2-s2.0-85077274774', '2-s2.0-85074150487', '2-s2.0-85068607648', '2-s2.0-85088270545', '2-s2.0-85078456235', '2-s2.0-85081341122', '2-s2.0-85079035179', '2-s2.0-85084612470', '2-s2.0-85079282941', '2-s2.0-85074376270', '2-s2.0-85074338765', '2-s2.0-85076220345', '2-s2.0-85075609855', '2-s2.0-85075285507', '2-s2.0-85071495369', '2-s2.0-85079342701', '2-s2.0-85075634153', 
'2-s2.0-85075174146', '2-s2.0-85074858104', '2-s2.0-85071033962', '2-s2.0-85068152886', '2-s2.0-85065535204', '2-s2.0-85074071580', '2-s2.0-85072939530', '2-s2.0-85069177811', '2-s2.0-85060551496', '2-s2.0-85061628827', '2-s2.0-85063531980', '2-s2.0-85063685419', '2-s2.0-85061915596', '2-s2.0-85062047716', '2-s2.0-85061805165', '2-s2.0-85128346139', '2-s2.0-85104550800', '2-s2.0-85090825331', '2-s2.0-85089234629', '2-s2.0-85081554455', '2-s2.0-85079165092', '2-s2.0-85077978437', '2-s2.0-85077894511', '2-s2.0-85077538435', '2-s2.0-85076227162', '2-s2.0-85076104686', '2-s2.0-85076099245', '2-s2.0-85076088790', '2-s2.0-85075911987', '2-s2.0-85075605255', '2-s2.0-85075591992', '2-s2.0-85074347402', '2-s2.0-85074120089', '2-s2.0-85074069927', '2-s2.0-85073841113', '2-s2.0-85073057460', '2-s2.0-85072972602', '2-s2.0-85072857828', '2-s2.0-85072104112', '2-s2.0-85070566813', '2-s2.0-85069508235', '2-s2.0-85069435975', '2-s2.0-85068445379', '2-s2.0-85068327394', '2-s2.0-85068223452', '2-s2.0-85067822957', '2-s2.0-85066787451', '2-s2.0-85066637109', '2-s2.0-85065885052', '2-s2.0-85065813585', '2-s2.0-85065297930', '2-s2.0-85064987875', '2-s2.0-85062942221', '2-s2.0-85058873720', '2-s2.0-85056860372', '2-s2.0-85093832574', '2-s2.0-85058318759', '2-s2.0-85077782183', '2-s2.0-85057961153', '2-s2.0-85057433542', '2-s2.0-85054056228', '2-s2.0-85049737364', '2-s2.0-85067890210', '2-s2.0-85067885155', '2-s2.0-85051362309', '2-s2.0-85047423702', '2-s2.0-85072855671', '2-s2.0-85069634332', '2-s2.0-85065329921', '2-s2.0-85062961905', '2-s2.0-85061842832', '2-s2.0-85059648175', '2-s2.0-85059604772', '2-s2.0-85058998005', '2-s2.0-85053712353', '2-s2.0-85050251219', '2-s2.0-85049038956', '2-s2.0-85048987713', '2-s2.0-85048977874', '2-s2.0-85048512134', '2-s2.0-85042408432', '2-s2.0-85030328772', '2-s2.0-85034431130']
Michael-E-Rose commented 1 year ago

That's intentional, because I assumed every user wants the maximum possible number of results per page.

The Scopus docs (https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl) state that the count parameter is about the "maximum number of results", but that's misleading: it pertains only to a single results page. Since the Scopus Search API uses pagination (i.e., results are returned via pages that link to each other so that the total results set is unlimited), the count parameter governs only the number of results per page, so a smaller value just increases the number of pages.

If pybliometrics were to set count to 1, as in your case, you would not get 1 page with 200 results but 200 pages with 1 result each. This would consume 200 API calls on your key.
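
To make the arithmetic concrete, here is a minimal sketch, assuming that each results page costs exactly one API call. The helper name `pages_needed` is made up for illustration and is not part of pybliometrics.

```python
import math


def pages_needed(total_results: int, count: int) -> int:
    """Return the number of result pages for a given page size.

    Assumption: one API call per page, so this is also the number of calls.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    return math.ceil(total_results / count)


# The 200-result query from the example above, with two different page sizes.
print(pages_needed(200, 1))    # one result per page
print(pages_needed(200, 200))  # everything on a single page
```

The total number of results is the same in both cases; only the number of calls changes.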

altuncu commented 1 year ago

Thanks for the answer. Does this mean there is no way to limit the total number of results in the API, regardless of pagination? In my case, I need the N most relevant articles for a given search query. I could keep only the first N records of the search result, but that is inefficient for queries returning many records, because everything has to be downloaded first. So, is it possible to limit the number of records returned in the first place to avoid unnecessary downloads?

Michael-E-Rose commented 1 year ago

Currently this is not possible with ScopusSearch(), but I will think about how it can be realized. For your use case you would need three kwds (as per https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl): "sort", "start" and "count".

None of this is difficult but requires careful testing. Sorry for the late answer, I forgot about this issue.
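
For illustration only, the idea of combining "start" and "count" to stop after the first N records could look roughly like the sketch below. It is not how pybliometrics is implemented: `iter_results`, `fetch_page` and `fake_fetch` are hypothetical names, the real API is replaced by an in-memory list, and the results are assumed to arrive already ordered by the server (which is what "sort" would be for).

```python
from itertools import islice
from typing import Callable, Iterator, List


def iter_results(fetch_page: Callable[[int, int], List[str]],
                 count: int) -> Iterator[str]:
    """Yield results one at a time, requesting a new page only when needed."""
    start = 0
    while True:
        page = fetch_page(start, count)
        if not page:
            return
        yield from page
        if len(page) < count:  # a short page means it was the last one
            return
        start += count


# Stand-in for the real API: 200 fake identifiers, assumed sorted by relevance.
FAKE_DB = [f"doc-{i}" for i in range(200)]
calls = []  # records every (start, count) request that was made


def fake_fetch(start: int, count: int) -> List[str]:
    calls.append((start, count))
    return FAKE_DB[start:start + count]


# Take only the first 30 records; pages beyond that are never requested.
top_30 = list(islice(iter_results(fake_fetch, count=25), 30))
print(len(top_30), "records fetched with", len(calls), "page requests")
```

Because the generator is lazy, `islice` stops the iteration as soon as N records have been produced, so later pages are never downloaded.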

Michael-E-Rose commented 2 months ago

The first step will be to replace the count parameter in the search super class (https://github.com/pybliometrics-dev/pybliometrics/blob/f5a6f7fb3788f52422d253c87a345a11ce85ced5/pybliometrics/scopus/superclasses/search.py#L14). That's a parameter for the pagination, but when users provide count in an effort to limit the number of results, our parameter takes priority.

The other parameters should work already. Please test with "sort"; you may also use the other search classes.

Michael-E-Rose commented 1 month ago

Parameter "count" can now be used as per the Scopus documentation. However (!), it doesn't work as expected for the Scopus Search API: the number of results equals the count parameter times the number of result pages. Unlike the other search APIs, the Scopus Search API uses pagination to cycle through the results set. When you provide "count", the API returns this number of results for each page rather than in total. Very weird, but I think we have to consider the possibility that Scopus lets their API degrade.

I can also confirm the other keyword parameters work as expected.