lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Author comments on `Results` are incorrectly `None` #78

Closed hilllinus closed 3 years ago

hilllinus commented 3 years ago

Hello, I used your code, but result.comment and result.doi are both None. There is a bug on lines 131 and 133 of arxiv.py. I think the correct code is:

comment=entry.arxiv_comment,

doi=entry.arxiv_doi,

Please check. Thanks.

lukasschwab commented 3 years ago

Is there an example paper for which result.doi and result.comment are unexpectedly None?

This library does currently extract the DOI from entry.arxiv_doi: https://github.com/lukasschwab/arxiv.py/blob/0ba1a3e146085d5e74fe840723af2e7c952d6a39/arxiv/arxiv.py#L133

However, it uses entry.comment to populate result.comment: https://github.com/lukasschwab/arxiv.py/blob/0ba1a3e146085d5e74fe840723af2e7c952d6a39/arxiv/arxiv.py#L131
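To illustrate why the field name matters, here is a minimal sketch using only the standard library. It assumes the author comment and DOI live in elements under the arXiv namespace (arxiv:comment, arxiv:doi) in the Atom response, which would be consistent with the reporter's suggestion to read entry.arxiv_comment. The sample entry is hand-written for illustration and is not real API output, and text_of is a hypothetical helper, not part of this library.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for the lookups below.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hypothetical, hand-written entry for illustration only (not real API output).
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <title>Example title</title>
  <arxiv:comment>8 pages, 3 figures</arxiv:comment>
  <arxiv:doi>10.0000/example</arxiv:doi>
</entry>
"""

entry = ET.fromstring(SAMPLE_ENTRY)


def text_of(element, path):
    """Return the text of the first element matching path, or None if absent."""
    found = element.find(path, NS)
    return None if found is None else found.text


# The comment exists only under the arxiv namespace...
namespaced_comment = text_of(entry, "arxiv:comment")
# ...so looking for an un-namespaced (plain Atom) comment finds nothing.
plain_comment = text_of(entry, "atom:comment")
doi = text_of(entry, "arxiv:doi")

print(namespaced_comment, plain_comment, doi)
```

In this sketch the namespaced lookup returns the comment text, while the plain lookup returns None, which mirrors the symptom reported in this issue.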

lukasschwab commented 3 years ago

Doing some scattershot testing, it seems that DOIs are being extracted correctly (though not every arXiv entry has a corresponding DOI) but comments are plausibly being dropped:

>>> import arxiv
>>> results = list(arxiv.Search("testing", max_results=1000).get())
>>> [r.comment for r in results]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None]
>>> [r.doi for r in results]
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1088/1751-8113/41/39/395309', None, None, None, None, None, '10.1016/j.infsof.2019.06.006', None, None, None, None, None, None, '10.1145/3387940.3391535', None, None, '10.4204/EPTCS.80.8', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1145/3387940.3392253', None, None, None, None, None, '10.4204/EPTCS.80.1', None, None, None, '10.1145/2896941.2896944', None, None, '10.5121/vlsic.2012.3205', None, None, None, '10.1016/j.jss.2020.110574', None, None, None, None, '10.3150/14-BEJ694', None, None, None, '10.1109/ICSTW50294.2020.00051', None, None, None, None, '10.1088/0266-5611/24/2/025028', None, None, None, None, '10.1109/ICST46399.2020.00019', None, None, None, None, None, None, '10.1145/3379597.3387471', None, None, None, None, None, None, None, None, None, '10.1109/ISIT.2012.6283596', '10.5121/ijsea.2012.3404', '10.1088/1367-2630/18/4/045013', None, None, None, '10.1109/TIFS.2017.2656473', None, None, None, None, None, '10.1109/AITEST49225.2020.00012', None, None, None, None, None, None, None, None, None, None, None, '10.4230/OASIcs.SLATE.2012.185', None, None, None, None, '10.1504/IJMOR.2018.10011879', None, None, None, None, None, None, None, None, None, None, None, '10.1016/j.jeconom.2014.09.006', None, None, None, None, None, None, None, None, None, None, None, None, '10.1145/3319008.3319021', None, None, None, None, None, None, None, None, None, '10.1109/ASE.2011.6100094', None, None, None, None, None, None, '10.1515/jci-2018-0004', None, None, None, None, '10.1109/CompEng.2018.8536223', None, None, None, None, None, '10.1017/S096354832100002X', None, None, '10.3390/e22060630', None, None, None, None, None, None, None, None, None, None, None, '10.1109/ICST.2017.52', '10.1145/3092703.3092709', '10.1002/asi.24203', None, None, None, None, None, None, 
'10.1140/epjc/s2005-02278-9', '10.1088/1367-2630/11/4/043028', None, None, None, '10.1112/S1461157015000169', None, None, None, None, None, None, None, None, None, '10.1016/j.spl.2012.07.022', None, None, None, None, None, None, None, None, '10.1214/08-SS036', '10.5121/vlsic.2010.1302', None, None, '10.4204/EPTCS.61.4', None, None, None, '10.7321/jscse.v3.n3.49', None, '10.1145/2635868.2635906', None, '10.1214/13-AOS1168', '10.4204/EPTCS.180.4', None, None, None, None, None, '10.1214/19-ba1194', None, '10.1145/3183440.3183492', None, None, None, '10.1007/s11219-019-09446-5', '10.1145/3180155.3180182', None, None, None, '10.1111/rssb.12318', None, None, None, None, None, None, None, None, '10.1098/rsif.2019.0234', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1109/IECON43393.2020.9254910.', None, None, None, '10.1103/PhysRevC.58.1175', None, None, None, None, '10.1080/03610920903402613', '10.1016/j.ijmedinf.2008.12.004', '10.5121/ijsea.2012.3602', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1016/j.csda.2018.01.022', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1214/074921706000000662', None, None, None, '10.1016/j.csda.2006.06.010', '10.1109/TSP.2013.2271484', '10.1214/15-STS513', None, '10.1007/s10664-019-09692-y', None, None, '10.1145/3338906.3338948', '10.1145/3338906.3340448', '10.1145/3340433.3342824', None, None, '10.1016/j.infsof.2020.106319', None, None, None, None, None, None, None, None, None, None, '10.1007/s11229-021-03276-4', None, None, '10.1016/S1476-9271(02)00090-7', None, None, None, None, None, '10.1214/13-AOS1123', None, None, '10.1007/s11222-016-9656-z', None, None, '10.1007/s00220-013-1678-1', '10.1016/j.jmva.2013.03.015', None, None, None, None, None, None, None, None, '10.1093/biomet/asw066', None, 
'10.1017/S026646661500033X', None, None, '10.1103/PhysRevA.92.062111', None, None, None, '10.1145/1062455.1062529', None, None, '10.1016/j.matcom.2018.08.005', None, None, None, None, None, None, None, '10.1080/10485252.2019.1705298', None, '10.17656/jzs.10586', None, '10.1093/biomet/asaa079', None, None, None, None, None, None, None, None, None, '10.14445/22315381/IJETT-V68I7P202S', '10.1007/978-3-030-59762-7_2', None, None, None, None, None, None, None, None, '10.1016/j.infsof.2021.106567', None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1016/j.csda.2018.01.001', None, None, '10.1016/j.csda.2013.12.003', None, None, None, None, None, '10.1007/s10664-018-9653-2', '10.1007/978-3-319-66299-2_1', None, None, None, None, None, None, None, None, None, None, None, '10.22152/programming-journal.org/2020/4/12', None, None, '10.1137/050631847', None, None, None, None, None, None, None, None, None, None, '10.1007/s10260-013-0252-5', None, None, None, None, '10.1016/j.jspi.2012.07.002', '10.1080/00949655.2012.704517', None, None, '10.1007/s00453-013-9791-2', None, None, None, None, None, None, None, None, '10.1103/PhysRevD.90.064035', None, '10.1103/PhysRevA.91.032111', None, None, None, None, None, None, None, None, None, None, None, '10.1007/s11425-016-0131-0', None, None, None, None, None, None, '10.4204/EPTCS.245.4', '10.1088/1674-1137/38/2/027001', None, None, None, None, '10.1016/j.chaos.2018.08.031', None, None, None, '10.4204/EPTCS.277.9', None, None, None, '10.1109/ISIT.2019.8849712', None, None, '10.1006/jnth.1998.2247', '10.1016/j.cam.2020.112968', None, None, '10.1007/978-3-030-21485-2_5', None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1046/j.1365-8711.2003.06874.x', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1287/mksc.2019.1194', '10.1111/rssb.12392', '10.1109/TSE.2019.2946773', None, 
'10.1109/ETS.2019.8791526', None, None, None, None, None, None, '10.1145/3408877.3432417', None, None, None, None, '10.1109/WETSoM.2015.12', '10.1109/APSEC.2016.34', None, None, '10.1142/9789812701688_0010', '10.1103/PhysRevE.58.5153', None, None, '10.1103/PhysRevLett.94.121102', '10.1063/1.2405049', None, '10.1214/009053604000000896', None, None, None, None, None, None, '10.1214/193940307000000103', None, None, None, None, None, '10.3150/09-BEJ208', None, None, '10.4204/EPTCS.80.3', '10.4204/EPTCS.80.5', None, None, None, None, None, '10.1111/rssb.12172', None, '10.7321/jscse.v2.n8.1', '10.1007/978-3-662-46669-8_33', None, None, None, None, None, None, '10.1007/s11222-015-9594-1', None, None, '10.5121/ijics.2013.3301', None, None, None, None, '10.1111/j.1467-9892.2011.00752.x', '10.1371/journal.pone.0199102', None, '10.1109/TIT.2018.2861772', None, None, None, None, None, None, None, None, None, None, None, None, '10.1109/ISED.2010.52', None, '10.1016/j.jspi.2013.09.014', None, None, None, None, '10.1103/PhysRevA.92.062134', None, None, '10.1145/3022099.3022101', None, None, None, None, None, None, None, None, None, None, '10.1109/KBEI.2015.7436097', None, None, None, None, '10.1007/s10994-019-05857-4', None, None, None, '10.4204/EPTCS.277.11', None, None, None, None, None, None, None, None, '10.1109/TIT.2020.3023377', '10.1007/978-3-319-66158-2_25', None, '10.4230/LIPIcs.APPROX-RANDOM.2019.40', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1145/3377812.3382142', None, None, None, '10.1016/j.jss.2020.110639', None, None, '10.1109/DDECS.2014.6868759', None, None, None, None, None, None, '10.1103/PhysRevE.103.022110', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1109/TSE.2019.2921965', None, None, None, None, None, None, '10.1109/TIT.2011.2104670', None, None, '10.1214/08-EJS172', '10.1214/08-AOS643', 
'10.2168/LMCS-8(4:8)2012', None, None, None, None, None, None, '10.1080/01621459.2019.1699421', None, None, '10.1109/TR.2018.2799957', None, None, None, None, '10.1016/j.jss.2019.110398', '10.1088/1367-2630/aad89b', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.1145/3377811.3380429', '10.5220/0009766800270038', None, None, None, None, None, '10.1109/AITEST49225.2020.00016', None, None, None, None, None, None, '10.1145/3453483.3454054', None, None, None, None, None, None, None, None, '10.1109/ASE.2001.989822', None, None, None, None, '10.1214/009053606000000119', '10.1214/009053606000000380', None, '10.1103/PhysRevA.65.012319', None, None, None, None, None, None, None, None, None, None, '10.4204/EPTCS.80.4', None, '10.4236/ojapps.2012.22015', None, None, None, None, '10.4204/EPTCS.102.9', None, None, None, '10.3390/e16031376', '10.1214/13-AOS1099', None, None, '10.1007/978-3-642-27213-4_12', None, None, '10.7321/jscse.v2.n7.6', None, None, '10.1111/biom.12695', None, None, None, None, '10.4204/EPTCS.57.5', None, None, '10.1007/978-3-319-14358-3_14', '10.1214/14-AOS1298', None, '10.4204/EPTCS.180', '10.4204/EPTCS.180.2', '10.1111/rssb.12234', '10.1063/1.4960172', None, None, None, None, None, None, None, None, '10.1145/2764979.2764987', None, None, '10.1515/ijb-2017-0023', '10.1080/19466315.2018.1458648', None, None, None, '10.1177/0962280218796685', None, '10.3938/jkps', None, None, '10.1613/jair.1166', None, '10.4204/EPTCS.98.3', None, '10.1214/17-BA1059', None, None, '10.4204/EPTCS.141.4', None, None, None, '10.1109/ISIT.2016.7541486', None, '10.1080/01621459.2017.1307757', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, '10.6028/jres.125.003', None, '10.1007/978-3-030-38965-9_7', None, None, '10.1016/j.jcta.2020.105217', '10.1111/ahg.12229', '10.1103/PhysRevE.96.053303', None, None, None, '10.1214/18-AOS1758', 
'10.1007/s11590-016-1037-1', None, '10.1145/3278186.3278190', None, '10.1145/3337932.3338814', None, '10.1145/3377811.3380377', None, None, None, None, None, None, None, None, None, '10.1017/S0266466621000062', '10.1016/j.csda.2019.106895', None, None, None, None, None, None, '10.1109/SEAA.2017.36', None, '10.5121/mlaij.2019.6102', None, None, None]
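
Printing a thousand values makes the pattern hard to see at a glance. A small sketch like the following summarizes the same information as counts. The summarize helper is hypothetical (not part of this library), and the lists are short stand-ins; in the session above they would be [r.comment for r in results] and [r.doi for r in results].

```python
def summarize(values):
    """Return (present, total) counts for a list of optional values."""
    present = sum(1 for value in values if value is not None)
    return present, len(values)


# Stand-in data for illustration; replace with the real lists from the session.
comments = [None, None, None, None]
dois = [None, "10.4204/EPTCS.80.8", None, "10.1145/3387940.3391535"]

print("comments: %d of %d present" % summarize(comments))
print("dois: %d of %d present" % summarize(dois))
```

A comment count of zero across a large sample points to an extraction bug, whereas a partial DOI count is expected because not every arXiv entry has a DOI.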

hilllinus commented 3 years ago

import arxiv

search = arxiv.Search(id_list=["2107.14211v1"])
paper = next(search.results())
print(paper.doi)
print(paper.comment)

Both values print as None.

When I change the code to use comment=entry.arxiv_comment and doi=entry.arxiv_doi, the results are correct.

lukasschwab commented 3 years ago

The DOI is extracted as expected by the latest version of this package:

>>> import arxiv
>>> search = arxiv.Search(id_list=["2107.14211v1"])
>>> paper = next(search.results())
>>> paper.doi
'10.22323/1.395.1113'

This makes sense: the existing DOI extraction code is equivalent to your suggestion (doi=entry.arxiv_doi).

You correctly identify a bug in comment extraction; #79 will fix that issue. I should have a release published today.
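
As a rough sketch of how a wrong field name can yield None silently instead of raising an error: if an optional field is read with a defaulting lookup, a misspelled or un-prefixed key just falls through to the default. The dict below is a hypothetical stand-in for a parsed feed entry, and the comment text is invented; this is not necessarily how the library's extraction code is written.

```python
# Hypothetical stand-in for a parsed feed entry; the comment text is invented.
entry = {
    "arxiv_comment": "example author comment",
    "arxiv_doi": "10.22323/1.395.1113",
}

# A defaulting lookup with the un-prefixed key silently returns None...
buggy_comment = entry.get("comment")
# ...while the prefixed keys return the stored values.
fixed_comment = entry.get("arxiv_comment")
doi = entry.get("arxiv_doi")

print(buggy_comment, fixed_comment, doi)
```

A regression test that asserts a non-None comment for a paper known to have one would catch this kind of silent fall-through.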

lukasschwab commented 3 years ago

Comment extraction fixed in release 1.4.1: https://pypi.org/project/arxiv/