Open Ariel225 opened 3 years ago
Thanks for your very clear bug report! I'll look into the details. I see that arxiv_search(query="ti:Fourfolds", limit=100) works, but arxiv_search(query="ti:Fourfolds", limit=101) gives the error.
I'll follow both of your suggestions: trap such errors better, and also report the problem to arXiv if it turns out to lie either with the record or with their API.
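As a rough illustration of what "trapping such errors better" could look like from the caller's side, here is a minimal sketch. `checked_search` is a hypothetical helper, not part of the aRxiv package, and `search_fn` stands in for a function such as arxiv_search; it only re-signals the error with the query attached so the failing search is easier to identify.

```r
# Hypothetical wrapper (not part of aRxiv): re-signal any error from the
# search function with the offending query included in the message.
# `search_fn` is assumed to accept a `query` argument plus extra arguments.
checked_search <- function(search_fn, query, ...) {
  tryCatch(
    search_fn(query = query, ...),
    error = function(e) {
      stop(
        sprintf("search failed for query '%s': %s", query, conditionMessage(e)),
        call. = FALSE
      )
    }
  )
}
```

With the package loaded, a call along the lines of checked_search(arxiv_search, "ti:Fourfolds", limit = 78, batchsize = 50) should then report which query triggered the failure, though I have not run it against the live API.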
Okay, I get it. For this search, you get proper results if limit <= 77, but if limit >= 78, it returns NULL. If batchsize < limit and you're in this latter case, you get the error about assigning attributes to NULL.
> dim(result <- arxiv_search(query="ti:Fourfolds", limit=77))
[1] 77 15
> dim(result <- arxiv_search(query="ti:Fourfolds", limit=78))
[1] 0 15
> dim(result <- arxiv_search(query="ti:Fourfolds", limit=78, batchsize=50))
retrieved batch 1
Error in attr(results, "search_info") <- search_attributes(query, id_list, :
attempt to set an attribute on NULL
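The last error arises because R refuses to set an attribute on NULL. A minimal sketch of one possible guard follows; `set_search_info` is a hypothetical helper written for illustration, not the package's actual code, and returning an empty data frame in the NULL case is an assumption about the desired behaviour.

```r
# Hypothetical helper (not the package's actual implementation): attach
# search metadata, substituting an empty data frame when the batch step
# produced NULL, since attributes cannot be set on NULL.
set_search_info <- function(results, info) {
  if (is.null(results)) {
    results <- data.frame()
  }
  attr(results, "search_info") <- info
  results
}

# A NULL result now yields an empty data frame that still carries its metadata.
empty <- set_search_info(NULL, list(query = "ti:Fourfolds", limit = 78))
```

This only avoids the crash; it does not explain why the API returns nothing once limit reaches 78.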
Certain records seem to cause a crash. We have narrowed it down to this query, which should retrieve all records submitted in the one-minute window from 22:16 to 22:17 on January 24, 2018.
dfy<-arxiv_search(query = "submittedDate:[201801242216 TO 201801242217]", limit = 15000, batchsize=2000)
which returns an error of:
We can isolate the record, which appears to be this one: https://arxiv.org/abs/1610.04266
If we search by title instead, the same error appears:
dfy<-arxiv_search(query = "ti:Fourfolds", limit = 1200, batchsize=300)
We therefore think that the record may be corrupt (e.g., a hidden, unintentional column delimiter).
A similar error occurs for this single-day range, though we have not isolated the individual record causing the error:
dfy<-arxiv_search(query = "submittedDate:[201612030000 TO 201612040000]", limit = 15000, batchsize=2000)
Does the query need to be modified? Can the query auto-skip corrupt records? Should arXiv be notified?
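In the meantime, one workaround we could imagine is to split a date range into smaller windows and skip any window whose search fails. The sketch below is only an illustration under assumptions: `window_queries` and `search_windows` are hypothetical helpers, `search_fn` stands in for a call to arxiv_search, and adjacent windows share an endpoint, so duplicate records may need to be removed afterwards.

```r
# Hypothetical helper: build submittedDate queries for consecutive windows
# of `step_mins` minutes between two POSIXct times.
window_queries <- function(start, end, step_mins = 60) {
  edges <- seq(start, end, by = step_mins * 60)        # step given in seconds
  if (tail(edges, 1) < end) edges <- c(edges, end)     # cover a partial last window
  stamp <- format(edges, "%Y%m%d%H%M", tz = "UTC")
  sprintf("submittedDate:[%s TO %s]", head(stamp, -1), tail(stamp, -1))
}

# Hypothetical helper: run each query, keep the successes, and record which
# windows raised an error so they can be inspected or bisected further.
search_windows <- function(queries, search_fn) {
  ok <- list()
  failed <- character(0)
  for (q in queries) {
    res <- tryCatch(search_fn(q), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, q)
    } else {
      ok[[q]] <- res
    }
  }
  list(results = ok, failed = failed)
}

qs <- window_queries(
  as.POSIXct("2016-12-03 00:00", tz = "UTC"),
  as.POSIXct("2016-12-04 00:00", tz = "UTC"),
  step_mins = 60
)
# Untested against the live API:
# out <- search_windows(qs, function(q) arxiv_search(query = q, limit = 2000))
```

The windows listed in `failed` could then be split again with a smaller `step_mins` to close in on the problematic record.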