Closed. kpj closed this issue 3 years ago
Thanks for the bug report @kpj! @bscrow would you be able to take a look at this?
I wasn't able to reproduce the bug; here's an example: https://colab.research.google.com/drive/15cRhVw7Cy86N4JWWWtHfyJgr0mtwImdO?usp=sharing
After discussing with @saketkc, one possible reason for the bug is that some entries were still being submitted to SRA while the query was being processed. In that case the pmid is returned in the search result, but some of the corresponding metadata is not yet ready and cannot be retrieved.
Either way, I think I could have handled the error in a better way. I will make a PR soon.
Thanks a lot for taking a look!
You make an interesting point. I just tried it again and it worked fine, so it might indeed be the case that some samples were added during the query.
Maybe it would make sense to retrieve the UIDs after iterating through all accessions instead of doing it before (and making sure they are in the right order, etc).
UIDs are not found in the full XML file from SRA, so I don't see an efficient way to check that the UIDs are in the right order. I feel that a better idea might be to store the UID list separately (#92).
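To make the "store the UID list separately" idea concrete, here is a minimal sketch. It is not pysradb code: the `SearchResult` class and its field names are hypothetical, and the accessions and UIDs are made-up placeholders. The only point is that a separate list does not have to match the number of metadata rows, so a mismatch can be reported instead of raising.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchResult:
    """Hypothetical container: metadata rows and UIDs are kept apart."""

    records: List[Dict[str, str]] = field(default_factory=list)
    uids: List[str] = field(default_factory=list)

    def counts_match(self) -> bool:
        # True only when there is exactly one UID per metadata row.
        return len(self.records) == len(self.uids)


# Placeholder data: three UIDs came back, but only two metadata rows.
result = SearchResult(
    records=[{"run_accession": "SRR0000001"}, {"run_accession": "SRR0000002"}],
    uids=["111", "222", "333"],
)
mismatch = not result.counts_match()
```

With this layout the caller can decide what to do when `mismatch` is true (warn, retry, or drop the UIDs) without the search itself failing.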
Describe the bug
Using SraSearch with verbosity>=2 and a large query raises a ValueError when setting self.df["pmid"] = list(uids) (https://github.com/saketkc/pysradb/blob/c23d4a769543d05a0f002d1b28c985da5963573f/pysradb/search.py#L776) because the size of the underlying dataframe seems to vary.
The following error is raised:
Multiple runs yield slightly different error messages:
It seems like the index length is varying for some reason.
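For context, the failure mode can be sketched independently of pysradb: pandas raises a ValueError whenever a list assigned as a column has a different length than the dataframe index. The snippet below uses made-up accessions and UIDs, and `attach_uids` is a hypothetical helper (not part of pysradb) that checks the lengths before assigning.

```python
import pandas as pd


def attach_uids(df, uids, column="pmid"):
    """Hypothetical guard: add `uids` as a column only if the lengths agree.

    Returns (dataframe, ok). When the lengths differ, the input dataframe
    is returned unchanged and ok is False.
    """
    uids = list(uids)
    if len(uids) != len(df):
        return df, False
    out = df.copy()
    out[column] = uids
    return out, True


# Placeholder data: three metadata rows, but only two UIDs.
df = pd.DataFrame({"run_accession": ["SRR0000001", "SRR0000002", "SRR0000003"]})

try:
    df["pmid"] = ["111", "222"]  # length 2 vs. index length 3
    raised = False
except ValueError:
    raised = True

guarded, ok = attach_uids(df, ["111", "222"])
complete, ok_complete = attach_uids(df, ["111", "222", "333"])
```

The direct assignment raises, while the guarded version returns `ok == False` and leaves the dataframe untouched, which lets the caller log the mismatch rather than crash.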
To Reproduce
Execute the following code:
Desktop:
OS: Linux
Python version: 3.8.5
pysradb version: 0.11.2-dev0