Closed bscrow closed 3 years ago
Merging #57 (ec8cb49) into master (26395b6) will increase coverage by 12.91%. The diff coverage is 70.22%.
@@ Coverage Diff @@
## master #57 +/- ##
===========================================
+ Coverage 41.17% 54.08% +12.91%
===========================================
Files 5 7 +2
Lines 1059 1773 +714
===========================================
+ Hits 436 959 +523
- Misses 623 814 +191
Impacted Files | Coverage Δ | |
---|---|---|
pysradb/cli.py | 0.00% <0.00%> (ø) | |
pysradb/download.py | 22.22% <20.68%> (-2.78%) | :arrow_down: |
pysradb/search.py | 79.81% <79.81%> (ø) | |
pysradb/exceptions.py | 100.00% <100.00%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
This is super awesome @bscrow! Many thanks for your contribution and for the awesome work you have done over GSoC2020! I believe this will be a huge help to a lot of researchers!
I have left some comments, most of them are minor. It would be great if they can be addressed. Of all, it is particularly important we output all the URLs rather than selecting the best one ourselves.
Great work!
cc @mvdbeek @amalthomas111
When using -g it might be a good idea to have a dynamic naming prefix/suffix for the plots. Could use time stamps. Otherwise, plots would be overwritten.
For -G, -Y, -Z options it would be great if you could create a file in GitHub or locally which users can refer to, compiling possible options for each of these tags. In the help (-h) options, you can refer to the link of this file or local path.
pysradb search -q "single-cell RNA-seq" -g -D 01-01-2008:01-10-2020
This command does not work. It gives the error: ValueError: bins must be positive, when an integer
pysradb search -d geo -q "single-cell RNA-seq" -m 10K -o test_1ksc
The first status bar showed 0/100000 [00:00<?, ?it/s], but I specified 10K, not 100K. When I specified 100, it showed 1K, so the total is 10 times too large. I think this happens with -d geo, not with sra. For both -m 100 and -m 10K, I got a connection error: http.client.RemoteDisconnected: Remote end closed connection without response. During handling of the above exception, another exception occurred: Is this an NCBI issue?
I am getting connection/operation timeouts for almost all m > 100 with db=geo/sra. We need to look into this!
pysradb search -q "single-cell RNA-seq" -g -D 01-01-2008:01-10-2020
This command does not work. It gives the error: ValueError: bins must be positive, when an integer
I've resolved this bug, as well as a related bug that occurs when the query returns entries without a base count. Since none of the entries contains base count information, the number of bins computed for the base count histogram is invalid, which causes the error.
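For readers hitting the same message: it is the error NumPy's histogram code raises when it is given an integer bin count below 1. The sketch below shows one way to guard against that before plotting. It is not the fix made in this PR; `safe_bin_count`, the square-root rule, and the `max_bins` cap are all illustrative assumptions.

```python
import math


def safe_bin_count(values, max_bins=50):
    """Return a positive histogram bin count, or None if there is nothing to plot.

    Hypothetical helper for illustration; not part of pysradb.
    """
    # Drop missing entries (None or NaN), e.g. records without a base count.
    clean = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not clean:
        # Nothing to plot: the caller should skip the histogram rather than
        # pass a bin count of 0 to the plotting library.
        return None
    # Square-root rule, clamped to the range [1, max_bins].
    return max(1, min(max_bins, math.isqrt(len(clean))))
```

A caller would skip the base count plot when the helper returns None and otherwise pass the result as the `bins` argument.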
When using -g it might be a good idea to have a dynamic naming prefix/suffix for the plots. Could use time stamps. Otherwise, plots would be overwritten.
Thanks for the suggestion! I've implemented it in the new commit.
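The commit itself is not shown in this thread, so the following is only a minimal sketch of the timestamp-suffix idea. The function name `timestamped_plot_path`, the `png` default extension, and the timestamp format are assumptions, not pysradb's naming scheme.

```python
from datetime import datetime
from pathlib import Path


def timestamped_plot_path(out_dir, name, ext="png", now=None):
    """Build a plot path such as plots/base_count_20200801-123005.png.

    Hypothetical helper for illustration; not part of pysradb.
    """
    # Allow the caller to inject a fixed time (useful for testing).
    now = now or datetime.now()
    # Second-level resolution keeps names readable; two plots of the same
    # name written within the same second would still collide.
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"{name}_{stamp}.{ext}"
```

With a suffix like this, repeated runs with -g write new files instead of overwriting the previous plots.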
pysradb search -d geo -q "single-cell RNA-seq" -m 10K -o test_1ksc
The first status bar showed 0/100000 [00:00<?, ?it/s], but I specified 10K, not 100K. When I specified 100, it showed 1K, so the total is 10 times too large. I think this happens with -d geo, not with sra. For both -m 100 and -m 10K, I got a connection error: http.client.RemoteDisconnected: Remote end closed connection without response. During handling of the above exception, another exception occurred: Is this an NCBI issue?
I am getting connection/operation timeouts for almost all m > 100 with db=geo/sra. We need to look into this!
I've debugged the issue where 10 times the requested number of entries were retrieved from SRA.
As for the connection error, I couldn't replicate it on my side except by running another pysradb search operation from the same IP address while the above process was running, in which case NCBI terminated my connection for exceeding their API limit. Could this have been the case when you tested pysradb search?
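If the disconnects do come from rate limiting (NCBI's E-utilities allow about 3 requests per second without an API key), one common mitigation is to retry with exponential backoff. The sketch below is an assumption about how that could look, not what pysradb does; `call_with_backoff` and its parameters are hypothetical. `http.client.RemoteDisconnected` is a subclass of the built-in `ConnectionError`, so the default catches it, but an HTTP library that wraps the error in its own exception type would need that type passed via `retry_on`.

```python
import time


def call_with_backoff(request, retries=4, base_delay=1.0,
                      sleep=time.sleep, retry_on=(ConnectionError,)):
    """Call request() and retry on connection errors with exponential backoff.

    Hypothetical helper for illustration; not part of pysradb.
    """
    for attempt in range(retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == retries:
                # Out of retries: surface the original error to the caller.
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` argument is injectable so the waiting behaviour can be checked without real delays.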
For -G, -Y, -Z options it would be great if you could create a file in GitHub or locally which users can refer to, compiling possible options for each of these tags. In the help (-h) options, you can refer to the link of this file or local path.
I've added a short guide for these tags as well as for querying GEO DataSets. It can be accessed from the command line using pysradb search --geo-info or by calling GeoSearch.info() in Python.
21e1c1f
@bscrow a failing example: https://colab.research.google.com/drive/1hN6m7kJ4Xpflvde3wK12Ubzu_Aq3x3qX?usp=sharing
I've added a check for no search results in 797da82, which should resolve the error message. To generate statistics, however, instance.search() must be called first to retrieve the search results.
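The ordering requirement can be illustrated with a toy stand-in. `ToySearch`, its `stats()` method, and the `bases` field are hypothetical and only mirror the pattern described above (search first, then guard against empty results); they are not pysradb's classes or the code in 797da82.

```python
class ToySearch:
    """Hypothetical stand-in for a search object; not pysradb's class."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable returning an iterable of records
        self._results = None     # None means search() has not run yet

    def search(self):
        # Retrieve and cache the results so later calls can summarise them.
        self._results = list(self._fetch())

    def stats(self):
        if self._results is None:
            raise RuntimeError("call search() before requesting statistics")
        if not self._results:
            # No hits: return an empty summary instead of failing later.
            return {}
        counts = [record["bases"] for record in self._results]
        return {"count": len(counts), "mean_bases": sum(counts) / len(counts)}
```

Calling `stats()` before `search()` raises a clear error, and a query with no hits yields an empty summary rather than an exception deep inside the statistics code.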
https://colab.research.google.com/drive/1pCmfj-uUDpnBFCXCZoiBw-k82Pi12Otu?usp=sharing
My updated documentation for pysradb search is in pull request #51; a live version can currently be found at https://bscrow.github.io/pysradb/commands/search.html
This PR contains the same set of changes as #43, minus the writeup files.