bscrow closed this 4 years ago
Merging #43 into master will increase coverage by 12.81%. The diff coverage is 71.84%.
@@ Coverage Diff @@
## master #43 +/- ##
===========================================
+ Coverage 41.93% 54.74% +12.81%
===========================================
Files 5 7 +2
Lines 1023 1706 +683
===========================================
+ Hits 429 934 +505
- Misses 594 772 +178
Impacted Files | Coverage Δ | |
---|---|---|
pysradb/cli.py | 0.00% <0.00%> (ø) | |
pysradb/download.py | 20.25% <12.50%> (-4.75%) | :arrow_down: |
pysradb/search.py | 81.29% <81.29%> (ø) | |
pysradb/exceptions.py | 100.00% <100.00%> (ø) | |
pysradb/sraweb.py | 83.25% <0.00%> (-1.33%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da62643...25f4c13.
Thanks @bscrow for the latest updates! It seems to be progressing pretty well so far!
Just a couple of things:
- Any search query (verbose or otherwise) should always output `study_accession` as the first column, since it is the most useful info.
- I tried the `sra_geo` mode: `pysradb search -q "ferret" --max 1000 --db sra_geo`; however, we should expect ~650 results. It is possible you are still working on it, so let me know when it is ready for review.
Great work so far!
Also, once it is ready, it would be worthwhile to check that all the tests pass. It seems we have some failing tests in `test_search.py`.
I have just debugged GeoSearch, so it should work as intended now. I'll update the detailed documentation after finishing with the tests. But essentially:
pysradb search -q "ferret" --max 1000 --db sra_geo
sends this query to SRA: 'ferret AND sra gds[Filter]'.
sra gds[Filter] ensures that the entries in the results can be found in GEO DataSets as well.
To query GEO DataSets instead, you can run
pysradb search --geo-query ferret --max 1000 --db sra_geo
This will send the query ferret AND gds sra[Filter] to GEO DataSets. The GDS uids from the response are then converted to "related" SRA uids via ELink. This produces the same results as the "Find related data" feature on the website (shown below)
Website result:
pysradb search result:
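For reference, the two query modes described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `build_search_term` and `build_elink_url` are hypothetical helper names, and the uids in the usage note are made up. The ELink endpoint and its `dbfrom`, `db`, and `id` parameters are the standard NCBI E-utilities ones.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Filters that keep only entries cross-listed in the other database.
CROSS_FILTERS = {
    "sra": "sra gds[Filter]",  # SRA entries that also exist in GEO DataSets
    "gds": "gds sra[Filter]",  # GEO DataSets entries that also exist in SRA
}


def build_search_term(query, db):
    """Hypothetical helper: append the cross-database filter to a query."""
    return f"{query} AND {CROSS_FILTERS[db]}"


def build_elink_url(gds_uids):
    """Hypothetical helper: ELink URL mapping GDS uids to related SRA uids."""
    params = {
        "dbfrom": "gds",
        "db": "sra",
        "id": ",".join(str(uid) for uid in gds_uids),
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"
```

With these, `build_search_term("ferret", "sra")` gives the term sent for `-q "ferret"`, and `build_search_term("ferret", "gds")` gives the one sent for `--geo-query ferret`. Note that a comma-separated `id` list asks ELink for one combined set of links; as far as I understand the E-utilities documentation, repeating the `id` parameter once per uid returns a separate link set for each input, which might make it easier to trace which GDS entry maps to which SRA entries.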
However, I feel that my current implementation of GeoSearch may not be optimal.
I have noticed that ELink doesn't seem to retrieve the exact corresponding entries in SRA. For example, GSE142617/SRP238838, or the 6 Experiments that it encompasses in the above search on GEO DataSets, doesn't show up among the entries after the ELink conversion.
On the other hand, it is possible to find the SRA entries corresponding to GEO DataSets search results by downloading the summaries of both search results and then matching accession numbers between them, but I can't think of a very efficient way of doing this for queries such as "e coli", which yield many search results on both APIs.
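One possible way to keep the accession matching cheap is a hash join: index one set of summaries by accession once, then look up each row of the other set. The sketch below assumes both summaries have already been parsed into lists of dicts that share a `study_accession` field; that shared field, the function name `match_by_accession`, and the experiment accessions in the usage note are assumptions for illustration only.

```python
def match_by_accession(gds_rows, sra_rows, key="study_accession"):
    """Pair GDS and SRA summary rows that share the same accession.

    Builds a dict index over the SRA rows, so the cost is roughly
    len(gds_rows) + len(sra_rows) instead of their product.
    """
    sra_index = {}
    for row in sra_rows:
        # Several SRA rows (e.g. experiments) can belong to one study.
        sra_index.setdefault(row[key], []).append(row)

    matches = []
    for gds in gds_rows:
        # GDS rows without the key, or without an SRA match, are skipped.
        for sra in sra_index.get(gds.get(key), []):
            matches.append((gds, sra))
    return matches
```

For example, a GDS row `{"gse": "GSE142617", "study_accession": "SRP238838"}` would be paired with every SRA row whose `study_accession` is `SRP238838`, and rows present in only one of the two result sets are dropped. Whether the GDS summaries actually expose the SRP accession in a directly usable field would need to be checked against the real API responses.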
Hi @bscrow, can you rebase with master (resolving the conflicts)?
Any updates based on our previous discussion?
I forgot to mention earlier, but we also want to support https://github.com/saketkc/pysradb/issues/38. Do you have a notebook for this?
Hi @bscrow, would you be able to create a new PR (from a new branch) that is similar to this PR but without any writeup sections? I have reviewed it and it looks good so far; I will fix the small changes at my end.
Planning to merge it in the coming week. Thanks!
No problems! I've created the new PR: #57
Awesome, thanks a lot @bscrow! Closing in favor of #57
Implemented the search feature for phase 1 of GSoC