saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License
311 stars 51 forks source link

Python Api #25

Closed bhavaygg closed 4 years ago

bhavaygg commented 4 years ago

What I Did

from pysradb import SRAdb
db = SRAdb('SRAmetadb.sqlite')
df = db.sra_metadata('SRP098789')
df.head()

The link to Python API is broken. Could you please fix it or list how I can use this library from inside my code. I tried the above code and got this output.

Traceback (most recent call last):
  File "c:\Users\Bhavay\Desktop\nvbi.py", line 47, in <module>
    db = SRAdb('SRAmetadb.sqlite')
  File "C:\Python\lib\site-packages\pysradb\sradb.py", line 178, in __init__
    _verify_srametadb(sqlite_file)
  File "C:\Python\lib\site-packages\pysradb\sradb.py", line 155, in _verify_srametadb
    metadata = db.query("SELECT * FROM metaInfo")
  File "C:\Python\lib\site-packages\pysradb\basedb.py", line 103, in query
    results = self.cursor.execute(sql_query).fetchall()
sqlite3.OperationalError: no such table: metaInfo
DaasDaham commented 4 years ago

It's not broken. You are using SRAdb which requires SRAmetadb.sqlite to be downloaded locally on your machine using $ pysradb metadb command. This explains why a "no such table" error is thrown. The _verify_srametadb() in SRAdb function has a check for this. I too faced this problem as it is not mentioned explicitly in the docs and since it creates a file of the same name entered into SRAdb() function in the current working directory, it interferes with $ pysradb metadata <sra_id> command also since the metadata() function in cli.py has a check to see if there is a database with this name in cwd. This code, however, would work:- from pysradb import sraweb db = sraweb.SRAweb() df = db.sra_metadata('SRP098789') df.head()

A few things that can be done: 1) Mention in the Python API Usage section in docs to download metadb locally on machine before using SRAdb. 2) Modify the _verify_srametadb() function to throw an error which mentions to run pysradb metadb before running the current command. 3) Introduce a layer of abstraction just like the cli command $ pysradb metadata <sra_id> to check if db is available locally else use "web" db. @saketkc Can I work on this?

saketkc commented 4 years ago

Thanks @Chokerino for the bug report. I agree its a bug, but the API is not broken. @DaasDaham's findings are indeed correct. We are trying to move away from the SRAmetadb.sqlite usage as it is not up to date and has records missing at random. That said, I still want to keep supporting it until SRAweb mode can support everything that the SRAdb mode (the one relying on local sqlite db) does.

@DaasDaham your suggestions look good! Feel free to send a PR and I will be happy to take a look.

DaasDaham commented 4 years ago

@saketkc Sure, I'll do it

DaasDaham commented 4 years ago

@saketkc why is there no mention or use case scenario of SRAweb() mode in the docs as python API (in the python API usage section). Is it because web mode isn't complete yet? But then cli commands also use SRAweb() when SRAmetadb.sqlite isn't available locally so it can be used by python files also by importing sraweb module.

saketkc commented 4 years ago

why is there no mention or use case scenario of SRAweb() mode in the docs as python API (in the python API usage section). Is it because web mode isn't complete yet?

Partially, yes. And partially because I have been slow at updating the documentation. Most part is interoperable between the SRAdb and SRAweb mode and as such most of the documentation works for both modes. But, not all.

But then cli commands also use SRAweb() when SRAmetadb.sqlite isn't available locally so it can be used by python files also by importing sraweb module.

The rationale for this design was to return the user some results rather than just failing with an error if SRAmetadb is not specified or present in the current working directory. This might not be the best approach, but I could not think of a better approach when I wrote it. Happy to take suggestions. The ultimate goal is to essentially get rid of the SRAdb mode. But we cannot until search is supported.

DaasDaham commented 4 years ago

Thanks for the clarification!

saketkc commented 4 years ago

Closed via #31