tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

Move from shelves to more reliable data structure (hdf5 or SQL database) #84

Closed tleonardi closed 4 years ago

tleonardi commented 5 years ago

Using shelves for storing SampCompDB objects might cause portability issues when nanocompore is run in an environment different from the one used for downstream analysis. For example, running nanocompore in the Singularity image of nanocompore_pipeline produced a DB that is unreadable on Arch (python3.5), although the same file opens without error inside the container:

$ ls
out_SampComp.db nanocompore_b376825.img

$ /usr/bin/python3.5
Python 3.5.6 (default, Apr 17 2019, 14:59:13)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> shelve.open("out_SampComp", "r")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib/python3.5/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib/python3.5/dbm/__init__.py", line 85, in open
    raise error[0]("need 'c' or 'n' flag to open new db")
dbm.error: need 'c' or 'n' flag to open new db
>>> quit()

$ singularity shell nanocompore_b376825.img
Singularity: Invoking an interactive shell within container...

Singularity nanocompore_b376825.img:> python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> shelve.open("out_SampComp", "r")
<shelve.DbfilenameShelf object at 0x7f5e6ade4eb8>
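
A quick way to check what each interpreter makes of the file is `dbm.whichdb()`. In CPython's `dbm` module, the "need 'c' or 'n' flag" error is raised when `whichdb()` finds no readable database under the given name and the flag does not allow creating one. The sketch below is only a diagnostic aid: it assumes (this is not confirmed here) that the two environments simply have different dbm backends available, and `describe_shelf` is a hypothetical helper, not part of nanocompore.

```python
import dbm
import os
import shelve
import tempfile


def describe_shelf(basename):
    """Report which dbm backend, if any, this interpreter detects for a shelf.

    Hypothetical helper for diagnosis only; not part of nanocompore.
    """
    backend = dbm.whichdb(basename)
    if backend is None:
        # No readable database was found under this name.
        return "not found or unreadable under this name"
    if backend == "":
        # A file exists but its format could not be guessed.
        return "format not recognised by this interpreter"
    return backend  # a module name such as "dbm.gnu" or "dbm.dumb"


# Self-contained demo: create a shelf in a temporary directory, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out_SampComp")  # stand-in for the real output
    with shelve.open(path, "c") as db:
        db["ref_1"] = {"pvalue": 0.01}
    created = describe_shelf(path)
    missing = describe_shelf(os.path.join(tmp, "does_not_exist"))
```

Running `describe_shelf("out_SampComp")` both on the host and inside the container would show whether the two interpreters agree on the backend. The backend name reported for `created` depends on which dbm modules the local Python was built with.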

a-slide commented 5 years ago

We should also move the p-value correction from SampCompDB to SampComp, so that reloading the DB is essentially instant and has a very low memory footprint.
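
As a rough illustration of the SQL option from the issue title, here is a minimal sketch using the standard-library `sqlite3` module. The table layout, the function names `write_results` and `read_result`, the `adj_pvalue` field and the `.sqlite` file name are all assumptions made up for this example; they are not nanocompore's actual schema or API. The idea is that if the corrected p-values are computed once at write time, reading back a single reference is a plain lookup.

```python
import json
import os
import sqlite3
import tempfile


def write_results(db_path, results):
    """Store per-reference results as JSON text in a single SQLite file.

    `results` maps a reference ID to a JSON-serialisable dict. The table
    name and layout are hypothetical, not nanocompore's actual schema.
    """
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "ref_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO results VALUES (?, ?)",
                ((ref_id, json.dumps(data)) for ref_id, data in results.items()),
            )
    finally:
        con.close()


def read_result(db_path, ref_id):
    """Load one reference without reading the other rows into memory."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT payload FROM results WHERE ref_id = ?", (ref_id,)
        ).fetchone()
    finally:
        con.close()
    return None if row is None else json.loads(row[0])


# Demo with made-up values; adjusted p-values are stored already corrected.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "out_SampComp.sqlite")
    write_results(db_path, {"ref_1": {"pos": [10, 42], "adj_pvalue": [0.5, 0.001]}})
    loaded = read_result(db_path, "ref_1")
    absent = read_result(db_path, "ref_2")
```

Unlike a shelf, whose on-disk format depends on the dbm backend available to the interpreter that wrote it, an SQLite file has a single documented format, and JSON payloads avoid pickle compatibility concerns. HDF5 would be the other candidate named in the title and is not covered by this sketch.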

tleonardi commented 4 years ago

Closing due to inactivity.