harmslab / phylogenetics

A Python API for managing phylogenetics projects
http://phylogenetics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

No load function for PhylogeneticsProject object #15

Open biophyser opened 6 years ago

biophyser commented 6 years ago

When trying to reopen a project whose directory already exists and already holds data (for example, after restarting a notebook), the constructor raises an exception:

Exception                                 Traceback (most recent call last)
<ipython-input-28-648a14665029> in <module>()
----> 1 project = PhylogeneticsProject(project_dir="project")

~/.miniconda3/lib/python3.6/site-packages/phylogenetics/project.py in __init__(self, project_dir, overwrite)
     24         # Set up a project directory
     25         if os.path.exists(project_dir) and overwrite is False:
---> 26             raise Exception("Project already exists! Use `PhylogeneticsProject.load` or delete the project.")
     27         elif not os.path.exists(project_dir):
     28             os.makedirs(project_dir)

Exception: Project already exists! Use `PhylogeneticsProject.load` or delete the project.

But the .load function does not exist:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-995da61d3d00> in <module>()
----> 1 project.load(project_dir="project")

AttributeError: 'PhylogeneticsProject' object has no attribute 'load'

Putting this here so I remember...
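
For reference, a minimal sketch of the kind of `load` classmethod the error message points to. `ProjectSketch` is a hypothetical stand-in, not the real `PhylogeneticsProject`; it only copies the directory check visible in the traceback above, and the `load` signature is an assumption.

```python
import os
import tempfile


class ProjectSketch:
    """Hypothetical stand-in for PhylogeneticsProject (not the real class)."""

    def __init__(self, project_dir, overwrite=False):
        # Same guard as the one shown in the traceback.
        if os.path.exists(project_dir) and overwrite is False:
            raise Exception("Project already exists! Use `load` or delete the project.")
        elif not os.path.exists(project_dir):
            os.makedirs(project_dir)
        self.project_dir = project_dir

    @classmethod
    def load(cls, project_dir):
        """Attach to an existing project directory without recreating it."""
        if not os.path.isdir(project_dir):
            raise FileNotFoundError(f"No project found at {project_dir!r}")
        # Bypass __init__ so the 'already exists' guard is not triggered.
        project = cls.__new__(cls)
        project.project_dir = project_dir
        return project


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "project")
        ProjectSketch(project_dir=path)      # first run creates the directory
        project = ProjectSketch.load(path)   # after a restart, load instead
        print(project.project_dir == path)
```

A real `load` would also have to restore whatever data the project stores on disk; this sketch only covers getting past the directory check.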

biophyser commented 6 years ago

The load function in v0.5.0 won't work because there is no track_in_history functionality in that version. Needs migration from v0.4.1.

Zsailer commented 6 years ago

I think we need to add the load functionality in PR #16. Load needs to be aware of the history file as well.

Zsailer commented 6 years ago

As I was thinking about the load functionality more, I realized we need to make a decision here: what format should we use to save PhylogeneticsProject objects?

The simplest way (and the way I've previously done it) is using Python's pickling serialization protocol. There are many drawbacks to this method (just google "Python pickle is bad").

I think we should use a format provided by PhyloPandas. I'm in favor of JSON (using the to_json method). JSON is lightweight, human readable, and fast.
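
A rough sketch of a JSON round trip, using only the standard library `json` module. The file name `project.json`, the `sequences`/`history` layout, and the `save_project`/`load_project` helpers are all hypothetical, made up for illustration. If the data lives in a pandas DataFrame, `DataFrame.to_json` and `pandas.read_json` would play the same role.

```python
import json
import os
import tempfile


def save_project(data, path):
    """Write project data (plain dicts and lists) to a JSON file."""
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)  # indent keeps the file human readable


def load_project(path):
    """Read project data back from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical project contents; the real layout is still undecided.
    data = {
        "sequences": [{"id": "seq0", "sequence": "MKTAYIAK"}],
        "history": ["read_data"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "project.json")
        save_project(data, path)
        restored = load_project(path)
        print(restored == data)
```

Unlike a pickle, the resulting file can be opened in a text editor and does not depend on the class definitions that wrote it.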

What do you think?

biophyser commented 6 years ago

I think JSON is the way to go. Besides everything you listed, there are multiple ways to parse and write JSON, which I think matters if a user wants to take the data out of Python.