jlevy44 / PathFlowAI

A High-Throughput Workflow for Preprocessing, Deep Learning Analytics and Interpretation in Digital Pathology
https://jlevy44.github.io/PathFlowAI/
MIT License
39 stars 8 forks source link

Update Click, add .gitignore, format code, and use Poetry #8

Open sumanthratna opened 4 years ago

sumanthratna commented 4 years ago

I'm not sure if this is a PR that you want to merge—if it is, I can create another PR with the same changes as this for the dev_object_detection branch.

This PR:

jlevy44 commented 4 years ago

Nice!!!! Thanks for linting the repo amd updating the docs! Also, I collapsed and PR'd the dev_object_detector branch. We can PR to master or feel free to make branches. Generally, I'm hoping that all features that have been added to the public release are fully functional, so I may further deprecate code that is not fully functional.

A couple of questions: Have you verified that the docs work? Has any of the linting caused any of the modules to fail? And referring to your previous PR, I saw that you had attempted to change the orientation of the slide, any impact on downstream functionality?

We may want to consider starting a unit-testing framework with continuous integration. It's been in the plans for a while but haven't gotten around to it. We could both iterate on the framework and incorporate in pieces if you are interested.

In any case, will probably accept this PR, just want to test the code in your commits before proceeding and make sure it doesn't significantly conflict with the previous PR.

jlevy44 commented 4 years ago

We should also make sure poetry is compatible with conda. I haven't used poetry before, but seems great. Eventually this will be packaged with Docker (we already have a preliminary Dockerfile but unreleased) and just need to make sure it can handle more complex installs such as nvidia Apex.

sumanthratna commented 4 years ago

I had a lot of those questions and was planning on bringing them up to you.

The docs have a pretty significant error—the types of parameters get combined with the parameter names. This is because of numpydoc I have a commit ready that'll fix this within the next 2 minutes (hopefully).

I also wanted to test all of the methods but since PathFlow doesn't have a testing framework set up, I haven't had a chance to do that yet. Obviously we should do some testing before merging—if you'd like I can set up some basic testing using pytest.

As for the orientation of the slide, I don't expect it'll cause any errors but we should figure that out with the testing.

Also—unfortunately, Poetry doesn't easily publish to conda (I think). However, I think I could write a script to automatically publish to Conda for us. The same goes for Docker.

Also—Poetry doesn't allow custom install instructions (this is probably the biggest downside of Poetry for PathFlow). We could set up a script called pathflow-setup which installs apex and other libraries. Then, in the documentation, we could mention that users should run pathflow-setup after installing.

jlevy44 commented 4 years ago

Were you building the docs using sphinx?

That sounds great. In the bin, I have a script called install_apex, but we can just merge this into pathflow setup.

If you want to try the conda and docker set ups, that would be great, I can also set it up as well. I think once we throw in a few pytest modules (we don't need to do the entire framework yet), we should be good to go for the merge.

sumanthratna commented 4 years ago

Yup, the docs were built with sphinx. To build the docs:

cd docs
make html

I'm working on writing a Makefile for development files right now, I'll add that to this PR soon.

jlevy44 commented 4 years ago

This is a really impressive PR that moves us closer to a more reproducible workflow.

I would say before merging, we need the remaining minimal set of tests:

  1. Preprocess pipeline test (is a SQL file generated? Zarr? NPZ?), in a future PR we can consider removing the dependence on zarr and npz
  2. Quick test of train_model (generate a pretrained model, no need to train, but if training, only store 100 patches)
  3. Preliminary tests of visualization module (UMAP plot production with plotly, UMAP with imagenet pretrained network with images overlaid; this can be done without training the model, we can discuss over Slack).
  4. The aforementioned CLIs work, no click errors or errors introduced by using black.

To perform the tests the most efficiently, you can also subset the SQL database to the first 100 entries, which should make 2 and 3 easier to accomplish. Some of the steps for performing these tasks are in the Wiki.