greenelab / staNMF

A python implementation of Stability NMF
BSD 3-Clause "New" or "Revised" License
6 stars 3 forks source link

staNMF first commit #1

Closed amyecampbell closed 8 years ago

dhimmel commented 8 years ago

The pyc file (staNMF/staNMF.pyc) shouldn't be tracked with git.

You should create a .gitignore file that excludes *.pyc files. You should also remove the pyc from git tracking. Consider even ammending your existing commit to remove the pyc file from history.

dhimmel commented 8 years ago

Consider putting data in its own directory. Mixing scripts and datasets in the same directory can get messy.

Regarding updatedexpressionpatterns.csv, I don't like the file name? Why not just expression-patterns.csv -- what is updated?

Also, updatedexpressionpatterns.csv has numbers specified to overly precise decimal places. I think the file size savings are worth reducing the number of decimal places, especially since you're tracking the file with git.

amyecampbell commented 8 years ago

Also, updatedexpressionpatterns.csv has numbers specified to overly precise decimal places. I think the file size savings are worth reducing the number of decimal places, especially since you're tracking the file with git.

Yeah, I agree. I just used the same file that the supplementary data section of this paper used, but I should change the title to something that explicitly defines it as their data

I am, however, hesitant to change the precision of the datafile, because it's someone else's data for the purpose of as closely replicating their method as possible. Is that true, or doesn't it matter?

gwaybio commented 8 years ago

Really nice work here @amyecampbell! I was able to sprint through this pull request mainly because the documentation is excellent! I have some general comments:

  1. Vectorizing your code will lead to performance improvements - I highly recommend doing this
  2. Can you think of any attributes the staNMF class could store? i.e. Something a user will want to use that occurs internally
    1. For example, maybe a user would be interested in stability coefficients? Something like:
# ~~~~~~~~~~~~~~
# Directly from your example
# ~~~~~~~~~~~~~~
import staNMF as st

exampleNMF = st.staNMF(filename='example', folderID="DrosophilaExpression")

# Run NMF and output factorization solutions over the range of K's
exampleNMF.runNMF()

# Calculate instability for each K
exampleNMF.instability()

# ~~~~~~~~~~~~~~
# But now, instead of plotting, a user can output instability coefficients for each k
# to do with what they please.
# ~~~~~~~~~~~~~~

# Get instability coefficients
stability = exampleNMF.coef_

# Get input k values
k_range  = exampleNMF.k_
amyecampbell commented 8 years ago

Good idea. Added get_instability() function that returns the dictionary

gwaybio commented 8 years ago

besides that one comment, looks good to me! :+1:

side note - have you tried codeclimate-ing your code?

gwaybio commented 8 years ago

LGTM :+1:

dhimmel commented 8 years ago

Yeah I think if we have more feedback we can create issues or address later, let's get this merged.