netneurolab / pypyls

A Python implementation of Partial Least Squares (PLS) decomposition
https://pyls.readthedocs.io
GNU General Public License v2.0

Add user documentation #19

Open rmarkello opened 6 years ago

rmarkello commented 6 years ago

We need documentation! Our README is a bit spartan, so some more in-depth user documentation would be very helpful for orienting people to the repository and, hopefully, preempting any questions they might have about installation, use, etc. The documentation should ideally describe:

  1. The purpose of pyls,
  2. How to download and install pyls,
  3. Some basic usage information, and
  4. A reference API

For (1), something in line with the project roadmap would be sufficient; for (2), basic instructions on downloading and installation (e.g., python setup.py install) would be perfect; and for (3), a few in-depth examples demonstrating the various potential use cases of the code would be great.

I think the best choice for setting all of this up would be Sphinx! Sphinx has a quickstart guide that, while a bit opaque at times, is sufficient to at least get some bones in place. Once the bones are there, my tactic has generally been to find documentation that I like and borrow as appropriate (licensing permitting!). One of my other repositories that could be copied, to some degree, is snfpy.
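As a rough idea of what those "bones" might look like, here is a minimal sketch of a Sphinx configuration file. The file location (docs/conf.py), the project name, and the theme are assumptions for illustration only; the two extensions listed ship with Sphinx itself, and autodoc is what would generate the reference API from docstrings.

```python
# docs/conf.py -- hypothetical location; sphinx-quickstart generates a fuller version
# All values below are illustrative assumptions, not the project's actual settings.

project = "pyls"          # name shown in the rendered documentation

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstrings
]

master_doc = "index"      # root document, i.e. index.rst
html_theme = "alabaster"  # Sphinx's default theme; swap in any installed theme
```

With a file like this in place, running sphinx-build from the documentation directory should produce HTML output.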

It's worth noting that Sphinx uses reStructuredText for formatting. This is quite a bit different than the Markdown that GitHub normally relies on, so it will be good to keep a reference handy.

rmarkello commented 6 years ago

#30 actually went a little ways to improving our documentation! 🎉 📖 🎉 📖 Hurray documentation!

Next steps are now to add more realistic examples of using behavioral_pls() and meancentered_pls() to the online usage documentation and linking to that more directly in the relevant parts of the README. This will still require working in reStructuredText, but can be done a bit more easily by directly editing the usage.rst!

I rather like nilearn's style of tutorials, but think anything is an improvement on nothing! I will try to dredge up some "good" data (where the results will be interpretable and meaningful!) to use in at least one or two examples, but any openly available data that is amenable to a PLS analysis would be great.

KirstieJane commented 6 years ago

Hey @rmarkello - in our PNAS paper Petra and I ran a PLS relating MR measures to Allen Institute gene measures. She codes in MATLAB, so that part of the analysis is in MATLAB, but it could be run with this package I think (I hope!)

The gene data is here: DATA/PLS_gene_predictor_vars.csv

The MR input measures are here: CT_MT_ANALYSES/COMPLETE/PLS/COVARS_none/PLS_MRI_response_vars.csv

Here's the command we ran: SCRIPTS/PLS_calculate_stats.m#L57

If this is any help then you're clearly going to make me mega happy....no worries if this data is not what you're looking for!

KirstieJane commented 6 years ago

This also relates to #25 - let me know if you'd like me to put links to the matlab results from that paper in that thread.

rmarkello commented 6 years ago

Thank you so much for suggesting this @KirstieJane; I think this will work really well as an example dataset! ✨

For your own Matlab -> Python goals, it might be worth noting that the PLS algorithm you and Petra used is actually slightly different than the one implemented in the current package. Your use case was more similar to PLSRegression() from sklearn in that it was directed (i.e., how does gene expression "predict" cortical thickness / MT?), while the current package doesn't support such hypotheses. Rather, an equivalent question that the current package could help answer would be, e.g., "how does gene expression relate to cortical thickness / MT?"
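To make the directed-versus-symmetric distinction concrete, here is a minimal NumPy sketch of one common symmetric formulation (an SVD of the cross-correlation matrix, sometimes called PLS correlation). It is an illustration under stated assumptions, not this package's implementation: the helper names zscore and pls_correlation are hypothetical, and the data are random stand-ins rather than the gene or MRI measures discussed above.

```python
import numpy as np


def zscore(a):
    """Standardize each column to zero mean and unit sample variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def pls_correlation(X, Y):
    """Hypothetical helper: SVD of the cross-correlation between X and Y.

    Returns left singular vectors (weights for Y's columns), singular
    values, and right singular vectors (weights for X's columns).
    """
    n = X.shape[0]
    R = zscore(Y).T @ zscore(X) / (n - 1)  # cross-correlation, shape (q, p)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U, s, Vt.T


# Random stand-in data (assumption): 50 samples, 6 and 3 variables
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
Y = rng.standard_normal((50, 3))

_, s_xy, _ = pls_correlation(X, Y)
_, s_yx, _ = pls_correlation(Y, X)  # swap the roles of X and Y

# The singular values do not depend on which block is called "X":
# neither block is treated as the predictor.
print(np.allclose(s_xy, s_yx))
```

Because swapping the two blocks only transposes the cross-correlation matrix, the singular values are unchanged, which is the sense in which this kind of analysis asks how the blocks "relate" rather than how one "predicts" the other. A regression-style method such as PLSRegression() does not have this symmetry.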

All that being said, I still think that we'll be able to reproduce your results (to a point). While some numerical differences are likely, this will make a great real-world example! 😄