occam-ra / occam

OCCAM Reconstructability Analysis Tools

Structural Design documentation and initial recommendations #32

Open gdcutting opened 5 years ago

gdcutting commented 5 years ago

The programmer-oriented documentation is out of date (15+ years old). After spending three or four weeks with the code, I realize that we need documentation that provides: an overview of the OCCAM application structure; references for the C++ objects and interfaces (the Design Proposal from 2000 is very limited); a description of the Python wrapping; suggestions for changes to the division between C++ and Python that will make OCCAM easier to repackage and give the user a good experience through good modular design; and some other comments on overall design, practical considerations, and how to prioritize time. It is also important to introduce some design and engineering considerations that are not specific to any particular language or platform, but should help inform the high-level discussion of how to perform a structural upgrade and repackaging of OCCAM.

This will help the capstone team get up to speed, and facilitate the conversation about the design and engineering issues involved and how to go about setting priorities. I am coming to think that investing much more in the existing framework is probably not a good use of time. Once we update the framework for proper Python packaging, our development efforts (bug fixes, functionality improvements, new additions) will be more productive, and hopefully we will then have access to a wider community of developers who might be interested in contributing.

I have started working on a document that gives an overview of the application structure, some details about the C++ extensions and Python wrapping, overall considerations about design and engineering with a Python package in mind, the specific aspects of the current OCCAM implementation that I think need the most attention, and some other thoughts about priorities and making the best use of time. Hopefully this will help advance the conversation in the period after Marty gets back until the end of the term (as the capstone team is starting to come on and get up to speed) and beyond. I'm going to focus on this for a couple of days and will post a link to the work in progress when I'm a little further along.

gdcutting commented 5 years ago

Started a programming overview document here: https://github.com/gdcutting/occam/blob/master/doc/occam-structure.md

gdcutting commented 5 years ago

Notes on Upgrade Design Proposal. Need to:

gdcutting commented 5 years ago

I realized that it is important to introduce key principles of software design and engineering to inform the discussion. Not everyone on the team is an experienced programmer (or a programmer at all), but it should be possible to discuss high-level issues of design without detailed knowledge of a programming language. I am using as references: Software Engineering for Students (Bell), which is not as detailed as, for example, Sommerville, but for that reason provides good material for undergraduates and non-programmers; and the Stroustrup C++ book, which, besides being an authoritative reference on the C++ language, contains some brilliant discussions of design that transcend any particular language or platform. Without rewriting a book on this topic, I am putting in a couple of pages of discussion, which I believe will help frame the more language-specific discussion of OCCAM structure and implementation (and some proposals on how to upgrade it) that follows.

gdcutting commented 5 years ago

We will want some design specifications. I am looking into the best way to approach this. Probably will become a separate issue but I'm putting it here for now.

gdcutting commented 5 years ago

Key question: what should the Python object structure be? Working answer: a lightweight version of the C++ object structure. The Python classes do not have to reproduce all the data and methods from the C++ objects (and indeed they should not).

Need a proposed design specification for the Python layer.
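
To make the "lightweight version" idea concrete, here is a minimal sketch of what a thin Python class over a core object could look like. Everything in it is an assumption for illustration: `CoreModel` is a pure-Python stand-in for a wrapped C++ object, and the names `Model`, `relation_count`, and `print_name` are hypothetical, not the existing OCCAM API.

```python
class CoreModel:
    """Hypothetical stand-in for a wrapped C++ object (not the OCCAM API)."""

    def __init__(self, relations):
        self._relations = [tuple(r) for r in relations]
        self._scratch = {}  # internal state the Python layer should not mirror

    def relation_count(self):
        return len(self._relations)

    def print_name(self):
        return ":".join("".join(r) for r in self._relations)


class Model:
    """Lightweight Python-side view: keeps a handle, forwards a few calls."""

    __slots__ = ("_core",)  # no copied data, only a reference to the core

    def __init__(self, core):
        self._core = core

    @property
    def name(self):
        return self._core.print_name()

    def __len__(self):
        return self._core.relation_count()

    def __repr__(self):
        return f"Model({self.name!r})"


model = Model(CoreModel([("A", "B"), ("B", "C")]))
print(model, len(model))
```

The Python class stores no model data of its own and exposes only the handful of operations a user needs, so the core can change internally without the Python surface changing with it.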

gdcutting commented 5 years ago

Gantt chart might be useful? Will revisit once I get more into the details of structural changes and roadmap.

gdcutting commented 5 years ago

I realized that it is important not to reinvent the wheel in this process, or to waste time following development paths that turn out to be dead ends. That is, what we really need is a good model of what we're trying to produce with this repackage and structural upgrade. I am looking around to see if I can find a project that's similar to OCCAM in scope, purpose, and implementation (hybrid Python/C++). There are a lot of those, but I want to find one (or a few) that are as close to OCCAM as possible.

If we have a clear idea of what we want OCCAM to look like after some structural improvements, and that idea is based on other successful projects, it will clarify the path we need to follow to get to an upgraded codebase that will help us level up our development efforts. Will do some more digging around on this over the next few days to a couple of weeks...

venkatachalapathy commented 5 years ago

Doxygen seems to be a de facto standard for C++.

Is this something we should use for our documentation project?

BartMassey commented 5 years ago

I do not recommend Doxygen, having watched some projects successfully switch away from it in a relieved fashion after years of unhappy use. I can give some details if needed.

I'm not sure choice of documentation format is too important. I'd be inclined to look at Multimarkdown, but also just plain ol' LaTeX isn't terrible. The only thing I'd avoid is anything that is too complex or that produces too-ugly output.

gdcutting commented 5 years ago

I like readthedocs because it will render markdown or rst, so you can keep your docs in markdown format in your repo and they will look great on GitHub, and they will render cleanly in Sphinx when you build your readthedocs page. It's in use by nearly 200,000 open source projects, so it's got community support.

gdcutting commented 5 years ago

@venkatachalapathy, I considered a doc generation tool, particularly for the C++, but I don't think they're very useful. They just examine header files and reproduce the declarations. This might make it marginally easier for people to see those, but they can easily examine the header files themselves (see include/). What's important, and can't be generated automatically, is commentary on what the members and methods actually do. That has to be written by someone who is familiar enough with the code to understand what the classes are actually doing and how the interfaces work. This is definitely on my list of things to do after I finish some notes about the design and interfaces.

venkatachalapathy commented 5 years ago

I understand. I found pandoc to be quite useful; it isn't bad with its version of Markdown.

Going forward, I am thinking about how the documentation (whether it is about structural design or a user manual) should be written. All of us think about OCCAM as a DMM implementation, and Marty's language in class infiltrates the way everything is designed. If we wish to reach the ML or Stat community at large, we have to rework the language to be more outward facing, without relying on Marty's mental model of OCCAM.

For a generic individual from the community, information-theoretic language is fine; it is attractive and functional. Reconstructability Analysis argot, not so much. My feeling is that we need to work on a vocabulary that is easy for the community to understand while retaining the essence of RA in OCCAM.