interrogator / corpkit

A toolkit for corpus linguistics

Documentation of internals #7

Closed jamesdavidson closed 8 years ago

jamesdavidson commented 9 years ago

Looking at this as a programmer, the first thing I want to understand is the data structures. What are the inputs, outputs and intermediate structures? Usually all that's necessary for this kind of documentation is a sketch of how the various entities are mapped to basic constructs like sets, lists, maps, tuples, booleans, numbers, strings, symbols or nested variants of the same (i.e. trees and the like). And perhaps a note about how they get encoded in files (e.g. CSV, Penn Treebank or Python pickles).

For example, I jotted down some notes (https://github.com/jamesdavidson/corpkit/blob/hacking/DATA.md) whilst going through the code. If you'd like, I can help you write this kind of documentation.
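To make the request concrete, here is a minimal sketch of the kind of mapping such a note might describe. It is purely illustrative and does not reflect corpkit's actual internals: the `corpus` layout and the `count_tokens` helper are hypothetical names invented for this example.

```python
from collections import Counter

# Hypothetical input: a corpus as a map from subcorpus name to a list of
# tokenised sentences, where each sentence is a list of strings.
corpus = {
    "2013": [["the", "cat", "sat"], ["the", "dog", "ran"]],
    "2014": [["a", "cat", "slept"]],
}


def count_tokens(corpus):
    """Return a map from subcorpus name to a Counter of token frequencies."""
    return {
        name: Counter(token for sentence in sentences for token in sentence)
        for name, sentences in corpus.items()
    }


# Hypothetical output: a nested map of subcorpus -> token -> count.
result = count_tokens(corpus)
```

A note in this style would say: input is a map of lists of lists of strings, output is a map of maps from string to integer, and then state how each is serialised on disk.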

interrogator commented 9 years ago

Great suggestion! Where do you think this info is best stored?

jamesdavidson commented 9 years ago

There are basically three options: as text files in the repo, as comments in the source or as pages in the wiki. Let's start with text files in the repo. I've opened PR #9.

interrogator commented 9 years ago

#9 doesn't quite close this, but sets up most of the file organisation needed to get the rest done. Thanks @jamesdavidson!

interrogator commented 9 years ago

I've just started putting together some auto-generated documentation via Sphinx. For it to work properly, the docstrings for all functions will need to be updated. More progress, anyway!
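For reference, a docstring that Sphinx's autodoc can render typically uses reStructuredText field lists. The sketch below shows the general shape only; `count_matches` is a hypothetical function written for illustration, not part of corpkit.

```python
def count_matches(tokens, target):
    """Count how often ``target`` occurs in ``tokens``.

    :param tokens: the tokens to search
    :type tokens: list of str
    :param target: the token to count
    :type target: str
    :returns: the number of occurrences of ``target``
    :rtype: int
    """
    # Compare each token to the target and tally the matches.
    return sum(1 for token in tokens if token == target)
```

The `:param:`, `:type:`, `:returns:` and `:rtype:` fields are the ones autodoc turns into a formatted parameter list in the generated pages.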

interrogator commented 8 years ago

The ReadTheDocs site, which is built with Sphinx, now gives fairly substantial documentation of the most relevant parts of corpkit. Closing!