lingpy / lingpy

LingPy: Python library for quantitative tasks in historical linguistics
http://lingpy.org
GNU General Public License v3.0

Refactoring: use a template library to create formatted output #211

Open xrotwang opened 8 years ago

xrotwang commented 8 years ago

Rather than stringing e.g. HTML tags together in long Python functions, we should use a template library to separate format from data structure, and improve testability of output generating code. My recommendation would be Mako, because

LinguList commented 8 years ago

Agreed, now I also see what you mean by the template system.

Just as a short overview here, regarding the different formats that are currently produced:

Apart from that, we have all the plots. But here, I guess we need to distinguish between a plotter and a writer: the plots visualize the data, while the writer still displays the data itself. With TeX as one of the formats, though, the borders are not really strict here...

xrotwang commented 8 years ago

@LinguList thanks for this list! Makes for a nice task description for Thursday :)

LinguList commented 8 years ago

@xrotwang, I'll probably still add more things to the list later (I just realized that I had forgotten csv, and also the basic wordlist format...)

xrotwang commented 8 years ago

@LinguList A first example of using Mako to create output can be seen here: https://github.com/xrotwang/lingpy/commit/069b71d84527f036864c21ede2257e21f9311d7e

I think it's worth the additional requirement (i.e. the Mako library). What do you think?

LinguList commented 8 years ago

I agree that it's worth the additional dependency. First, the templates will be much easier to handle. Second, the documentation will also be easier: one can say that there are basic types with args and keywords for the classes, and users with higher ambitions will need to turn to Mako to create their own templates. Third, it will be much easier to customize things like different nexus styles, different distance matrix outputs, etc. Although we have only a few of those variants in the library at the moment, there are many more out there (also for the handling of scoring functions, depending on biopython or other libraries), and handling them by hard-coding would just be a pain.
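To illustrate the customization point, here is a minimal sketch of a template-driven distance matrix writer. It uses the standard library's `string.Template` as a stand-in for Mako so that it runs without extra dependencies. The function name `render_distances`, the placeholder names `$count` and `$rows`, and the PHYLIP-style layout are assumptions made for this illustration, not LingPy's actual code.

```python
from string import Template

# Hypothetical template for a PHYLIP-style distance matrix.
# The placeholders $count and $rows are invented for this sketch.
PHYLIP_DST = Template("$count\n$rows\n")

# A second, equally hypothetical style: same data, different layout.
COMMENTED_DST = Template("# $count taxa\n$rows\n")


def render_distances(taxa, matrix, template=PHYLIP_DST, name_width=10, precision=2):
    """Render a square distance matrix through a text template."""
    rows = []
    for taxon, distances in zip(taxa, matrix):
        # Pad (or cut) the taxon name to a fixed width, then add the distances.
        name = taxon[:name_width].ljust(name_width)
        cells = " ".join(f"{d:.{precision}f}" for d in distances)
        rows.append(f"{name} {cells}")
    return template.substitute(count=len(taxa), rows="\n".join(rows))


if __name__ == "__main__":
    taxa = ["German", "English"]
    matrix = [[0.0, 0.25], [0.25, 0.0]]
    print(render_distances(taxa, matrix))
    print(render_distances(taxa, matrix, template=COMMENTED_DST))
```

Switching the output style only means passing a different template; the code that prepares the data stays untouched, which is what makes the output easier to test.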

LinguList commented 8 years ago

One thing I was just thinking about is the question: when would we use templates for writing, and when would we need something else? The point is the following:

  • we have simple stuff, like csv-writing, where using Python's csv module is probably the best
  • we have json as a format that also has automatic support for writing / rendering
  • we have the more complex or user-defined things for which we need templates, like phylip.dst-format, nexus, newick, etc.
  • we have plots where we pass to matplotlib, since they cannot be handled in a template

The borders between, say, "text-file" and "plot" are, however, not completely clear-cut once HTML and the LaTeX support for MSA files come in: one could see the TeX export as a plot, and HTML is supposed to be treated as a plot as well, that is, as something stable that is meant for looking at, not for modifying manually.

So I'm asking myself how best to think of these things, that is, text-export, hybrid html/tex-export, and plots. Should we officially treat the hybrid exports as text-export (also in the documentation), or should we make a distinction between file output and, say, html-output?

There are three possibilities:

  • text-file + html/tex/etc. as "text-export" and plot as separate export (wordlist.output, wordlist.plot)
  • textfile, hybrid, and plot as three separate things (wordlist.output, wordlist.export, wordlist.plot)
  • textfile as separate and plot and export as one thing (wordlist.output, wordlist.plot)

It might also be useful for documentation purposes to have this fixed and used consistently across all methods. I would tend to go for the distinction between output/export/plot, with output pointing to formats that can be read in again, export to formats for presentation, and plots to graphics. Does that make sense?

xrotwang commented 8 years ago

I think the best unit for pluggability when it comes to output is the adapter, i.e. a piece of code defined by the kind of object it adapts and the mimetype it adapts to, e.g. text/tex or maybe image/png. Whether it does so using a template is secondary.
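The adapter idea can be sketched roughly as follows: a registry keyed by the pair (class of the adapted object, target mimetype), with each adapter free to use the csv module, json, a template, or anything else internally. Everything named here (`adapter`, `adapt`, the `Alignment` class) is hypothetical and only meant to illustrate the lookup, not to describe LingPy's API.

```python
import csv
import io
import json

# Registry mapping (adapted class, mimetype) to an adapter function.
_ADAPTERS = {}


def adapter(cls, mimetype):
    """Register the decorated function as the adapter for (cls, mimetype)."""
    def register(func):
        _ADAPTERS[(cls, mimetype)] = func
        return func
    return register


def adapt(obj, mimetype):
    """Look up an adapter for obj's class (or a base class) and apply it."""
    for cls in type(obj).__mro__:
        func = _ADAPTERS.get((cls, mimetype))
        if func is not None:
            return func(obj)
    raise LookupError(f"no adapter from {type(obj).__name__} to {mimetype}")


class Alignment:
    """Hypothetical stand-in for an alignment object; not LingPy's class."""

    def __init__(self, taxa, sequences):
        self.taxa = taxa
        self.sequences = sequences


@adapter(Alignment, "text/csv")
def alignment_to_csv(msa):
    # Simple tabular output: the csv module does the work, no template needed.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for taxon, sequence in zip(msa.taxa, msa.sequences):
        writer.writerow([taxon] + list(sequence))
    return buffer.getvalue()


@adapter(Alignment, "application/json")
def alignment_to_json(msa):
    # JSON already has built-in serialization support.
    return json.dumps(dict(zip(msa.taxa, msa.sequences)))


if __name__ == "__main__":
    msa = Alignment(["German", "English"], [list("hant"), list("hend")])
    print(adapt(msa, "text/csv"))
    print(adapt(msa, "application/json"))
```

With such a registry, the question of whether a format counts as output, export, or plot becomes a matter of how the mimetypes are grouped in the documentation, while the calling code stays the same.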

LinguList commented 8 years ago

I agree.