Open pavlis opened 4 years ago
I think we had a conversation about Python documentation before, but I could not find a record of it in this repo. I guess that must have been in a telecon. Anyway, I think the guides from ObsPy are pretty good, and we should follow their practice, especially the coding style guide that shows how they use docstrings to document the API. Not sure how to handle the mix of C++ and Python yet, but I do believe there has to be a clean way, as you have the docstrings in the Python bindings already.
For the user manual, I do think we need something better, but I have not dug into that yet. I know a lot of projects use Read the Docs, which could be the one to go with. Also, you might not be aware that GitHub's wiki pages are actually under version control. You can see the commit history and even clone the wiki locally: git clone https://github.com/wangyinz/mspass.wiki.git
VERY useful sources there. Really like this sermon that I found as a link in one of those pages. The author is right on about this issue. If you haven't seen that you must read it.
Two actions, I think, are followups:
I have produced a prototype home page that is only a raw table of contents for the package documentation. I'm going to check this into the master branch under a new directory, docs/html. The file is index.html. If you think that is a bad file organization, change it as you see fit and handle such changes with git. If you have any suggestions on organization or content in my prototype, I request that we discuss them here before making changes, to preserve a history of our thinking on this matter.
This could have been in the previous comment, but it is a slightly different issue. Suggest we do all the documentation in a web-oriented form like html. At one point I thought we might want some pdfs to handle technically oriented documentation of algorithms that require a lot of equations. We can probably still do that, but I had forgotten until this morning about the existence of things like MathJax (I think that is the name) that allow embedding TeX descriptions of equations in html documents.
Agreed we should plan html as the core format for documentation?
Yes, I think html should be all we need. It seems we could just host our documentation on readthedocs. The only issue is to figure out how to put doxygen generated pages there as that is a site designed mainly for python projects.
If you think readthedocs is a good choice, we could just serve the doxygen pages from Indiana or Texas and have a link in the appropriate pages. Suggest maybe IU is better: as an emeritus faculty member I will have an account here until I die, which we hope isn't too soon.
Found this that shows it might not be that hard to have doxygen on readthedocs.
btw, do you have a good reference for doxygen? I have never compiled it before, and I need to learn all of this first to build our own documentation site.
That looks pretty easy. Only issue I'd see with that is it will require doxygen installed by cmake. That build is already getting pretty long, although I don't think doxygen is that huge. Also have no idea if it can be autoinstalled. Worst case we'd have to have the install documentation say the user needs to install doxygen if they want to have a private copy of the C++ api pages.
I don't think we should include doxygen as an autoinstalled package. Instead, we could add a make doc target that is enabled when doxygen is found on the system. Probably something like this would work.
Like what you did to build the documentation in github automatically as we do updates. Spectacularly useful way to make sure the documentation stays current with the code. Brilliant.
Subject here though is adding the documentation for the schema. I found some examples online of different ways to build tables with rst. This one looks the most promising to me. It would be very easy to create a csv file from the mspass.yaml file (well, actually, building it from MetadataDefinitions in python is how I'd create it). I thought of this when I was perusing the new documentation and remembered we needed a way to document the schema. This fits perfectly with the dynamic update model, as adding a new attribute to the mspass.yaml file would (ideally) cause it to appear in the table(s) created by that mechanism.
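As a rough illustration of the csv idea above, here is a minimal sketch using only the standard library. The SCHEMA dictionary, its key names, and the column headings are assumptions standing in for whatever mspass.yaml or MetadataDefinitions actually provides; the real layout may differ.

```python
import csv
import io

# Hypothetical stand-in for the attribute definitions parsed from mspass.yaml.
# The real file layout and key names may differ.
SCHEMA = {
    "sta": {"type": "string", "concept": "Station code"},
    "delta": {"type": "double", "concept": "Sample interval in seconds"},
    "npts": {"type": "int", "concept": "Number of samples"},
}


def schema_to_csv(schema):
    """Return CSV text with a header row and one row per attribute, sorted by name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Attribute", "Type", "Concept"])
    for name in sorted(schema):
        entry = schema[name]
        writer.writerow([name, entry["type"], entry["concept"]])
    return buffer.getvalue()


csv_text = schema_to_csv(SCHEMA)
print(csv_text)
```

Because the rows are derived from the schema on every run, a new attribute added to the schema shows up in the table the next time the script runs.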
Note also:
I started to seriously explore Jupyter notebooks the past few days. It seems the unambiguous solution for creating tutorials for mspass. Do you concur? If so, I think I will start creating one for running the deconvolution code with the test python codes I was writing last week. "Kills two birds with one stone" as the saying goes. Provides a test program and a tutorial all in one.
I'd like to start a dialogue here on what tutorials should be developed and if jupyter is the right medium. Look forward to your response.
Yes, we should definitely go with Jupyter. Practically, the tutorial would be better placed in a different repository, as it is neither source code nor documentation.
Another issue is that we will want to have Spark included in the tutorial. Although we could still use Jupyter for that, the setup will be different and the code won't work properly in a common Jupyter setup. Currently, I think the solution is to have Jupyter in the container, but that might unnecessarily inflate the image. Maybe we should release two different container images down the road: one with only the core components and one with everything including Jupyter.
I'm not sure it would be wise to split up the repository for documentation. Jupyter isn't that large a package and should be a marginal add on to an already large container. Further, I found this site that argues it is good to put notebooks in docker containers. A good tutorial will need a complete setup to be effective, which is why having it run under docker would be helpful.
In any case, we concur that jupyter should be the way we structure tutorials.
Over the past several days I've had time in the morning from the time zone skew to work on the documentation pages for the MongoDB schema. I wrote a small python program that builds a series of csv files that can be used to build pretty tables, as noted in an earlier section of this issue. There is then a master rst file that uses a "files" directive to read the set of csv files to build a readable document. To be specific, the current set of csv and rst files is the following:
3Cdata.csv
MongoDB.csv
aliases.csv
all.csv
files.csv
obspy_trace.csv
phase.csv
site.csv
sitechan.csv
source.csv
The csv files are intended to be automatically generated by running the python program in the same directory. The name of the program is irrelevant at this point. There are multiple files because all.csv lists all the attributes while the others are used to build smaller tables that have a logical or required relationship.
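For readers unfamiliar with how an rst file can pull in csv files, the sketch below generates the kind of master text described above using the docutils csv-table directive with its :file: and :header-rows: options. The file names come from the list above; the table titles and the helper name are hypothetical, and this is not necessarily how the actual master rst file is written.

```python
def csv_table_block(title, csv_file):
    """Return reStructuredText for one csv-table that reads its rows from a file."""
    return "\n".join([
        f".. csv-table:: {title}",
        f"   :file: {csv_file}",
        "   :header-rows: 1",
        "",
    ])


# File names are from the list above; the table titles are hypothetical.
TABLES = [
    ("All attributes", "all.csv"),
    ("Site attributes", "site.csv"),
    ("Source attributes", "source.csv"),
]

# A blank line separates consecutive directives, as rst requires.
master_rst = "\n".join(csv_table_block(title, name) for title, name in TABLES)
print(master_rst)
```

Keeping the prose in the rst file and the tables in csv files means the slowly changing text and the frequently regenerated tables can be updated independently.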
The issue this brings up is that I have no idea how to use this to provide automatic updates of the documentation when we update the schema definitions. Eventually this should stabilize, but for the near term the set of attributes that define the schema is likely to change a lot. It may be appropriate to just say we'll manually update the csv files whenever the mspass.yaml file is modified. However, the text of the rst file will change much more slowly, I suspect, than the tables. At a minimum we will need a way for csv files and a python script to live in harmony with rst files in the documentation source directory.
Wait for me to check this in if that is too confusing. I am writing this from the Phoenix airport and should be home later this afternoon.
I still have not seen the actual files, so I am probably not understanding it correctly. I think all we need is a python script to parse the mspass.yaml file and generate a number of rst files to be used to build the documentation. If you already have a python script that does a similar conversion, then we should be pretty close to making it completely automatic. I should be able to add that into our current sphinx setup.
Yeah, that would have been impossible without the material I just checked into master. Here is the procedure I've been doing manually. We need to either put this somewhere so that we can repeat it with a few commands when we need to make a schema change, or automate it. Suggest you first make sure you can repeat this procedure in a scratch area before trying to figure out how to automate it.
I am still in the middle of making all of this run with sphinx. One issue I realized while doing it is that $MSPASS_HOME is not included in our Python setup. I am thinking of creating some kind of default alternative hard-coded in the code so that we don't need to worry about the environment variables within Python being messed up somehow. Probably not something to worry about for now, so I copied that into the mspasspy package. Still need to figure out a robust way to define $MSPASS_HOME. Anyway, I will get all of this resolved eventually...
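The hard-coded default idea could look roughly like the sketch below. The function name and the fallback location are assumptions for illustration only; the project would choose its own default.

```python
import os

# Hypothetical fallback location; the project would pick the real default.
DEFAULT_MSPASS_HOME = os.path.join(os.path.expanduser("~"), ".mspass")


def mspass_home(environ=None, default=DEFAULT_MSPASS_HOME):
    """Return $MSPASS_HOME if it is set and non-empty, otherwise the default."""
    env = os.environ if environ is None else environ
    value = env.get("MSPASS_HOME")
    return value if value else default


print(mspass_home())
```

Treating an empty value the same as an unset one guards against a variable that was exported but never assigned.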
I thought about putting the topic of this comment in a new issue, but decided it mostly fits into the topic of documentation. The problem is it could equally be put in a discussion of test programs, but the tougher issue is documentation so I'll put it here.
The problem I want to discuss comes up from testing the new graphics module I've been developing. The only way I know to test graphics code is to make it draw something and visually confirm that it worked. Graphics-generating test programs are nearly guaranteed to break Travis, or so I suspect. What I propose to do is enhance the test program I've been using a bit and make it a jupyter notebook tutorial on mspass graphics. The notebook provides a convenient way for one of us to verify the graphics module is working correctly and at the same time builds a valuable tutorial. Do you concur or do you have some other mechanism to test graphical code? Independent of testing, I do think a jupyter tutorial on the graphics module is an important addition. There is no better way to get most scientists hooked than to have a simple graphics system where they can get a pretty picture quickly.
This brings up a couple issues related to documentation.
import matplotlib.pyplot
import numpy as np
import tutorials
where tutorials is a python module containing the not-for-public-consumption ancillary code needed to drive the tutorials.
First, do you concur that putting this kind of stuff in a special module(s) is the way to go with this?
If you concur there are two issues I see: (1) where to put this module and (2) how to set it up so its components don't get posted in the documentation?
I think the solution is simplified by adopting the habit of only running jupyter tutorials from the docker image. That way we can put the tutorial.py (or whatever we call it) in a common place outside the mspass tree, and the notebooks can reference it and be assured it will be found. If you concur this is the right model, maybe you can judge better than I how to structure the tutorial area with that in mind. I'm going to look into running jupyter from docker. I know it is a standard approach, as previous web searches yielded long lists of how-tos on the subject.
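One simple way a notebook could find a support module kept outside the mspass tree is to put its directory on sys.path in the first cell, as sketched below. The directory shown is a hypothetical location inside the container, not a path the project has defined; setting PYTHONPATH in the image would be an alternative that needs no code in the notebook.

```python
import sys

# Hypothetical location inside the container; not a path the project has defined.
TUTORIAL_SUPPORT_DIR = "/opt/mspass_tutorial/support"


def add_support_dir(path=TUTORIAL_SUPPORT_DIR):
    """Put the support directory on sys.path once so `import tutorials` resolves."""
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


add_support_dir()
```

Since the module lives outside the package that Sphinx documents, its contents would not be picked up by the API documentation.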
Followup: found this useful page on jupyter and docker
Well, I think you are getting at exactly the reason why people always host a separate repo for tutorials: there can be a lot of unrelated code that only makes sense for the tutorials. At the end of the day, documents and tutorials are two distinctly different things, and we probably shouldn't put them together just for convenience. Since you are preparing the tutorials now, maybe it is time for us to open up that new repo. Yes, and I think we can use docker with Jupyter in that new repo.
There we go: https://github.com/wangyinz/mspass_tutorial
btw, for testing graphics, I think you are correct that nothing works better than a human eye. I don't think there is a good way to test the visual correctness of the plot itself. As discussed here, we probably could use the method in the accepted answer there to partially test it.
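A common way to partially test plotting code on a headless CI machine is to select a non-interactive matplotlib backend and check the data held by the plot objects rather than the pixels. The sketch below shows that general idea; it may or may not match the linked answer, and plot_trace is a hypothetical stand-in for a function in the graphics module.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_trace(times, amplitudes):
    """Hypothetical plotting helper standing in for the graphics module."""
    fig, ax = plt.subplots()
    ax.plot(times, amplitudes, color="black")
    ax.set_xlabel("Time (s)")
    return fig, ax


fig, ax = plot_trace([0.0, 0.1, 0.2], [1.0, -1.0, 0.5])

# Inspect what was drawn instead of looking at the rendered image.
line = ax.get_lines()[0]
plotted_y = [float(value) for value in line.get_ydata()]
x_label = ax.get_xlabel()
plt.close(fig)
```

A check like this confirms that the right data and labels reached the figure, but it says nothing about whether the result looks right, which still needs a human eye.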
Hey folks, it seems I missed a lot of discussions here... what is this tutorial used for? for future mspass users?
Yes, it will be for future users. I think you can mostly ignore this part for now. Tutorial is different from documentation, and you only need to write the latter for the code you develop.
It is way past time to make a decision about how we are going to do the documentation for mspass. There are at least three components we need to address sooner rather than later.
Let me know what you think.