qiime2 / qiime2.github.io

Old QIIME 2 project front page
https://qiime2.org
BSD 3-Clause "New" or "Revised" License

Various thoughts/recommendations on documentation #24

Closed nbokulich closed 7 years ago

nbokulich commented 7 years ago

I have reviewed the documentation and tutorial with an eye toward figuring out how a completely novice user (e.g., new microbiology grad student without any bioinformatics or programming experience) would view the material. Most of the documentation is fantastic (esp. for alpha) and I love new features, such as the glossary, that improve usability over the qiime1 docs. I have various suggestions below, labeled by what I see as the importance: "[high]", "[low]", or "[enhancement]" (the latter meaning it would enhance usability but is not currently a hindrance to understanding the docs). Some I expect are already planned anyway, but I hope my comments may help hammer these out.

Importing Data [high]

This section needs more detail on the expected file names and other requirements for each type. For example, the docs do not make clear that specific filenames are actually enforced for FeatureData[Sequence] (and I suspect for other semantic types as well). The following error is clear (to me), but it would be better to head it off with better documentation: ValueError: Missing one or more files for EMPMultiplexedDirFmt: 'sequences.fastq.gz'
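To illustrate the kind of filename requirement that the docs could spell out, here is a minimal sketch of a pre-import check. It is not QIIME 2 code: the `REQUIRED_FILES` table and the `missing_files` helper are hypothetical, and the file list is an assumption apart from 'sequences.fastq.gz', which comes from the error message above.

```python
from pathlib import Path
import tempfile

# Assumption: illustrative table only. The authoritative requirements live in
# the format definitions, not here; only 'sequences.fastq.gz' is taken from
# the error message quoted above.
REQUIRED_FILES = {
    "EMPMultiplexedDirFmt": ["sequences.fastq.gz", "barcodes.fastq.gz"],
}


def missing_files(directory, fmt):
    """Return the filenames required by `fmt` that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in REQUIRED_FILES[fmt]
            if not (directory / name).is_file()]


# Usage: a directory holding only the barcodes file is reported as incomplete.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "barcodes.fastq.gz").touch()
    print(missing_files(tmp, "EMPMultiplexedDirFmt"))
```

A table like this, rendered in the importing docs, would let a new user rename their files before hitting the ValueError.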

Directory of Methods [high]

We need a directory of methods to fill the same niche as qiime1's script index. Most methods are currently covered in the tutorial, but not all, and the gap will only grow as qiime2 grows. One issue is that qiime2's methods are hidden within plugins rather than being free-standing commands, so just listing the plugins does not reveal all available methods (and a few names are not immediately transparent or are jargony). Some sort of function description index, written in plain English for new users rather than as a list of method names, could be a useful way to approach this (and an improvement over qiime1's script index, which was difficult to navigate and translate at times). Instead of listing the plugin or command, list the function. Each function will link to 1) the method entry on the plugin doc page or 2) a tutorial page for multi-step procedures (e.g., procrustes plots). For example, "demultiplex sequences", "build phylogeny", and "pick OTUs" could all be listed as functions.
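One way to picture such an index is as a mapping from plain-English function descriptions to doc targets. This is only a sketch: the three descriptions are the examples given above, while the link targets and the `find_functions` helper are hypothetical placeholders, not real QIIME 2 doc URLs or plugin names.

```python
# Hypothetical index: keys are the plain-English functions suggested above;
# values are placeholder targets, NOT real documentation paths.
FUNCTION_INDEX = {
    "demultiplex sequences": "plugin-docs/PLACEHOLDER-A#method",
    "build phylogeny": "plugin-docs/PLACEHOLDER-B#method",
    "pick OTUs": "tutorials/PLACEHOLDER-C",
}


def find_functions(query):
    """Return the descriptions containing `query`, case-insensitively, sorted."""
    query = query.lower()
    return sorted(desc for desc in FUNCTION_INDEX if query in desc.lower())


# Usage: a new user searches by what they want to do, not by plugin name.
for description in find_functions("otu"):
    print(description, "->", FUNCTION_INDEX[description])
```

The point of keying on the description is that the user never needs to know which plugin hosts the method.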

Artifact Format Documentation [low]

The qzv/qza formats are confusing, and even as someone very familiar with qiime1 it took me some time to understand what these file formats are and why they are used. The rationale for these formats should be better documented, along with an explanation that these files can be unzipped to examine their contents. This rationale can link to the pages on semantic types and provenance tracking to discuss those topics. Some discussion appears here, but this should be more clearly documented here and elsewhere (perhaps on its own page that appears in the table of contents). Also make a note of this in the glossary. As an aside (and I know it's too late to quip about this), I don't really like the choice of the term "artifact", because it has other meanings in biology, e.g., "sequencing artifact".
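The "these files can be unzipped" point could be shown with a few lines of standard-library Python. The sketch below assumes only that a .qza/.qzv file is a zip archive; the `list_archive` helper is hypothetical, and the stand-in archive (its directory name and member names) is made up so the example runs without a real artifact.

```python
import io
import zipfile


def list_archive(path_or_file):
    """Return the sorted member names of a .qza/.qzv (zip) archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(archive.namelist())


# Stand-in archive built in memory. The layout below is illustrative only;
# a real artifact's member names may differ.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example-uuid/metadata.yaml", "type: FeatureData[Sequence]\n")
    archive.writestr("example-uuid/data/dna-sequences.fasta", ">seq1\nACGT\n")

for name in list_archive(buffer):
    print(name)
```

With a real file, the same helper would be called with its path, e.g. `list_archive("my-artifact.qza")` (a hypothetical filename).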

Taxonomy Format [low]

A discussion of the taxonomy format could be useful. Terms like "level 2" are used in the docs but are not immediately apparent to outsiders, nor will a Google search be much help. This may be appropriate to include within a file format page (see recommendation below).
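A short worked example might be enough to make "level 2" concrete. The sketch below assumes a semicolon-delimited, Greengenes-style taxonomy string and reads "level N" as "the first N ranks of that string"; both the example string and the `truncate_taxonomy` helper are illustrative assumptions, not taken from the QIIME 2 docs.

```python
def truncate_taxonomy(taxonomy, level):
    """Keep the first `level` ranks of a semicolon-delimited taxonomy string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return "; ".join(ranks[:level])


# Assumed example string: kingdom, phylum, class (ranks 1 through 3).
example = "k__Bacteria; p__Firmicutes; c__Bacilli"

# Under the assumption above, "level 2" stops at the second rank.
print(truncate_taxonomy(example, 2))
```

Under that reading, level 2 corresponds to the phylum in this string, which is exactly the kind of mapping a taxonomy format page could state outright.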

Doc Version Archive [enhancement]

The "ported wiki documentation" is very useful, and I recommend continuing to build this as an archive of release docs if possible, rather than removing these pages. One frustration with the qiime1 site was that docs only covered the release version, and if working with an earlier version of qiime or reviewing a list of commands/files generated using an earlier version of qiime, the older docs no longer existed. As qiime2 grows, may I recommend keeping the "ported wiki documentation" as a table of contents (TOC) at the bottom of the current release docs TOC, which will link to TOCs for archived doc versions.

Glossary: add other glossaries? [enhancement]

I LOVE the glossary, as it defines some of the lingo-y words that are new to qiime2. This should be on the reading list of everyone starting with qiime2, to whom "action" and "method" are otherwise more general terms, and "artifact" is not entirely intuitive. I wonder whether it would be useful to include separate glossaries on more general microbiome terminology, and on file types. I recommend separate, because this will keep the technical glossary pure and simple.

Microbiome Terminology: Much of this goes outside of the jurisdiction of qiime, but could be very useful to new users (and would give the developers control over the terminology). After all, users come from all backgrounds, and for many of them qiime may be the first exposure to any kind of bioinformatics software, microbiome/ecology concepts, or all of the above. For many of these terms, great explanations exist elsewhere on the web (though not necessarily findable with a simple Google search), and a short sentence and link will suffice (plus a link to the citation if appropriate). Some useful terms: distance matrix, OTU, feature table, demultiplex, barcode, index (see barcode), metadata, phiX, chimera, biom, metric (e.g., alpha diversity), alpha diversity, beta diversity, discrete (metadata), continuous (metadata), ordination, PCoA, richness. The individual alpha/beta diversity metrics should be in the glossary too; short sentences such as shown here, ideally with a link to the original citation, would suffice.

File Formats/Types: In many ways, this should be similar to qiime1's file types page. A similar resource does not yet exist in qiime2. This is in part to describe file formats that are used in qiime2, and in part to describe how to import specific file types into qiime2 artifacts (yeah, yeah, this could be more appropriately described in importing data, but if that doc expands to include this, you can link from this glossary to the entry for each file format in that doc). Some formats/terms to include: fasta, fastq, gz, qza, qzv, mapping file, biom, OTU table, feature table

Hope this all helps. I can elaborate on details / brainstorm more if prompted.

jairideout commented 7 years ago

Thanks @nbokulich! These are all great suggestions and we will work on resolving them (these types of questions came up during the Iceland workshop too).

jairideout commented 7 years ago

Ported to https://github.com/qiime2/docs/issues/4