vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

README, basic usage, and manual pages need to be decoupled and updated consistently #305

Open ekg opened 8 years ago

ekg commented 8 years ago

For example, the mapping section of the README (https://github.com/vgteam/vg#mapping) still says:

If your graph is large, you want to use vg index to store the graph and vg map to align reads. vg map implements a kmer based seed and extend alignment model that is similar to that used in aligners like novoalign or MOSAIK. First an on-disk index is built with vg index which includes the graph itself and kmers of a particular size. When mapping, any kmer size shorter than that used in the index can be employed, and by default the mapper will decrease the kmer size to increase sensitivity when alignment at a particular k fails.

This is out of date: we now use xg/GCSA2 indexes and map using maximal exact matches.
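
To make the older kmer description concrete, here is a toy sketch of seed-and-extend mapping with a shrinking kmer size. It is not vg's implementation: it works on a plain string rather than a graph, keeps the index in memory as a sorted list instead of on disk, and "extends" by counting mismatches without gaps. The names `build_index`, `lookup`, and `map_read`, and all parameters, are hypothetical. The point it illustrates is why a sorted index of kmers of one size can serve any shorter seed: a shorter seed is just a prefix query.

```python
from bisect import bisect_left


def build_index(ref, k):
    """Sorted list of (kmer, position) pairs for every k-mer in ref."""
    return sorted((ref[i:i + k], i) for i in range(len(ref) - k + 1))


def lookup(index, seed):
    """Positions of indexed k-mers that start with seed (a prefix query).

    Toy limitation: a short seed that occurs only in the last few bases of
    the reference, where no full k-mer starts, is not found.
    """
    i = bisect_left(index, (seed,))
    hits = []
    while i < len(index) and index[i][0].startswith(seed):
        hits.append(index[i][1])
        i += 1
    return hits


def map_read(ref, index, read, k_max, k_min=4, max_mismatches=2):
    """Return (start, k) for the first acceptable ungapped hit, else None."""
    k = k_max
    while k >= k_min:
        for offset in range(len(read) - k + 1):
            seed = read[offset:offset + k]
            for hit in lookup(index, seed):
                start = hit - offset
                if start < 0 or start + len(read) > len(ref):
                    continue  # read would hang off the reference
                window = ref[start:start + len(read)]
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    return start, k
        k -= 1  # no seed worked at this size: shrink k to gain sensitivity
    return None


ref = "GATTACAGGCTTAACCGTAGGCATCGATCGGA"
index = build_index(ref, 8)
# The read is ref[8:20] with one substitution, so no exact 8-mer or 7-mer seed exists.
print(map_read(ref, index, "GCTTAATCGTAG", k_max=8))  # prints (8, 6)
```

The read only maps once the seed length drops to 6, which is the fallback behaviour the README paragraph describes.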

edawson commented 8 years ago

Lots of updates have happened to the wiki and README. However, this is a moving target and there is a lot of nuance to the command line incantations.

It would be good to decouple specifics from general usage. This is the goal of the Quickstart page, though we are still missing a proper manual page and our README probably still has too much information.

I think we should separate these three (README, basic usage, and manual pages) now, and provide proper docs at a later date. I will keep working on basic usage, but we'll have to divide up the manual page.

edawson commented 7 years ago

A lot of the help messages have been changed in confusing ways. For example, with vg index:

  1. The -d option disappeared from the help, but you can't create a RocksDB index without it.
  2. The -D option got relabeled. Its help text used to describe it as a way to sort GAM, but now it just says "dump database contents to stdout." The problem is that this leaves no obvious mention of GAM sorting. I guess it now lives under the -A option, so it has just moved. The wiki is still tragically behind the times.

In my mind, we should streamline the CLI for user friendliness (e.g. GAM sorting probably belongs in the sort command and not in index, even though the code lives with indexing). This will lead to spaghetti code, but right now we have spaghetti usage.

What do the people think?