matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0
74 stars 18 forks source link

guppy classify (hybrid2) should ignore sequences not in jplace #327

Closed nhoffman closed 10 years ago

nhoffman commented 10 years ago

If we run

guppy classify --classifier hybrid2 --nbc-sequences seqs.fasta ... placements.jplace

then all sequences in seqs.fasta, even those not in placements.jplace, will also be classified. This is just a bit unexpected, and is probably not what the user wants. Some options might be some combination of: a) raise an error; b) ignore the extra sequences; c) keep the current behavior but describe it in the docs. A gold-plated solution would be an error unless an option like --ignore-extras is provided.

matsen commented 10 years ago

Sorry for the surprise. This shouldn't be hard-- it comes down to this merge function. I'm not so fond of the gold-plated solution because I don't like options that only make sense when other options are used. Which of the others do you think would be best?

If we were going to do (b), would you suggest taking the intersection of the sequences supplied in the jplace file and those in the NBC file?

nhoffman commented 10 years ago

I think that taking the intersection would be a good solution. Perhaps with a warning about any sequences that are missing from one source or the other?

matsen commented 10 years ago

Closed by https://github.com/matsen/pplacer/commit/3bec8a4922f152b62a2e617e7937436b9c8ae3b1.