matsen / pplacer

Phylogenetic placement and downstream analysis
http://matsen.fredhutch.org/pplacer/
GNU General Public License v3.0

"malformed row in jplace" after pplacer warning "computed function value is infinite or NaN" #340

Closed (nhoffman closed this issue 9 years ago)

nhoffman commented 9 years ago

After running pplacer with options "pplacer -p --inform-prior --prior-lower 0.01 --map-identity", I see the following warnings:

Warning: GSL problem with location 174 for query M00829:20:000000000-A49AG:1:1102:16591:10591:6; Skipped with warning "computed function value is infinite or NaN".
Warning: GSL problem with location 174 for query M00829:20:000000000-A49AG:1:1108:12346:12753:6; Skipped with warning "computed function value is infinite or NaN".

The resulting jplace file causes an error in guppy:

guppy redup -m -o output/redup.jplace.gz -d /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/weights.csv output/dedup.jplace
Uncaught exception: Failure("malformed row in jplace")
Fatal error: exception Failure("malformed row in jplace")

The working directory for the command above is /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community_yapp. Happy to upload or provide the files some other way if you can't get to them.

Thanks!
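
(Editor's sketch.) One way to narrow down a "malformed row in jplace" failure is to check the jplace file itself before looking elsewhere. The Python below is a minimal sketch, not part of pplacer or guppy: it assumes the standard jplace layout (a top-level "fields" list, and "placements" entries with a "p" list of rows plus an "n" or "nm" name list). The function name `find_malformed_rows` and the sample data are hypothetical.

```python
import json
import math


def find_malformed_rows(jplace):
    """Return (placement_index, reason) pairs for rows that look malformed."""
    n_fields = len(jplace["fields"])
    problems = []
    for i, placement in enumerate(jplace["placements"]):
        # Names live under "n" (plain names) or "nm" (names with multiplicity).
        if "n" not in placement and "nm" not in placement:
            problems.append((i, "no 'n' or 'nm' key"))
        for row in placement.get("p", []):
            if len(row) != n_fields:
                problems.append(
                    (i, "row has %d values, expected %d" % (len(row), n_fields))
                )
            elif any(isinstance(v, float) and not math.isfinite(v) for v in row):
                problems.append((i, "row contains NaN or infinity"))
    return problems


# Made-up in-memory example; a real file would be read with
# json.load(open("output/dedup.jplace")).
example = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[174, -1234.5, 0.9, 0.01, 0.02]], "nm": [["read_a", 3]]},
        {"p": [[55, float("nan"), 0.1, 0.01, 0.02]], "nm": [["read_b", 1]]},
        {"p": [[55, -1200.0, 0.1]], "n": ["read_c"]},
    ],
}
print(find_malformed_rows(example))
```

If this reports nothing for the real file, the jplace rows are probably fine and the other inputs to guppy are worth checking. Note that Python's json module accepts the non-standard tokens NaN and Infinity by default, which is why the finiteness check is done explicitly.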

nhoffman commented 9 years ago

Note that I moved "output" to "output-pplacer-issue-340" within the directory above.

nhoffman commented 9 years ago

Update: errors persist after removing the two sequences above:

Warning: GSL problem with location 174 for query M00829:20:000000000-A49AG:1:1101:18632:23398:6; Skipped with warning "computed function value is infinite or NaN".
Warning: GSL problem with location 55 for query M00829:20:000000000-A49AG:1:1102:12088:16764:6; Skipped with warning "computed function value is infinite or NaN".
Warning: GSL problem with location 55 for query M00829:20:000000000-A49AG:1:1109:19844:5672:6; Skipped with warning "computed function value is infinite or NaN".

Also, here's the refset: /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/MCB_V1V2/MCB_V1V2_named-1.0.refpkg

matsen commented 9 years ago

Sorry, but could you please paste the entire pplacer command?

stoat project-refset-creation/rp_mock_community_yapp ‹rp_mock_community*› » pplacer -p --inform-prior --prior-lower 0.01 --map-identity -c /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/MCB_V1V2/MCB_V1V2_named-1.0.refpkg -o ~/pplacer-problem/problem.jplace data/seqs.fasta
Running pplacer v1.1.alpha16-14-g14107e1 analysis on data/seqs.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
query IBRIB9O01ADJCM is not the same length as the reference alignment (got 425; expected 2238)
stoat project-refset-creation/rp_mock_community_yapp ‹rp_mock_community*› »

nhoffman commented 9 years ago

Yes, sorry:

pplacer -p --inform-prior --prior-lower 0.01 --map-identity -c /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/MCB_V1V2/MCB_V1V2_named-1.0.refpkg output/dedup_merged.fasta.gz -o output/dedup.jplace -j 20

nhoffman commented 9 years ago

Update on this: the errors appear to be reference-set specific (both commands below refer to the same set of query sequences).

pplacer -p --inform-prior --prior-lower 0.01 --map-identity -c /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/MCB_V3V5/MCB_V3V5_named-1.0.refpkg output-V3V5/dedup_merged.fasta.gz -o output-V3V5/dedup.jplace -j 20

Warning: GSL problem with location 266 for query M00829:20:000000000-A49AG:1:1108:7562:7606:6; Skipped with warning "computed function value is infinite or NaN".

pplacer -p --inform-prior --prior-lower 0.01 --map-identity -c /fh/fast/fredricks_d/bvdiversity/project-refset-creation/rp_mock_community/output/MCB_V1V2/MCB_V1V2_named-1.0.refpkg output-V1V2/dedup_merged.fasta.gz -o output-V1V2/dedup.jplace -j 20

Warning: GSL problem with location 174 for query M00829:20:000000000-A49AG:1:1101:18632:23398:6; Skipped with warning "computed function value is infinite or NaN".
Warning: GSL problem with location 55 for query M00829:20:000000000-A49AG:1:1102:12088:16764:6; Skipped with warning "computed function value is infinite or NaN".
Warning: GSL problem with location 55 for query M00829:20:000000000-A49AG:1:1109:19844:5672:6; Skipped with warning "computed function value is infinite or NaN".

matsen commented 9 years ago

This ended up being a bad error message: the issue was in the redup file, not in the jplace.
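
(Editor's sketch.) Since the problem was in the redup file, presumably the CSV passed to `guppy redup -d`, a quick sanity check of that file can save a round of debugging. The Python below is a minimal sketch under a stated assumption: that with `-m` each row has three columns (kept sequence name, original sequence name, integer count). That layout is an assumption to verify against the guppy documentation, and `check_dedup_rows` and the sample rows are hypothetical.

```python
import csv
import io


def check_dedup_rows(lines, n_columns=3):
    """Return (line_number, reason) pairs for rows that break the assumed layout."""
    problems = []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if len(row) != n_columns:
            problems.append(
                (line_no, "expected %d columns, got %d" % (n_columns, len(row)))
            )
            continue
        if not row[0] or not row[1]:
            problems.append((line_no, "empty sequence name"))
        count = row[2].strip()
        # Assumed: the third column is a positive integer count.
        if not count.isdigit() or int(count) < 1:
            problems.append(
                (line_no, "count is not a positive integer: %r" % row[2])
            )
    return problems


# Made-up sample; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "read_a,read_a,3\n"
    "read_a,read_b,2\n"
    "read_c,read_c\n"
    "read_d,read_d,many\n"
)
print(check_dedup_rows(sample))
```

Any line reported here is a candidate for the row that guppy rejected, even though the exception text names the jplace.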