spholmes / F1000_workflow

43 stars 33 forks source link

Problem with phangorn package in the revised workflow #24

Closed akalichen closed 6 years ago

akalichen commented 6 years ago

Hello everyone, I have run into a problem when I use the F1000_workflow to analyze my own data.

When I run the commandline,

phangAlign <- phyDat(as(alignment, "matrix"), type="DNA")
dm <- dist.ml(phangAlign)
treeNJ <- NJ(dm) # Note, tip order != sequence order

fit = pml(treeNJ, data=phangAlign)
fitGTR <- update(fit, k=4, inv=0.2)
fitGTR <- optim.pml(fitGTR, model="GTR", optInv=TRUE, optGamma=TRUE,
        rearrangement = "stochastic", control = pml.control(trace = 0))
detach("package:phangorn", unload=TRUE)

The whole process become extremely slow and showed error message like

"NA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive valueNA/Inf replaced by maximum positive value"

I follow solutions as suggested in the post, but it seems that all my packages are uptodate. ape 5.1, Phangorn 2.4.0

Grateful to anyone who might have an idea.

Lichen

benjjneb commented 6 years ago

Can you tell us a little more about the size of your dataset? Especially how many sequences are in your alignment, and what is there length distribution? Also what is your R.version?

spholmes commented 6 years ago

When dealing with very large trees, we often use raxml, this is either available externally or in the package ips. The phangorn implementation may suffer for large trees and then memory overflow creates all types of pbs.

On Wed, Apr 18, 2018 at 6:24 AM, Benjamin Callahan <notifications@github.com

wrote:

Can you tell us a little more about the size of your dataset? Especially how many sequences are in your alignment, and what is there length distribution? Also what is your R.version?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/spholmes/F1000_workflow/issues/24#issuecomment-382385100, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvcpFRz2VPAbWWpnrG8GqDxv8ZEsLks5tpz6qgaJpZM4TYjcC .

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

akalichen commented 6 years ago

Thanks benjjneb! My Rstudio version is 1.1.442, my R framework is version 3.4. I think my sample size shouldn't be too large, probably even smaller than the example.

           reads.in reads.out

B1_R1_001.fastq 117268 68347 B2_R1_001.fastq 17862 10269 C_R1_001.fastq 786 268 I1S1_R1_001.fastq 258820 132866 I2S2_R1_001.fastq 228240 121342 I3S3_R1_001.fastq 155917 82739

The sequences being tabled vary in length.

265 266 267 268 275 280 296 297 305 308 315 318 319 320 323 325 20 2 1 1 1 1 1 6 2 3 1 1 1 1 1 1 351 356 360 367 372 380 381 382 389 390 391 392 393 394 395 396 1 1 2 1 1 12 16 1 1 2 16 28 45 9 4 2 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 1 1 1 2 1 1 5 12 9 1 1 7 37 715 629 215 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 195 60 41 20 31 8 2 8 8 35 14 60 23 29 37 43 429 430 431 432 433 434 435 436 437 438 439 440 456 465 466 472 144 910 83 164 44 258 1695 2328 378 45 2 1 1 1 1 1 481 482 2 1 Appreciate your help!

akalichen commented 6 years ago

Thanks spholmes! Should I try https://github.com/amkozlov/raxml-ng/?

benjjneb commented 6 years ago

Agree with @spholmes that you should probably try RaxML. Once the number of sequences gets into the thousands the phangorn tree-building starts running into computational difficulties.

akalichen commented 6 years ago

Thanks @benjjneb and @spholmes ! I will try to use RaxML.

akalichen commented 6 years ago

Thanks @benjjneb and @spholmes! I reduced the sample that I put into the workflow and the commandline went through with no appearance error! Thanks so much!