ropensci / babette

babette is an R package to work with BEAST2
https://docs.ropensci.org/babette
GNU General Public License v3.0
44 stars 6 forks source link

Doubling of clockRate parameter in XML when using tipdates_filename #93

Closed GaryNapier closed 3 years ago

GaryNapier commented 3 years ago

Describe the bug Hi Richel,

I think I'm almost there!

I've noticed when I use a tipdates file I get a doubling of the clockRate parameter in the XML file like this:

<parameter id="clockRate.c:THAILAND_TEST.clust_1.dated" name="stateNode">0.0000001</parameter>
        <parameter id="popSize.t:THAILAND_TEST.clust_1.dated" lower="0" name="stateNode" upper="200">100</parameter>
        <parameter id="clockRate.c:THAILAND_TEST.clust_1.dated" name="stateNode">1.0</parameter>

And the XML fails to load in Beauti (see screnshot below).

When I run my R code without the tipdates file, the XML seems to work in the sense that Beauti can parse it and open it, however the parameter value doesn't seem to be correct (1.0 when it should be 0.0000001):

<parameter id="clockRate.c:THAILAND_TEST.clust_1.dated" name="stateNode">1.0</parameter>

To Reproduce Script to reproduce the behavior:

Full code:

remotes::install_github("ropensci/beautier")

library(babette)
library(seqinr)

remove_tail <- function(x, sep = "_", del = 1){
  sapply(strsplit(x, split = sep, fixed = TRUE),
         function(i) paste(head(i, -del), collapse = sep))
}

# DIRECTORIES

setwd("~/Documents/transmission/")

fasta_dir <- "fasta/"
metadata_local_dir <- "metadata/"

# FILES
fasta_file <- paste0(fasta_dir, "THAILAND_TEST.clust_1.dated.fa")
fasta_id_date_df_outfile <- paste0(metadata_local_dir, remove_tail(basename(fasta_file), sep = "."), ".txt")

# READ IN FILES
fasta <- read.fasta(file = fasta_file, forceDNAtolower = F)

# Get fasta sample names and parse into name and date
fasta_names <- names(fasta)

fasta_id_date_df <- do.call(rbind, lapply(strsplit(fasta_names ,"_"), function(x){
  data.frame(id = paste(x[1:(length(x))], collapse = "_"), year = x[length(x)])
  }))

# Save df of ids and year as csv for model
write.table(fasta_id_date_df, file = fasta_id_date_df_outfile, quote = F, row.names = F, col.names = F, sep = "\t")

# CLOCK MODEL
clock_rate <- 0.0000001
clock_model <- create_strict_clock_model(clock_rate_param = create_clock_rate_param(value = clock_rate),
                                         clock_rate_distr = create_log_normal_distr(value = clock_rate, m = 1, s = 1.25))

# MCMC
every <- 10000

mcmc <- create_mcmc(
  chain_length = 1e+08,
  tracelog = beautier::create_tracelog(log_every = every),
  screenlog = beautier::create_screenlog(log_every = every),
  treelog = beautier::create_treelog(log_every = every)
)

# TREE PRIOR
tree_prior <- create_ccp_tree_prior(
  pop_size_distr = create_log_normal_distr(
    m = 1,
    s = 1.25,
    value = 100.0,
    lower = 0.0,
    upper = 200.0
  ))

# MRCA PRIOR
mrca_prior <- create_mrca_prior(
  is_monophyletic = TRUE, 
  mrca_distr = create_laplace_distr(mu = 1990))

# MAKE XML
create_beast2_input_file(
  fasta_file,
  "beast_xml/THAILAND_TEST.babette.xml",
  site_model = create_gtr_site_model(),
  clock_model = clock_model,
  tree_prior = tree_prior,
  mrca_prior = mrca_prior,
  mcmc = mcmc,
  tipdates_filename = fasta_id_date_df_outfile
)

You may just need to change the metadata/ path to save the tipdates file.

The fasta file should be in my https://github.com/GaryNapier/transmission, where I've added you as a collaborator.

The difference I'm describing is when the tipdates_filename = fasta_id_date_df_outfile line is commented.

Expected behavior I think there should just be one clockRate parameter with value 0.0000001:

<parameter id="clockRate.c:snps_london_351_dna" spec="parameter.RealParameter" name="stateNode">1.0E-7</parameter>

Screenshots Beauti parse error:

Screen Shot 2021-03-17 at 13 04 51

Environment: Show the results of running the following script:

library(beastier)
beastier::beastier_report()
***********

* beastier *

***********

OS: mac

****************

* Dependencies *

****************

beautier version: 2.5

beastier version: 2.3

**********

* BEAST2 *

**********

Error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: JVM could not be found
In addition: Warning message:
In fun(libname, pkgname) :
  Cannot find JVM library '/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/server/libjvm.dylib'
Install Java and/or check JAVA_HOME (if in doubt, do NOT set it, it will be detected)

The Java JVM could perhaps be the problem? I know nothing about Java, sorry.

Many thanks!

Gary

richelbilderbeek commented 3 years ago

Thanks for the excellent bug report! I will take a look this Saturday before 23:59 and fix it, so you can try it out again! Thanks for testing my work!

GaryNapier commented 3 years ago

Thanks Richel! Looking forward to testing.

richelbilderbeek commented 3 years ago

Note to self:

Already the first XML pasted by Gary is indicative of the problem:

<parameter id="clockRate.c:THAILAND_TEST.clust_1.dated" name="stateNode">0.0000001</parameter>
...
<parameter id="clockRate.c:THAILAND_TEST.clust_1.dated" name="stateNode">1.0</parameter>

The clockRate is present twice!

Solved within beautier at https://github.com/ropensci/beautier/issues/127

richelbilderbeek commented 3 years ago

Hi @GaryNapier, AFAICS, I have fixed the problem.

Could you please verify this?

As usual, do

remotes::install_github("ropensci/beautier")

Thanks and cheers, Richel

GaryNapier commented 3 years ago

Thanks Richel,

I seem to be getting now a doubling of another parameter - "ClockPrior".

These occur on lines 147 and 208 of the xml output file. Again, Beauti gets stuck and can't parse this file - please see screenshot:

Screen Shot 2021-03-22 at 12 08 16

Does this occur for you if you run my R code? (My code hasn't changed since my first message).

I have done the install_github() command and restarted R. Also, the message I get from running library(beastier); beastier::beastier_report() is the same as last time, if that is relevant.

Thanks,

Gary

richelbilderbeek commented 3 years ago

I predict due to test at https://github.com/ropensci/beastier/commit/c18fd28a7d618a1aab323bf3ac3b895f5c629f64 that this is fixed.

richelbilderbeek commented 3 years ago

As the follow-up Issue #94 works, I assume this one has worked as well.

Happily closing this Issue :+1: