xavierdidelot / TransPhylo

Reconstruction of transmission trees using genomic data
http://xavierdidelot.github.io/TransPhylo/
GNU General Public License v2.0

Last date format #19

Closed: lborcard closed this issue 2 years ago

lborcard commented 2 years ago

I am trying to use your package with a dated tree built with IQ-TREE, and I cannot figure out what format the dates of my data should be in. On top of that, if I input the date in Y-M-D format, I get the following error:

Error in tr$edge.length[iedge] <- ptree[ptree[i, 3], 1] - ptree[i, 1] : 
  replacement has length zero

Best regards,

Loïc

xavierdidelot commented 2 years ago

Hi Loïc,

Since your tree is already dated, the only date you need to input is the date of the last sample, via a command such as:

ptree <- ptreeFromPhylo(read.tree('tree.nwk'), dateLastSample=2022.1)

Note that the dateLastSample argument needs to be in decimal years, not in Y-M-D format. You can convert dates from one format to another using the function decimal_date from the lubridate package.
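For readers without lubridate, here is a minimal base-R sketch of the Y-M-D to decimal-year conversion. The helper name to_decimal_year is hypothetical. It is intended to match what lubridate's decimal_date returns for plain dates (midnight): the year plus the fraction of that year that has elapsed since 1 January.

```r
# Hypothetical helper (not part of TransPhylo or lubridate): convert Y-M-D
# dates to decimal years using base R only.
to_decimal_year <- function(dates) {
  dates <- as.Date(dates)
  year <- as.integer(format(dates, "%Y"))
  start <- as.Date(paste0(year, "-01-01"))      # 1 January of the same year
  end <- as.Date(paste0(year + 1, "-01-01"))    # 1 January of the next year
  # Days elapsed since 1 January divided by the length of that year (365 or 366)
  year + as.numeric(dates - start) / as.numeric(end - start)
}

to_decimal_year("2021-03-15")
```

Here 15 March 2021 is 73 days after 1 January, and 73/365 = 0.2, so the result is 2021.2. That value would go into the dateLastSample argument of ptreeFromPhylo in the command above.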

Concerning the error message you received, you should check that your tree contains only binary nodes and that all branch lengths are strictly positive; see also #7.
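Both conditions can be checked directly on the fields of an ape "phylo" object. The sketch below does this in base R. check_tree_for_transphylo is a hypothetical helper, and the three-tip tree ((A:1,B:2):1,C:3) is built by hand so that the example runs without ape.

```r
# Hypothetical helper: check that a phylo tree is fully binary and that every
# branch length is strictly positive.
check_tree_for_transphylo <- function(tree) {
  # Column 1 of tree$edge holds the parent node of each edge, so counting
  # parents gives the number of children of every internal node.
  n_children <- table(tree$edge[, 1])
  list(
    binary = all(n_children == 2),
    positive_branches = !is.null(tree$edge.length) && all(tree$edge.length > 0)
  )
}

# Tree ((A:1,B:2):1,C:3); with tips 1-3, root node 4 and internal node 5
tree <- list(
  edge = rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3)),
  edge.length = c(1, 1, 2, 3),
  tip.label = c("A", "B", "C"),
  Nnode = 2L
)
class(tree) <- "phylo"

check_tree_for_transphylo(tree)
```

A real tree read with ape's read.tree can be passed in the same way. If the tree contains polytomies, ape's multi2di can resolve them into binary nodes. Whether that is appropriate for a given dataset, and how to handle zero-length branches, is a judgement call.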

Best wishes, Xavier

lborcard commented 2 years ago

Thank you so much, everything worked. Nonetheless, I am still wondering whether the dates in the NEXUS file I used are being read properly. My medoid-colored tree displays dates (e.g. 2016, 2017) that fall outside my study period, which only spans 2 months in 2021. Should I use decimal dates when building the phylogenetic tree?

xavierdidelot commented 2 years ago

Does your Nexus tree have branch lengths measured in years? If so, everything should be fine; otherwise you may need to rescale the branch lengths. For example, if they are in days, you can use something like:

tree$edge.length=tree$edge.length/365
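The same rescaling can be wrapped in a small function and checked on a toy object. rescale_days_to_years is a hypothetical name, and the list below stands in for a phylo object because only its edge.length field is touched. Dividing by 365 follows the command above. Using 365.25 instead is a possible variant and is an assumption, not something the thread specifies.

```r
# Hypothetical helper: rescale branch lengths from days to years,
# equivalent to tree$edge.length=tree$edge.length/365
rescale_days_to_years <- function(tree, days_per_year = 365) {
  tree$edge.length <- tree$edge.length / days_per_year
  tree
}

tree <- list(edge.length = c(365, 73, 36.5))   # stand-in for a phylo object
rescale_days_to_years(tree)$edge.length
```

After rescaling, dateLastSample (and dateT, if used) must also be in decimal years so that all inputs share the same time unit.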

lborcard commented 2 years ago

No, it is measured in days; sampling was done daily over 2 months. Is that an issue?

xavierdidelot commented 2 years ago

In that case you have two options. Either you do the whole analysis in days, in which case dateLastSample and dateT need to be in days measured from some fixed reference, say the 1st of January; or you do the whole analysis in years, in which case you need to rescale the branch lengths using the command from my previous message.
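For the first option, the sketch below converts a Y-M-D date into days elapsed since 1 January of a chosen reference year. days_since_jan1 and the example date are hypothetical. Passing the reference year explicitly keeps dates that cross a new year on the same scale.

```r
# Hypothetical helper: days elapsed since 1 January of a fixed reference year,
# for an analysis run entirely in days.
days_since_jan1 <- function(dates, ref_year) {
  as.numeric(as.Date(dates) - as.Date(paste0(ref_year, "-01-01")))
}

# Example: last sample taken on 30 April 2021
dateLastSample <- days_since_jan1("2021-04-30", ref_year = 2021)
dateLastSample
```

This gives 119, since 1 April is day 90 and 30 April is 29 days later. A dateT for an ongoing outbreak would be computed with the same function and the same reference year.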

lborcard commented 2 years ago

Just to clarify: I have absolute dates in Y-M-D format. Do I still need to transform them into relative dates as you suggested?

xavierdidelot commented 2 years ago

If you do the analysis using days as the time unit then you don't need to rescale your tree since you said it has branches measured in days.

If you analyse a finished outbreak, then dateT=Inf, and the only value left to provide is dateLastSample. This can be any value: it simply defines the absolute reference date from which time is measured. For example, if dateLastSample=0, then all dates will be negative, since all transmission events happen before the last sampling.
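With time in days, results expressed on this reference can be mapped back to calendar dates by offsetting from the real date of the last sample. The sketch below assumes that, and the helper name to_calendar_date and the dates used are hypothetical. With the default dateLastSample = 0, a time of -12 means 12 days before the last sample.

```r
# Hypothetical helper: map a time t (in days, on the same reference as
# dateLastSample) back to a calendar date.
to_calendar_date <- function(t, last_sample_date, dateLastSample = 0) {
  as.Date(last_sample_date) + (t - dateLastSample)
}

to_calendar_date(c(-12, -3), "2021-04-30")
```

Choosing dateLastSample = 0 keeps this mapping trivial. Any other constant works the same way, provided the same value is used here and in the analysis.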

If you analyse an ongoing outbreak, then you will need to specify both dateT and dateLastSample using the same absolute reference. For example, if dateLastSample=0 and dateT=10, it would mean that sampling stopped 10 days after the last sample was taken.
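When the end of sampling is known as a calendar date, dateT can be derived on the same reference as dateLastSample. The sketch below assumes time is measured in days, and date_t_from_calendar and the dates used are hypothetical. As far as I know, dateT is an argument of TransPhylo's inferTTree, but check the package documentation for the exact call.

```r
# Hypothetical helper: dateT (in days) for an ongoing outbreak, on the same
# reference as dateLastSample.
date_t_from_calendar <- function(end_date, last_sample_date, dateLastSample = 0) {
  dateLastSample + as.numeric(as.Date(end_date) - as.Date(last_sample_date))
}

# Last sample on 30 April 2021, sampling stopped on 10 May 2021
date_t_from_calendar("2021-05-10", "2021-04-30")
```

With dateLastSample = 0 this reproduces the example above: dateT = 10.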