Open mictadlo opened 3 years ago
Ah! It's been some time that I've looked at this, so apologies for the confusion ;)
In the tmp.tsv, as per CAFE's format, the first column should be the (null) with the header being Desc. So what you've shared should look like:
head tmp.tsv
Desc Orthogroup Atha Cann Csat Mgut Natt Nlab Nobt Noto Nqld Nsyl Ntab Ntom Ptri Slyc Smel Stub Vvin Total
(null) OG0000000 0 0 0 0 0 3 0 0 982 0 0 1 0 0 0 986
(null) OG0000001 0 9 0 0 5 23 8 59 3 31 267 14 0 6 113 540
IIRC, this is what the awk portion actually does, only the "first column header" should be the "Desc".
Once this is done, you can run the cafetutorial_clade_and_size_filter.py script that CAFE has.
In any case, what it does is, it's going to remove the OGs/Groups if any of the species have more than 100 proteins in them.
Once this is done, you'd need to create a rooted tree. I recommend using Timetree.org or other age approaches (ape package in R is a good started)
Unfortunately, the way I've arranged the CAFE thing is essentially a guideline as to how I do it, rather than a complete pipeline because I kept running into issues :(
I might automate it some day, hopefully, but the use case for me is fairly too low to justify the time sunk into it.
Thank you. The new file looks like this:
> head tmp.tsv
Desc Orthogroup Atha Cann Csat Mgut Natt Nlab Nobt Noto Nqld Nsyl Ntab Ntom Ptri Slyc Smel Stub Vvin Total
(null) OG0000000 0 0 0 0 0 3 0 0 982 0 0 1 0 0 0 0 0 986
(null) OG0000001 0 9 0 0 5 23 8 59 3 31 267 14 0 6 113 2 0 540
(null) OG0000002 0 128 3 20 21 21 8 42 21 36 70 37 3 25 38 39 22 534
(null) OG0000003 1 66 12 20 29 17 12 42 21 30 60 21 5 19 36 84 42 517
The age (28 MYA) has been provided to me by a collaborator. Unfortunately, I got a new error:
> make_ultrametric.py -r 28 SpeciesTree_rooted.txt
> head SpeciesTree_rooted.txt.ultrametric.tre
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);
python2.7 ~/scripts/cafe/cafetutorial_clade_and_size_filter.py -i tmp.tsv -s -o cafe.input.tsv
> cafe
# load -i cafe.input.tsv
-----------------------------------------------------------
Family information: cafe.input.tsv
Log: stdout
The number of families is 35104
Root Family size : 1 ~ 124
Family size : 0 ~ 149
P-value: 0.01
Num of Threads: 1
Num of Random: 1000
# tree (((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819)
No species 'total' was found in the tree
What did I miss?
Thank you in advance,
Michal
Hey!
You have to remove the "Total" column, that won't be needed. The error specifies that :)
Thank you,, I have forgotten to do awk -F'\t' '{$NF=""; print $0}' tmp.tsv | rev | sed 's/^\s*//g' | rev | tr ' ' '\t' > mod.tsv
. Unfortunately, I got now Tree must be binary:
> awk -F'\t' '{$NF=""; print $0}' tmp.tsv | rev | sed 's/^\s*//g' | rev | tr ' ' '\t' > mod.tsv
> python2.7 ~/scripts/cafe/cafetutorial_clade_and_size_filter.py -i mod.tsv -s -o cafe.input.tsv
> cafe
# load -i cafe.input.tsv
-----------------------------------------------------------
Family information: cafe.input.tsv
Log: stdout
The number of families is 27493
Root Family size : 1 ~ 124
Family size : 0 ~ 149
P-value: 0.01
Num of Threads: 1
Num of Random: 1000
# tree (((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819)
# lambda -s -t ((1,1)1,(1,1):1,1)
Tree must be binary
What did I miss?
Thank you in advance,
Michal
The lambda tree that you've pasted should look like the tree that you've generated (Ptri.....)
or rather taking a small portion of your ultrametric tree: (Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727)
the lambda tree would look like: (1,(2,2)),1) assuming Csat is your species of interest. My recommendation is go through the CAFE tutorial once though, because it covers this in more detail.
Hi, Thank you for sharing your cafe pipeline. However, I ran into the following problems:
What did I miss?
Thank you in advance