Closed: AvitalRodov closed this issue 6 months ago.
The notebooks notebooks/Embryo1_all.ipynb, Embryo2_all.ipynb and Embryo3_all.ipynb refer to trees/embryo_all.newick, located under data_path = '/data3/wangkun/phylovelo_datasets/embryo/'. Would it be possible to share these files together with the matching count data and metadata? :)
Hi @AvitalRodov
For the lineage tree of C. elegans, you can build it with the `Bio.Phylo.BaseTree` module in the Biopython package. In the C. elegans dataset, each cell's name is its parent's name with one letter appended. You can therefore create a `Clade` as the root and then add child nodes based on the cell names.

For example, if you have three cells named `aab`, `aba`, and `abb`, first create a root clade, `root = Phylo.BaseTree.Clade(name='a')`, then create two child nodes, `c1 = Phylo.BaseTree.Clade(name='aa')` and `c2 = Phylo.BaseTree.Clade(name='ab')`, and attach them with `root.clades = [c1, c2]`. Similarly, add a child node named `aab` to `c1`, and child nodes named `aba` and `abb` to `c2`. Finally, export the tree in Newick format using `Phylo.write(root, save_path, format='newick')`.
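A minimal sketch of the procedure described above, assuming Biopython is installed and that a cell's parent is always its name with the last letter removed. The helper `build_lineage_tree` is hypothetical (not part of PhyloVelo). Like the `aa`/`ab` clades in the example, it creates intermediate ancestors that are not themselves cells, and it wraps the root clade in a `Tree` before writing.

```python
from io import StringIO

from Bio import Phylo
from Bio.Phylo.BaseTree import Clade, Tree


def build_lineage_tree(cell_names, root_name):
    """Build a rooted lineage tree from cell names.

    Assumes a cell's parent is its name with the last letter removed.
    Intermediate ancestors that are not in `cell_names` are created as
    internal clades, so every cell ends up below `root_name`.
    """
    clades = {root_name: Clade(name=root_name)}

    def get_or_create(name):
        # Recursively make sure the parent exists, then attach this clade to it.
        if name not in clades:
            parent = get_or_create(name[:-1])
            clade = Clade(name=name)
            parent.clades.append(clade)
            clades[name] = clade
        return clades[name]

    # Sorting keeps the child order deterministic (alphabetical within each parent).
    for name in sorted(cell_names):
        if not name.startswith(root_name):
            raise ValueError(f"{name!r} does not descend from root {root_name!r}")
        get_or_create(name)

    return Tree(root=clades[root_name], rooted=True)


# The three-cell example from the answer above.
tree = build_lineage_tree(["aab", "aba", "abb"], root_name="a")
handle = StringIO()
Phylo.write(tree, handle, "newick")
print(handle.getvalue())
```

Passing a file path such as `save_path` instead of the `StringIO` handle writes the same Newick text to disk.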
For the mouse embryonic development data, you can obtain normalized data from the original study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117542), which is also suitable for PhyloVelo. Alternatively, you can follow the analysis in the original study (https://www.nature.com/articles/s41586-019-1184-5#Sec12) and process the RNA-seq data with cellranger to obtain the raw read counts we used in the notebook.
Hi @kunwang34, Thank you so much for the detailed and fast response! I understand your explanation about the cell lineage used to create the tree. However, in the pseudoembryo0 example, I encountered some discrepancies when trying to construct a tree from the provided data.
In pseudoembryo0, we have two cells with cell generation 5:
{'TAAGAGATCATGCATG-r17': 'ABaxx', 'GACGGCTTCACATAGC-r17': 'ABpxp'}
However, in cell generation 6, the lineages are as follows:
{'CGATGGCGTAGTGAAT-b01': 'ABarpx',
'CGAGCCAGTGCAACGA-r17': 'ABpxax',
'GAAATGATCACGATGT-r17': 'ABpxpa',
'ACGTCAAAGTGGAGTC-b01': 'ABpxpp'}
I can't understand how 'CGAGCCAGTGCAACGA-r17' with the lineage 'ABpxax' can be considered a child of 'TAAGAGATCATGCATG-r17' with the lineage 'ABaxx'.
Additionally, the next generation contains 16 cells instead of the expected 8, and their lineages do not always match those provided for generation 6.
Could you please provide further clarification on this? Thank you very much for your help!
The scRNA-seq datasets for C. elegans are a composite of cells from multiple individuals. pseudoembryo0 is therefore an artificial construct that combines cells with distinct lineages from these datasets. It is crucial to recognize that the cells within a pseudoembryo do not all originate from a single organism; rather, they are randomly sampled from the pooled data.
Occasionally, some scRNA-seq data may be incomplete or missing due to experimental limitations. For example, the lineage ‘ABpxax’ is a descendant of ‘ABpxa’, yet no ‘ABpxa’ cell is present in the scRNA-seq data. The same kind of discrepancy also explains the unexpected variation in cell numbers across generations.
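To see which cells in a pseudoembryo are affected, a small check can compare each cell's implied parent (its lineage name minus the last character, following the naming convention described earlier) with the set of lineages actually present. This is a hypothetical helper, not part of PhyloVelo; the dictionary reuses the generation 5 and 6 barcodes quoted in the question.

```python
def nearest_present_ancestor(lineage, present):
    """Return the longest proper prefix of `lineage` that is in `present`, or None."""
    for end in range(len(lineage) - 1, 0, -1):
        if lineage[:end] in present:
            return lineage[:end]
    return None


def report_missing_parents(cells):
    """Check each cell's direct parent against the lineages actually sampled.

    `cells` maps a cell barcode to its lineage name. The parent is assumed
    to be the lineage name with its last character removed.
    """
    present = set(cells.values())
    report = {}
    for barcode, lineage in cells.items():
        parent = lineage[:-1]
        report[barcode] = {
            "lineage": lineage,
            "parent": parent,
            "parent_present": parent in present,
            "nearest_present_ancestor": nearest_present_ancestor(lineage, present),
        }
    return report


# Generation 5 and 6 cells quoted in the question above.
cells = {
    "TAAGAGATCATGCATG-r17": "ABaxx",
    "GACGGCTTCACATAGC-r17": "ABpxp",
    "CGATGGCGTAGTGAAT-b01": "ABarpx",
    "CGAGCCAGTGCAACGA-r17": "ABpxax",
    "GAAATGATCACGATGT-r17": "ABpxpa",
    "ACGTCAAAGTGGAGTC-b01": "ABpxpp",
}
report = report_missing_parents(cells)
print(report["CGAGCCAGTGCAACGA-r17"])
# {'lineage': 'ABpxax', 'parent': 'ABpxa', 'parent_present': False, 'nearest_present_ancestor': None}
```

Within this small excerpt, ‘ABpxax’ has no sampled ancestor at all, while ‘ABpxpa’ and ‘ABpxpp’ attach directly to ‘ABpxp’.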
If your research aims to investigate the relationship between cell generation and gene expression, akin to the approach used in PhyloVelo, these inconsistencies may not significantly impact your analysis. However, if your goal is to reconstruct a complete cell lineage, further filtration of the cells may be necessary to account for the missing data.
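One possible filtration, sketched under the same naming assumption: keep only cells whose every ancestor between a chosen root (e.g. `'AB'`) and the cell itself is also present, so the retained cells form a gap-free lineage. `filter_complete_lineages` is a hypothetical helper, and treating the root as present even when it was not sampled is a design choice of this sketch.

```python
def filter_complete_lineages(cells, root):
    """Keep cells whose every ancestor from `root` down to the cell was sampled.

    `cells` maps barcode -> lineage name; ancestors are obtained by dropping
    trailing characters. The root itself counts as present even if unsampled,
    and cells that do not descend from `root` are discarded.
    """
    present = set(cells.values()) | {root}
    kept = {}
    for barcode, lineage in cells.items():
        if not lineage.startswith(root):
            continue
        # Proper ancestors at or below the root: lineage[:len(root)], ..., lineage[:-1].
        ancestors = (lineage[:end] for end in range(len(root), len(lineage)))
        if all(ancestor in present for ancestor in ancestors):
            kept[barcode] = lineage
    return kept


# Toy example with hypothetical barcodes: ABpxax is dropped because ABpxa is missing,
# and ABaxx is dropped because ABa is missing.
toy = {
    "c1": "ABp",
    "c2": "ABpx",
    "c3": "ABpxp",
    "c4": "ABpxpa",
    "c5": "ABpxax",
    "c6": "ABaxx",
}
print(sorted(filter_complete_lineages(toy, root="AB")))
# ['c1', 'c2', 'c3', 'c4']
```

This rule is deliberately strict: a single missing ancestor removes its whole subtree. A looser alternative is to attach such cells to their nearest sampled ancestor instead of dropping them.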
Thank you very much!
Hello! First, thank you for providing PhyloVelo! I'm interested in reproducing the phylogenetic tree of the C. elegans AB lineage from the pseudoembryo0 data before applying PhyloVelo (as depicted in Fig. 3a of the paper https://www.nature.com/articles/s41587-023-01887-5). I noticed the Trie class in elegans_util.py, but I couldn't find any usage examples or work out how to use it to produce a Newick tree. Could you please explain how to reproduce a lineage tree in Newick format for the cells in pseudoembryo0?
Thank you in advance!
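The `Trie` class in elegans_util.py is not shown in this thread, so its interface is not described here. As an illustration only, the following hypothetical, self-contained `LineageTrie` (pure Python, no Biopython) stores lineage names character by character and serialises the subtree under a chosen root as a plain Newick string, labelling each internal node with its lineage prefix.

```python
class LineageTrie:
    """Character trie over lineage names (hypothetical; not elegans_util.Trie)."""

    def __init__(self):
        self.children = {}  # next character -> child LineageTrie

    def insert(self, lineage):
        node = self
        for ch in lineage:
            node = node.children.setdefault(ch, LineageTrie())

    def subtrie(self, prefix):
        """Return the node for `prefix`; raises KeyError if no lineage has it."""
        node = self
        for ch in prefix:
            node = node.children[ch]
        return node

    def to_newick(self, name):
        """Serialise this node, whose lineage name is `name`, as a Newick subtree."""
        if not self.children:
            return name
        subtrees = ",".join(
            child.to_newick(name + ch) for ch, child in sorted(self.children.items())
        )
        return f"({subtrees}){name}"


def lineages_to_newick(lineages, root):
    """Build a trie from lineage names and return the subtree under `root` as Newick."""
    trie = LineageTrie()
    for lineage in lineages:
        trie.insert(lineage)
    return trie.subtrie(root).to_newick(root) + ";"


print(lineages_to_newick(["aab", "aba", "abb"], root="a"))
# ((aab)aa,(aba,abb)ab)a;
```

Prefixes that no sampled cell carries (such as `ABpxa`) still appear as internal nodes, because the trie creates every intermediate prefix. Unary nodes such as `(aab)aa` are valid Newick, though some tools may prefer them collapsed.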