ropensci / phylotaR

An automated pipeline for retrieving orthologous DNA sequences from GenBank in R
https://docs.ropensci.org/phylotaR

Clusters with different parent number but the same MAD, seed, length, n_taxa, n_seqs #55

Open. Bunholi opened this issue 3 years ago

Bunholi commented 3 years ago

Hi @DomBennett

I ran the whole pipeline and everything went well. However, when examining the output of summary() on the phylotaR object, I noticed repeated clusters that have the same MAD, seed, sequence length, n_taxa and n_seqs but a different Parent number. I checked the taxon IDs, and these clusters contain the same sequences, so they are identical: one corresponds to the parent number of the genus and the other to that of the tribe.


I noticed that these duplicates could be filtered out (keeping only one of each) using the MAD score. Is there a function such as "get_MAD", or something similar, that extracts this value so I can drop these repeated clusters?
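As a workaround, repeated rows can be dropped directly from the summary table with base R's `duplicated()`. The sketch below assumes the summary is a data.frame with columns named `ID`, `Seed`, `Parent`, `N_taxa`, `N_seqs`, `Med_sql` and `MAD`; these names and all values are hypothetical stand-ins, not output from a real run. It treats rows as the same cluster when every content column matches, ignoring `ID` and `Parent`.

```r
# Minimal sketch: drop repeated clusters from a phylotaR-style summary table.
# ASSUMPTION: `smmry` mimics the data.frame returned by summary(<Phylota>);
# the column names and every value below are hypothetical.
smmry <- data.frame(
  ID      = c(1, 2, 3),
  Seed    = c("seqA", "seqA", "seqB"),
  Parent  = c(1001, 2002, 1001),  # e.g. genus-level vs tribe-level parent IDs
  N_taxa  = c(10, 10, 5),
  N_seqs  = c(12, 12, 6),
  Med_sql = c(650, 650, 1200),
  MAD     = c(0.91, 0.91, 0.77),
  stringsAsFactors = FALSE
)

# Columns describing a cluster's content; ID and Parent are left out on
# purpose because they are what differs between the repeated entries.
key_cols <- c("Seed", "N_taxa", "N_seqs", "Med_sql", "MAD")

# duplicated() on a data.frame flags rows that repeat an earlier row across
# all selected columns, so the first occurrence of each cluster is kept.
is_repeat <- duplicated(smmry[, key_cols])
unique_smmry <- smmry[!is_repeat, ]
keep_ids <- unique_smmry$ID
print(keep_ids)
# [1] 1 3
```

Matching on `MAD` alone would also work when the repeats carry bit-identical scores, but combining several columns is less likely to merge two genuinely different clusters that happen to share a MAD value. With a real object, the retained IDs could then be passed to phylotaR's `drop_clstrs()` to subset the clusters; check the package documentation for how the summary IDs map to cluster IDs in your version.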

Thank you,

Ingrid Bunholi