ridgelab / JustOrthologs

Multiple pairwise comparisons to single reference? #9

Closed janstrauss1 closed 4 years ago

janstrauss1 commented 4 years ago

Hi @ridgelab,

Is it possible to set run_multiple_species.sh to perform pairwise comparisons to a single reference instead of producing all pairwise comparisons of sequences in the input directory?

Many thanks in advance for your feedback.

ridgelab commented 4 years ago

Unfortunately, all of the comparisons are necessary because run_multiple_species.sh finds transitive orthologs. In other words, if gene A in species A is orthologous to gene B in species B, and gene B in species B is orthologous to gene C in species C, then the reported ortholog group will contain genes A, B, and C, even if the algorithm did not identify gene A as orthologous to gene C.

The algorithm also ensures that non-orthologous pairings are not present in an identified group. So, if gene C were also identified as orthologous to gene D in species A, the entire group would be dropped from the report: genes A and D are in the same species, so they cannot be orthologs of each other, and dropping the group limits false positive identifications.

However, we understand that a typical use case might be to look at orthologs with respect to a single species. The output file that is produced using the -e option contains the species name (file name) from the input files. You can use the following command to extract only orthologous groups containing that species:

    grep ${filename} ${output_from_e_option} > ${subset_of_orthologs}

where ${filename} is the name of the file for the species you're interested in, ${output_from_e_option} is the output file created using the -e option, and ${subset_of_orthologs} is the file that will receive the ortholog groups containing that species.
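For anyone who wants to try the extraction step before running it on real output, here is a minimal sketch on mock data. The file names (speciesA.fasta and so on) and the one-group-per-line layout are assumptions made for illustration only; they are not taken from the JustOrthologs documentation, so check your actual -e output first. The sketch quotes the variables and adds -F so that the "." in the file name is matched literally rather than as a regex wildcard.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical reference species file name (assumption for this example).
reference="speciesA.fasta"

workdir="$(mktemp -d)"
all_groups="${workdir}/all_groups.txt"      # stands in for the -e output file
subset="${workdir}/reference_groups.txt"    # groups that contain the reference

# Mock data: one ortholog group per line. This layout is an assumption,
# not the documented format of the -e output.
cat > "${all_groups}" <<'EOF'
speciesA.fasta:gene1 speciesB.fasta:gene7 speciesC.fasta:gene3
speciesB.fasta:gene2 speciesC.fasta:gene9
speciesA.fasta:gene4 speciesC.fasta:gene5
EOF

# -F: treat the pattern as a fixed string instead of a regular expression.
# "|| true": grep exits with status 1 when nothing matches, which would
# otherwise stop the script under "set -e".
grep -F -- "${reference}" "${all_groups}" > "${subset}" || true

cat "${subset}"
```

Note that this is still a substring match: a file named species1.fasta would also match lines mentioning myspecies1.fasta, so pick a pattern that is unambiguous for your input file names.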

janstrauss1 commented 4 years ago

OK I see - many thanks for the explanation and command suggestion for extraction!