Closed: janstrauss1 closed this issue 4 years ago
Unfortunately, all of the comparisons are necessary because run_multiple_species.sh finds transitive orthologs. In other words, if gene A in species A is orthologous to gene B in species B, and gene B in species B is orthologous to gene C in species C, then the reported ortholog group will contain genes A, B, and C, even if the algorithm did not directly identify gene A as orthologous to gene C. The algorithm also ensures that non-orthologous pairings are not present in an identified group. So, if gene C were also identified as orthologous to gene D in species A, the entire group would be dropped from the report: genes A and D are in the same species, and dropping the group limits false-positive identifications.

However, we understand that a typical use case might be to look at orthologs with respect to a single species. The output file produced with the -e option contains the species name (file name) from the input files. You can use the following command to extract only the orthologous groups containing that species:
grep -F -- "${filename}" "${output_from_e_option}" > "${subset_of_orthologs}"
where ${filename} is the name of the file for the species you're interested in, ${output_from_e_option} is the output file created using the -e option, and ${subset_of_orthologs} is a file containing the ortholog groups that contain that species.
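For anyone who wants to reuse this filter, here is a minimal sketch of the same idea wrapped in a shell function. It assumes that each line of the -e output is one ortholog group and that the line contains the input file name of every species in the group. The function name, the file names, and the demo data are made up for illustration; the real -e layout may differ.

```shell
#!/usr/bin/env bash
# Sketch: keep only the ortholog groups that mention one reference species.
# ASSUMPTION: one ortholog group per line in the -e output, with the input
# file name of each species appearing somewhere on that line.

extract_groups_for_species() {
    local species_file="$1"   # hypothetical example: speciesA.fasta
    local e_output="$2"       # file produced with the -e option
    # -F matches the file name literally, so "." is not a regex wildcard.
    # -- ends option parsing in case the name starts with "-".
    grep -F -- "${species_file}" "${e_output}"
}

# Made-up input, only to exercise the function.
demo_file="$(mktemp)"
printf '%s\n' \
    'speciesA.fasta geneA1 speciesB.fasta geneB1' \
    'speciesB.fasta geneB2 speciesC.fasta geneC2' \
    'speciesA.fasta geneA3 speciesC.fasta geneC3' > "${demo_file}"

# Prints the groups that include speciesA.fasta.
extract_groups_for_species 'speciesA.fasta' "${demo_file}"
```

Note that this is a substring match: a name such as speciesA.fasta would also match a line containing my_speciesA.fasta, so it is worth picking file names that are not substrings of one another.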
OK I see - many thanks for the explanation and command suggestion for extraction!
Hi @ridgelab,
Is it possible to set run_multiple_species.sh to perform pairwise comparisons to a single reference instead of producing all pairwise comparisons of sequences in the input directory? Many thanks in advance for your feedback.