rhysf / Synima

Synteny Imager
MIT License

Empty blast table causes OrthoMCL error #39

pasviber opened this issue 1 year ago (status: open)

pasviber commented 1 year ago

Hello,

I am working with long non-coding RNAs (lncRNAs) from 9 different species and would like to determine how conserved they are. To do this, I ran reciprocal blastn searches for all possible species combinations and then used the Blast_all_vs_all_repo_to_OrthoMCL.pl script to determine orthogroups with MCL clustering. The problem is that when there are no hits between two species, the result is an empty table (for example, vs_lsi.blast.m8), and Blast_all_vs_all_repo_to_OrthoMCL.pl then fails because it requires that table. Is this error expected? Is it a limitation of the pipeline? I tried deleting the empty table for that species (vs_lsi.blast.m8), but the same error appears. I used an e-value cutoff of 1e-03, so I don't think I am being too strict.
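A reciprocal best hit (RBH) between two species can be sketched in Python as below. This is a minimal illustration, not Synima's own code: it assumes the standard 12-column BLAST tabular layout (-m8 / -outfmt 6) with the bitscore in the last column, and the function names are hypothetical. It shows why an empty table for a species pair necessarily yields no RBH pairs for that pair.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(m8_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to its highest-bitscore subject in a BLAST -m8 table."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in m8_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        # Columns 1, 2 and 12 of the tabular format: query, subject, bitscore.
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    # An empty table on either side leaves its dict empty, so no pair survives.
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

Open file handles can be passed directly, since iterating a file yields its lines, e.g. `with open(path_ab) as ab, open(path_ba) as ba: pairs = reciprocal_best_hits(ab, ba)`.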

Thanks in advance

Pascual

rhysf commented 1 year ago

Hi @pasviber

If your BLASTs are not returning any hits, then there will be no orthology, and thus no synteny found.

I'm not sure what the error is, but what you've described is not the intended use case. I'm not sure BLAST is the best option for finding homology among lncRNAs; the pipeline was really intended for finding orthologs among all protein-coding genes between annotated genome assemblies.

One potential solution would be to run the program as intended, using every protein-coding gene, and then look at the locations of your lncRNAs in relation to syntenic genes. Otherwise, I think you essentially have your answer: according to reciprocal BLASTs, there is no evidence of orthology among some of your lncRNAs.
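The suggestion of placing each lncRNA relative to genes can be sketched as a flanking-gene lookup. The `Feature` record, the function name and the example coordinates are hypothetical; in practice the coordinates would come from each genome's annotation. Because intronic lncRNAs sit inside a host gene, this sketch reports overlapping genes as well as the nearest non-overlapping gene on each side, and it ignores strand.

```python
from typing import List, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    """Hypothetical minimal annotation record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def flanking_genes(genes: List[Feature], lnc: Feature) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Return (nearest gene to the left, overlapping host genes, nearest gene to the right)."""
    same = [g for g in genes if g.chrom == lnc.chrom]
    left_side = [g for g in same if g.end < lnc.start]
    right_side = [g for g in same if g.start > lnc.end]
    # Any gene whose interval overlaps the lncRNA counts as a host gene.
    hosts = [g.name for g in same if g.start <= lnc.end and g.end >= lnc.start]
    left = max(left_side, key=lambda g: g.end).name if left_side else None
    right = min(right_side, key=lambda g: g.start).name if right_side else None
    return left, hosts, right
```

If the host or flanking genes of two lncRNAs in different species fall in the same syntenic block, that could be taken as positional support for a correspondence even where blastn finds no hit.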

Best, Rhys

pasviber commented 1 year ago

Hello,

Thank you very much for your reply.

BLAST is commonly used to study the sequence-level conservation of lncRNAs. The difficulty in this case is that I am working with a smaller set (intronic lncRNAs), so homology is harder to find: I have at most 100 lncRNAs per species, and they are also poorly conserved. I simply wanted to use the Blast_all_vs_all_repo_to_OrthoMCL.pl script to cluster the homologies I found with blastn, because several papers do it this way; it is a way to build families of conserved lncRNAs. The problem is that of the 81 blastn tables (9 × 9 species comparisons), barely half contain any hits. So I understand that OrthoMCL cannot work with sets that have such low homology.
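Grouping blastn hits into lncRNA families can be illustrated with a simpler substitute for MCL: single-linkage clustering (connected components) via union-find. This is not what OrthoMCL or MCL does; MCL can split a weakly connected component into several clusters, whereas single linkage merges anything joined by a chain of hits. The function name and the species-prefixed IDs are hypothetical. Species pairs with no hits simply contribute no edges, and passing `all_ids` keeps lncRNAs without any hit as singleton families.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(pairs: Iterable[Tuple[str, str]], all_ids: Iterable[str] = ()) -> List[List[str]]:
    """Single-linkage clustering of hit pairs into families (connected components)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for i in all_ids:
        find(i)  # register IDs that may have no hits at all
    for a, b in pairs:
        union(a, b)

    groups: Dict[str, List[str]] = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    # Sort members and families so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())
```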

This is the error for the first empty table it finds:

retrieve_blast_pairs: Printing to OMCL_outdir/all_blast_pairs.m8.
Error, no required blast output file: ./car/RBH_blast_CDS/vs_lsi.blast.m8
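Before running Blast_all_vs_all_repo_to_OrthoMCL.pl, it may help to list which expected tables are missing or empty. The layout assumed below (`<repo>/<species>/RBH_blast_CDS/vs_<species>.blast.m8`) is inferred only from the path in the error message, and the function name is hypothetical. Self comparisons are included by default because 9 species give the 81 tables mentioned above.

```python
import os
from itertools import product
from typing import Dict, List


def audit_blast_tables(repo_dir: str, species: List[str],
                       subdir: str = "RBH_blast_CDS",
                       include_self: bool = True) -> Dict[str, List[str]]:
    """Report expected vs_<target>.blast.m8 files that are missing or zero bytes."""
    report: Dict[str, List[str]] = {"missing": [], "empty": []}
    for query_sp, target_sp in product(species, repeat=2):
        if query_sp == target_sp and not include_self:
            continue
        # Assumed layout, inferred from the error message path only.
        path = os.path.join(repo_dir, query_sp, subdir, f"vs_{target_sp}.blast.m8")
        if not os.path.exists(path):
            report["missing"].append(path)
        elif os.path.getsize(path) == 0:
            report["empty"].append(path)
    return report
```

This only reports the gaps; it does not make the script accept empty tables.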

Pascual