stajichlab / OrthoMCL


orthomclBlastParser error #6

Open shrutidabral opened 6 years ago

shrutidabral commented 6 years ago

I executed the BLAST command on my raw data, then tried running the orthomclBlastParser command and obtained the following error:

/home/mobashirm/Documents/orthomclSoftware-v2.0.9/bin/orthomclBlastParser /home/mobashirm/PRJ2017Nov2017/drvish/trinity24_0_R/ortholog/out.tab /home/mobashirm/PRJ2017Nov2017/drvish/my_orthomcl_dir/compliantFasta/Blast/ >> /home/mobashirm/PRJ2017Nov2017/drvish/trinity24_0_R/ortholog/similarSequences.txt
acquiring genes from arab.fasta
couldn't find taxon for gene 'TRINITY_DN10001_c0_g1_i2.p1' at /home/mobashirm/Documents/orthomclSoftware-v2.0.9/bin/orthomclBlastParser line 106, line 1.

hyphaltip commented 6 years ago

The sequence IDs in your files need to be prefixed with the taxon code, e.g. >TAXA1|TRINITY_DN10001_c0_g1_i2.p1

However, I'd suggest a non-OrthoMCL solution for your work, such as https://github.com/davidemms/OrthoFinder or https://github.com/guyleonard/orthagogue (https://www.ncbi.nlm.nih.gov/pubmed/24115168), which do not use a database to do the orthology analyses.
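The taxon-code prefix described above could be added with a short script. This is only a minimal sketch in Python, not part of this repository: the function name `add_taxon_prefix`, the code `TAXA1`, the header description, and the placeholder sequence are all assumptions made for illustration.

```python
def add_taxon_prefix(lines, taxon_code):
    """Return FASTA lines whose header IDs start with '<taxon_code>|'.

    Only the first whitespace-delimited token of each header is kept,
    so any free-text description after the ID is dropped.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no identifier")
            seq_id = fields[0]
            # Leave IDs alone if they already carry this prefix.
            if not seq_id.startswith(taxon_code + "|"):
                seq_id = taxon_code + "|" + seq_id
            out.append(">" + seq_id)
        else:
            out.append(line)
    return out


# Hypothetical input: the ID comes from the error message in this thread,
# the description and the sequence are placeholders.
example = [">TRINITY_DN10001_c0_g1_i2.p1 some description", "MGYPVWPSRF"]
print("\n".join(add_taxon_prefix(example, "TAXA1")))
```

For a real file, the lines would be read from the FASTA file and the result written to a new file named after the taxon code.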

shrutidabral commented 6 years ago

Thank you for your response.

I ran BLAST between the following two files:

1) The database file, arab.fasta, looks like:

arab|AT1G01010.1
MEDQVGFGFRPNDEELVGHYLRNKIEGNTSRDVEVAISEVNICSYDPWNLRFQSKYKSRD
arab|AT1G01020.1
MAASEHRCVGCGFRVKSLFIQYSPGNIRLMKCGNCKEVADEYIECERMIIFIDLILHRPK
VYRHVLYNAINPATVNIQHLLWKLVFAYLLLDCYRSLLLRKSDEESSFSDSPVLLSIKVL

2) My raw file, against which BLAST was performed:

arab|TRINITY_DN10004_c0_g1_i3.p1
MGYPVWPSRFIPMKSPLTDILIAKHLPGPEPCSNPHTISTMLRTVQQRGSTARMIVNLSI
YSDLYAADLESCQLQCCHVPINSKSLPSVKDVRQVIEHVSHYWRDNPESCVPIHCAYGYN
RTGFVICCYLIEVCGLSVGAALASFAYSRPEGIHHEAFLVELQTRYNCLPTTPIPLFCNA
NTYNDFYHSNTGGDKQIQELHAAAGWAWGTWAAHVAKGRYMDVDSAGSMPDASIAKAATI
ATNDSHNDLEGLLS
arab|TRINITY_DN10004_c0_g1_i5.p1
MWCSVQQRGSTARMIVNLSIYSDLYAADLESCQLQCCHVPINSKSLPSVKDVRQVIEHVS
HYWRDNPESCVPIHCAYGYNRTGFVICCYLIEVCGLSVGAALASFAYSRPEGIHHEAFLV
ELQTRYNCLPTTPIPLFCNANTYNDFYHSNTGGDKQIQELHAAAGWAWGTWAAHVAKGRY
MDVDSAGSMPDASIAKAATIATNDSHNDLEGLLS
arab|TRINITY_DN10006_c0_g1_i1.p1

The output file after BLAST is:

arab|TRINITY_DN10004_c0_g1_i1.p1 arab|AT3G09100.1 27.87 122 73 3 22 133 153 269 1e-07 52.0
arab|TRINITY_DN10004_c0_g1_i1.p1 arab|AT3G09100.2 30.39 102 59 2 39 133 173 269 1e-07 51.6
arab|TRINITY_DN10004_c0_g1_i1.p1 arab|AT5G28210.1 29.73 111 73 3 14 119 97 207 1e-06 48.5
arab|TRINITY_DN10004_c0_g1_i2.p1 arab|AT3G09100.2 25.13 195 118 9 21 199 87 269 8e-08 53.9
arab|TRINITY_DN10004_c0_g1_i2.p1 arab|AT3G09100.1 24.87 197 120 9 19 199 85 269 9e-08 53.5
arab|TRINITY_DN10004_c0_g1_i2.p1 arab|AT5G01290.1 25.82 182 115 7 21 193 74 244 1e-07 53.5
arab|TRINITY_DN10004_c0_g1_i2.p1 arab|AT5G28210.1 29.66 118 78 3 80 192 97 214 4e-06 48.5
arab|TRINITY_DN10004_c0_g1_i3.p1 arab|AT3G09100.2 25.56 180 108 8 10 173 100 269 1e-07 52.8
arab|TRINITY_DN10004_c0_g1_i3.p1 arab|AT3G09100.1 25.56 180 108 8 10 173 100 269 1e-07 52.4
arab|TRINITY_DN10004_c0_g1_i3.p1 arab|AT5G01290.1 26.67 165 103 6 11 166 88 243 4e-07 51.2
arab|TRINITY_DN10004_c0_g1_i3.p1 arab|AT5G28210.1 29.66 118 78 3 54 166 97 214 4e-06 48.1
arab|TRINITY_DN10004_c0_g1_i5.p1 arab|AT3G09100.1 27.87 122 73 3 22 133 153 269 1e-07 52.0
arab|TRINITY_DN10004_c0_g1_i5.p1 arab|AT3G09100.2 30.39 102 59 2 39 133 173 269 1e-07 51.6
arab|TRINITY_DN10004_c0_g1_i5.p1 arab|AT5G28210.1 29.73 111 73 3 14 119 97 207 1e-06 48.5
arab|TRINITY_DN10007_c0_g1_i1.p1 arab|AT1G03360.1 37.71 236 144 2 2 235 82 316 1e-41 147

After this I ran the following command and got this error:

/bin/orthomclBlastParser out_new.tab my_orthomcl_dir/complaintFasta/blast/ >> similarSequences.txt
error -- acquiring genes from arab.fasta
couldn't find taxon for gene 'arab|TRINITY_DN10004_c0_g1_i2.p1' at /bin/orthomclBlastParser line 106, line
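One possible reading of the error above, which nothing in this thread confirms, is that the ID named in the message does not appear in any `>` header of the FASTA files in the directory passed to the parser. The output mentions only arab.fasta, so the Trinity proteins may not be in that directory at all. The Python sketch below checks for that under this assumption. The helper names (`fasta_ids`, `ids_in_dir`, `missing_blast_ids`) are hypothetical, and the short sequence in the toy data is a placeholder.

```python
from pathlib import Path


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every '>' header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def ids_in_dir(fasta_dir):
    """Collect header IDs from every *.fasta file in a directory."""
    ids = set()
    for path in sorted(Path(fasta_dir).glob("*.fasta")):
        with open(path) as handle:
            ids |= fasta_ids(handle)
    return ids


def missing_blast_ids(blast_lines, known_ids):
    """Return query/subject IDs (columns 1 and 2) absent from known_ids."""
    missing = set()
    for line in blast_lines:
        cols = line.split()
        if len(cols) < 2:
            continue  # skip blank or malformed rows
        for seq_id in cols[:2]:
            if seq_id not in known_ids:
                missing.add(seq_id)
    return sorted(missing)


# Toy data: one database header and one BLAST row taken from this thread.
db = [">arab|AT3G09100.1", "MEDQ"]
blast = ["arab|TRINITY_DN10004_c0_g1_i2.p1 arab|AT3G09100.1 "
         "24.87 197 120 9 19 199 85 269 9e-08 53.5"]
print(missing_blast_ids(blast, fasta_ids(db)))
```

On real data, `ids_in_dir` would be pointed at the FASTA directory given to the parser and `missing_blast_ids` at the lines of the BLAST output file. Any ID it reports is one the FASTA headers do not contain.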

shrutidabral commented 6 years ago

Can this BLAST report be considered evidence of the orthologous genes present in my sample, or not?

hyphaltip commented 6 years ago

Sorry - this isn't really a supported software package - it was more of an exploration to see if we could rewrite it without the need for the SQL databases.

I am not sure I can really help you much more on this. I suggested some other tools to try instead, or you could use the regular OrthoMCL tool.

shiyi-pan commented 4 years ago

Hi, I'm facing this error too. Could you explain the 'taxon.fasta' format more clearly? What is the meaning of the taxon code? Here are my proteins (I study soybean):

TAXA1|Glyma.01G000100
MCVAQHEQMDCMVEIESSINANANFHSQGSIKDKFYLYKIREIQCSKYRIALEAPPTSTFVTDSCKEFYREGFILSLVLEEGSVGYGEVGISFMLYHVLYLALH
TAXA2|Glyma.01G000200
MKGYIILGEHRGKGNWGHASNNPDSESMGPAVQVKSTRKEKQKKQERDDGMNNGCGIVSTGISGASYCTGTFNTVDCSPDGRLWRQVKILCHLF
TAXA3|Glyma.01G000300
MKVEMDKGKENSRFGEPPSVSQVPPPMSTPETEVRVTHCSKTTSFSKTSSFRGTGPSSLTPTLTPLTSLRLRLHTLPLPIPKPLPPPSSKNIAPLHTLHPPNADTAKFMANGYETKPSLSIYKIKTPT*

Thank you very much.

hyphaltip commented 4 years ago

I’d suggest you use OrthoFinder for your needs.

The code is just a short abbreviation, usually three letters, to represent the taxon: TAXA1 in your example.

This repo is not the official OrthoMCL code, so if you have questions I would direct them to their main repo: https://orthomcl.org/common/downloads/software/v2.0/
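Since the taxon code stands for the taxon rather than for an individual protein, one quick check is to count which codes appear in a file's headers. The Python sketch below does that, on the assumption (as in the usual OrthoMCL layout) that one taxon.fasta file holds one species with a single shared code. The function name `taxon_codes` is hypothetical, and the `>` characters are assumed because the pasted sample shows none.

```python
from collections import Counter


def taxon_codes(fasta_lines):
    """Count the taxon codes (text before the first '|') used in headers."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            code, sep, _ = seq_id.partition("|")
            # Headers without a '|' are tallied under a separate label.
            counts[code if sep else "(no taxon prefix)"] += 1
    return counts


# Headers from the soybean example in this thread ('>' assumed).
headers = [
    ">TAXA1|Glyma.01G000100",
    ">TAXA2|Glyma.01G000200",
    ">TAXA3|Glyma.01G000300",
]
codes = taxon_codes(headers)
print(len(codes), "distinct taxon codes:", sorted(codes))
```

Under that assumption, a single-species file would be expected to report exactly one code. Three codes for three soybean proteins suggests the prefix was numbered per protein rather than per taxon.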