guigolab / FA-nf

Functional annotation pipeline for proteins from non-model organisms implemented in Nextflow
GNU General Public License v3.0

Update load_kegg_KAAS.pl to fit to new format of http://rest.kegg.jp/list/genome #7

Open LucasMS opened 2 years ago

LucasMS commented 2 years ago

Hi, Thanks for the pipeline!!

I think that the KEGG annotation results are not being loaded into the database properly. I observed that the records at the rest.kegg page are no longer in this format:

genome:T00006   mpn, MYCPN, 272634; Mycoplasma pneumoniae M129
genome:T00007   eco, ECOLI, 511145; Escherichia coli K-12 MG1655

but now

gn:T00006   mpn; Mycoplasma pneumoniae M129
gn:T00007   eco; Escherichia coli K-12 MG1655

I do not think that the script parses this correctly anymore, and KEGG no longer provides a taxonomy ID.
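
To make the format change concrete, here is a minimal Python sketch of a parser that accepts both record layouts quoted above. It is only an illustration, not the logic of load_kegg_KAAS.pl (which is written in Perl); the names `GenomeEntry` and `parse_genome_line` are hypothetical, and the sketch assumes that any all-digit field before the semicolon is the taxonomy ID.

```python
import re
from typing import NamedTuple, Optional


class GenomeEntry(NamedTuple):
    """One record from the KEGG genome list (hypothetical container)."""
    t_number: str           # e.g. "T00006"
    org_code: str           # e.g. "mpn"
    taxid: Optional[str]    # present only in the old format
    name: str               # e.g. "Mycoplasma pneumoniae M129"


def parse_genome_line(line: str) -> Optional[GenomeEntry]:
    """Parse an old-style ("genome:") or new-style ("gn:") record.

    Returns None for blank lines and for lines without the "codes; name"
    layout, so callers can decide how to handle them.
    """
    line = line.strip()
    if not line:
        return None
    # The identifier and the description are separated by whitespace.
    parts = re.split(r"\s+", line, maxsplit=1)
    if len(parts) != 2:
        return None
    ident, desc = parts
    # Drop the "genome:" or "gn:" prefix, keeping only the T number.
    t_number = ident.split(":", 1)[-1]
    if ";" not in desc:
        return None
    codes, name = desc.split(";", 1)
    fields = [f.strip() for f in codes.split(",")]
    org_code = fields[0]
    # Assumption: an all-digit field is the taxonomy ID (old format only).
    taxid = next((f for f in fields[1:] if f.isdigit()), None)
    return GenomeEntry(t_number, org_code, taxid, name.strip())
```

With the new format the `taxid` field comes back as `None`, which is the gap the loader has to fill from some other source.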

Cheers, Lucas

LucasMS commented 2 years ago

Hello there,

I would just like to check whether fixing this bug is on the horizon. Do you have an estimate of when it will be addressed?

Sorry to bug you. I really like the pipeline. It is comprehensive and does what it proposes very well. FA-nf has become essential for some urgent, important projects of mine, and that is why I am asking.

Cheers, Lucas

toniher commented 2 years ago

Hi @LucasMS ,

thanks for reporting. I just noticed your issue. Let me take a look and I will follow the conversation here...

LucasMS commented 2 years ago

Hi @toniher,

Thanks for coming back. I started digging into the issue a bit. I could not find a version of the KEGG genome list online that includes the taxid info. Actually, I did not find any good resource on the KEGG non-subscription files.

So, I found a workaround, which I think is not the best solution. I am trying to match the KEGG genome list to the names.dmp file from the NCBI Taxonomy FTP.

This approach is not straightforward, though. Names are not fully compatible between the two resources. A simple example is Danio rerio (zebrafish) in KEGG versus Danio rerio in NCBI. It gets much more complicated for microbes, due to strains and subspecies.
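
The name matching described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the R script mentioned in this thread: the helper names (`build_name_index`, `lookup_taxid`) and the choice of accepted name classes are hypothetical, and the only normalization applied is lowercasing, whitespace collapsing, and dropping a trailing parenthetical such as "(zebrafish)" from the KEGG name. It relies on the names.dmp layout of fields separated by a tab, a pipe, and a tab (tax_id, name_txt, unique name, name class).

```python
import re
from typing import Dict, Iterable, Optional

# Assumption: these name classes are acceptable match targets.
ACCEPTED_CLASSES = ("scientific name", "synonym", "equivalent name")


def clean(name: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(name.split()).lower()


def strip_common_name(name: str) -> str:
    """Drop a trailing parenthetical, e.g. 'Danio rerio (zebrafish)'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name)


def build_name_index(lines: Iterable[str]) -> Dict[str, str]:
    """Map cleaned NCBI names to taxids from names.dmp lines."""
    index: Dict[str, str] = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue
        taxid, name_txt, _unique_name, name_class = fields[:4]
        if name_class not in ACCEPTED_CLASSES:
            continue
        key = clean(name_txt)
        if name_class == "scientific name":
            index[key] = taxid            # scientific names take priority
        else:
            index.setdefault(key, taxid)  # never override an earlier entry
    return index


def lookup_taxid(kegg_name: str, index: Dict[str, str]) -> Optional[str]:
    """Try the KEGG name as given, then without its parenthetical."""
    for candidate in (kegg_name, strip_common_name(kegg_name)):
        taxid = index.get(clean(candidate))
        if taxid is not None:
            return taxid
    return None
```

A lookup like this resolves the Danio rerio case, but strain-level names that are spelled differently in the two resources will still return `None` and need separate handling.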

Anyway, I needed to move on with my projects :). I implemented this matching with an R script (I am not familiar with Perl) and I am currently trying to integrate it into the pipeline. I think this dirty fix should work for me, but it would be great to have a proper solution for it :)

I hope this is somehow helpful for you. Cheers, Lucas

LucasMS commented 2 years ago

Hi @toniher,

I am still trying to fix this somehow on my side. I want to check whether my modifications to the load_kegg_KAAS.pl script have worked, but I cannot find the Perl packages FunctionalAnnotation::DB and FunctionalAnnotation::uploadData. Do you know how I can get or install them?

Thanks! Cheers, Lucas

toniher commented 2 years ago

Hi @LucasMS these packages are at scripts/lib/FunctionalAnnotation. Running scripts import them in places like: https://github.com/guigolab/FA-nf/blob/master/scripts/get_results.pl#L47

LucasMS commented 2 years ago

Thanks, @toniher .

I found a workaround for load_kegg_KAAS.pl. I used the strategy that I mentioned above, where I combine the genome list and the NCBI taxonomy and load the formatted file into load_kegg_KAAS.pl. I am not sure whether it is working as intended, though. Now I get an error further down the pipeline:

Error executing process > 'generateResultFiles (1)'

Caused by:
  Process `generateResultFiles (1)` terminated with an error exit status (2)

Command executed:

  config=config ;    get_results.pl -conf $config -obo /work_ifs/sukmb447/temp/n_vectensis/gene_ontology.obo ;

Command exit status:
  2

Command output:
  (empty)

Command error:
  DBD::SQLite::db prepare failed: near ",": syntax error at /scripts/lib/FunctionalAnnotation/DB.pm line 210.
  DBD::SQLite::db prepare failed: near ",": syntax error at /scripts/lib/FunctionalAnnotation/DB.pm line 210.

I am actually not sure what is going on, and I am afraid I cannot fix this alone.

Cheers, Lucas

toniher commented 2 years ago

Hi @LucasMS , share the changes you made to the file. This weekend I will try to find some time to address your issue and include other updates if possible...

LucasMS commented 2 years ago

Hi @toniher,

Sorry for the late response. My modifications are here.

By the way, this project is no longer my priority, so there is no hurry from my side. Thanks for the help! Cheers, Lucas