hallamlab / metapathways2

MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds
http://hallam.microbiology.ubc.ca/MetaPathways/

Add documentation on how to build CAZy database #51

Closed: taltman closed this issue 7 years ago

taltman commented 9 years ago

I contacted the CAZy folks about how to build an annotation database. They recommended using UniProt. One can download the equivalent FASTA file (UniProt entries cross-referenced to CAZy) here:

http://www.uniprot.org/uniprot/?query=database:%28type:CAZy%29

Some additional work would probably be needed if CAZy identifiers are desired rather than UniProt ones. Looking at the CAZY_hierarchy.txt file bundled with MetaPathways, it seems that a mapping to GenBank GeneIDs is needed, as not every CAZy entry has a mapping to UniProt. I will ask the CAZy folks for a better recommended way of obtaining a FASTA file.
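To see how much of CAZy a UniProt-based build would miss, one could load a GenBank-to-UniProt mapping and list the CAZy IDs that have no UniProt counterpart. This is a minimal sketch, assuming a hypothetical two-column, tab-separated mapping file (GenBank accession, then UniProt accession, no header row); the function names and toy IDs are illustrative, not part of MetaPathways or any UniProt tool.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def load_mapping(handle: TextIO) -> Dict[str, str]:
    """Read a hypothetical TSV of 'GenBank accession<TAB>UniProt accession'.

    If a GenBank ID appears more than once, the first UniProt hit is kept.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            mapping.setdefault(row[0], row[1])
    return mapping


def unmapped_ids(cazy_ids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return CAZy IDs (sorted) that have no UniProt accession in the mapping."""
    return sorted(set(cazy_ids).difference(mapping))


# Usage with made-up IDs: GB_0003 has no UniProt counterpart.
mapping = load_mapping(io.StringIO("GB_0001\tP00001\nGB_0002\tQ00002\n"))
missing = unmapped_ids({"GB_0001", "GB_0002", "GB_0003"}, mapping)
```

Any IDs returned by `unmapped_ids` would only be reachable through GenBank, which is why a UniProt-only FASTA would not cover all of CAZy.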

nielshanson commented 9 years ago

Yeah, ask them about it. What we are calling CAZy is the collection of all the GenBank IDs found on http://www.cazy.org/, so it doesn't have any UniProt sequences right now. I guess on the next build I can add the UniProt complement as well using your link above. It shouldn't be too tough if there's a web-based API.

taltman commented 9 years ago

Still awaiting some further advice from Dr. Henrissat re: the GenInfo Identifiers (GI) used on their website. I'm thinking that there should be three mutually exclusive levels of CAZy used for annotation:

  1. Intersection of CAZy & SwissProt
  2. Intersection of CAZy & TrEMBL
  3. The remainder of CAZy

This is important because these levels correspond to decreasing curation/quality, and the end user should be able to select their level of confidence.
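The three-level split amounts to assigning each CAZy ID to the first tier it qualifies for. This is a minimal sketch, assuming a GenBank-to-UniProt mapping is already available as a dict; it relies on the fact that UniProtKB FASTA headers start with `sp|` for Swiss-Prot (reviewed) entries and `tr|` for TrEMBL (unreviewed) entries. The function names and toy IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def uniprot_accession_sets(headers: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split UniProtKB FASTA headers (e.g. '>sp|P12345|NAME ...') by section."""
    swissprot: Set[str] = set()
    trembl: Set[str] = set()
    for header in headers:
        parts = header.lstrip(">").split("|")
        if len(parts) < 2:
            continue  # not a UniProtKB-style header
        db, accession = parts[0], parts[1]
        if db == "sp":
            swissprot.add(accession)
        elif db == "tr":
            trembl.add(accession)
    return swissprot, trembl


def split_cazy_tiers(
    cazy_ids: Iterable[str],
    to_uniprot: Dict[str, str],  # hypothetical GenBank -> UniProt mapping
    swissprot: Set[str],
    trembl: Set[str],
) -> Dict[str, List[str]]:
    """Put each CAZy ID in exactly one tier, most curated first."""
    tiers: Dict[str, List[str]] = {"swissprot": [], "trembl": [], "remainder": []}
    for cid in sorted(set(cazy_ids)):
        acc = to_uniprot.get(cid)  # None when there is no UniProt mapping
        if acc in swissprot:
            tiers["swissprot"].append(cid)
        elif acc in trembl:
            tiers["trembl"].append(cid)
        else:
            tiers["remainder"].append(cid)
    return tiers


# Usage with made-up IDs.
sp, tr = uniprot_accession_sets([">sp|P00001|EXA1_TOY toy", ">tr|Q00002|Q00002_TOY toy"])
tiers = split_cazy_tiers(
    {"GB_0001", "GB_0002", "GB_0003"},
    {"GB_0001": "P00001", "GB_0002": "Q00002"},
    sp,
    tr,
)
```

Because the `if`/`elif`/`else` checks run in order of curation, an ID can never land in two tiers, which keeps the levels mutually exclusive.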

I will let you know when I get a reply.

gwilymh commented 9 years ago

Has there been any word on this?

I downloaded the CAZy FASTA file from UniProt as recommended by taltman above, decompressed it and relabeled it as CAZy. MetaPathways, however, does not return any results for the CAZy database when this file is included in the Functional Annotation Parameters.

hallamlab commented 7 years ago

More documentation will be provided in the upcoming release.