Collect protein sequence data for ion channels in C. elegans

VahidGh commented 9 years ago

For estimating kinetics, we need to collect protein/structural data about ion channels in C. elegans.

This could be done by a script that can:

1) Retrieve the WormBase protein entry (e.g. EGL-19, isoform a) for each ion channel of interest listed in the spreadsheet, which is now available via this DB.

2) Get the external UniProt link provided by WormBase (e.g. G5EG02).

3) Retrieve the protein sequence in FASTA format (e.g. G5EG02.fasta).
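
A minimal sketch of step 3, assuming the UniProt accession (e.g. G5EG02) has already been obtained from WormBase. The endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is UniProt's current REST layout and is an assumption here, since the URL scheme may have differed when this issue was opened. The function names are hypothetical and not taken from the ChannelWorm code.

```python
import urllib.request

# Assumption: UniProt's current REST endpoint; not necessarily the URL used
# by the original ChannelWorm script.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession):
    """Build the FASTA download URL for a UniProt accession such as G5EG02."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession, timeout=30):
    """Download and parse the FASTA record(s) for one accession (needs network)."""
    url = uniprot_fasta_url(accession)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

Calling `fetch_fasta("G5EG02")` would return a list of `(header, sequence)` tuples if the endpoint responds as assumed. Keeping the parser separate from the download makes it usable on FASTA files that were already saved locally.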

travs commented 9 years ago

@a-palyanov Hey Andrey, this breaks down some of the data searching you, Vahid and I talked about a little while ago. Is this issue something you'd want to take on?

Here's the link to your spreadsheet that is a good start on collecting some of this data.

Pasting this in some other issues to link you into them, as you mentioned you were interested in data collection.

VahidGh commented 9 years ago

Some of the WormBase REST APIs are broken, including those for proteins according to this doc, so I'm going to use alternative tools to retrieve the required info (such as mining WB web pages, or the BioPython package plus the Entrez search utilities).

In WormBase, a query like this: https://www.wormbase.org/search/autocomplete/protein?term=slo-2

returns information, in JSON format, about the proteins of the ion channel of interest. The protein ID and its WB URL (e.g. /species/c_elegans/protein/WP:CE09233) can then be retrieved (in the ion channel spreadsheet we have this link only for the first protein).

By following each link, it is possible to download the protein sequence and access related data.
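
A rough sketch of the lookup described above. The exact layout of the autocomplete JSON is not documented in this thread, so the code does not assume any field names: it walks whatever structure comes back and collects every string that looks like a protein path of the form `/species/<species>/protein/<id>`. The function names and the regular expression are hypothetical and not taken from the ChannelWorm script.

```python
import re
from urllib.parse import urlencode

AUTOCOMPLETE_URL = "https://www.wormbase.org/search/autocomplete/protein"

# Assumption: protein links appear as site-relative paths, as in the
# example /species/c_elegans/protein/WP:CE09233.
PROTEIN_PATH = re.compile(r"^/species/[^/]+/protein/([^/?#]+)$")


def autocomplete_url(term):
    """Build the autocomplete query URL for a channel name such as slo-2."""
    return AUTOCOMPLETE_URL + "?" + urlencode({"term": term})


def find_protein_paths(payload):
    """Return {protein_id: path} for every protein path in decoded JSON.

    The payload may be any nesting of dicts, lists and strings; its real
    shape is unknown here, so every string value is inspected.
    """
    found = {}
    stack = [payload]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
        elif isinstance(item, str):
            match = PROTEIN_PATH.match(item)
            if match:
                found[match.group(1)] = item
    return found
```

The decoded response (for example from `json.loads`) would be passed to `find_protein_paths`, and each returned path could then be appended to `https://www.wormbase.org` to reach the protein page.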

VahidGh commented 9 years ago

Done, using this script. Some other channels were also updated.

openworm / ChannelWorm

Collect protein sequence data for ion channels in C. elegans #112