sibyl229 / wb-graphql


Tool to download sequence data from WormBase #4

Open blaiseli opened 6 years ago

blaiseli commented 6 years ago

Is wb-graphql the appropriate tool for downloading sequence data from WormBase?

I would like to programmatically download the FASTA sequences of some transcripts whose identifiers I have. For instance, given the identifier K06C4.12, I would like to automatically download the ">K06C4.12 spliced + UTR" FASTA sequence offered at http://www.wormbase.org/species/c_elegans/transcript/K06C4.12

Can wb-graphql do that? If not, advice on where to look for documentation would be welcome.

Thanks in advance.

blaiseli commented 6 years ago

For the record, I finally managed to get something working using the REST API: https://bioinformatics.stackexchange.com/a/2926/292

I'm still interested to know whether I could have done this with wb-graphql, but feel free to close the issue.

sibyl229 commented 6 years ago

Hi @blaiseli, thank you for your interest in the GraphQL service. Unfortunately, you can't retrieve sequences from it at the moment.

A little background here. The backend services at WormBase (including the REST API) are going through an overhaul as we move to a new database. Sequence-related widgets are among the things that haven't been migrated yet, and we are still figuring out how to handle them.

As a result, the output produced by the REST API (in particular for sequences) might change in the future, and development of the GraphQL service for sequences is on hold until this is figured out. Homology data is affected in the same way.

I think your options right now are probably either parsing the FASTA files directly or using the REST API.

The FASTA files can be found here: ftp://ftp.wormbase.org/pub/wormbase/releases/current-production-release/species/c_elegans/PRJNA13758/ The advantage of using the files is that they are simple and stable.
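A minimal sketch of the file-based option, assuming you have already downloaded a transcript FASTA file from that directory (the exact file name is not given here, so the path in the usage line is hypothetical) and assuming the transcript ID is the first whitespace-separated token of each header line. Check both assumptions against the actual file before relying on this.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    # Choose the opener from the extension; both support text mode "rt".
    opener = gzip.open if str(path).endswith(".gz") else open
    header = None
    chunks = []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                # Sequence lines are wrapped, so collect and join them.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_transcripts(path: str, wanted: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {id: (header, sequence)} for records whose first header token is in `wanted`.

    Assumption: the transcript ID is the first whitespace-separated token of the header.
    """
    wanted = set(wanted)
    found = {}
    for header, seq in read_fasta(path):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id in wanted:
            found[record_id] = (header, seq)
    return found
```

For example, `select_transcripts("transcripts.fa.gz", ["K06C4.12"])` (hypothetical file name) scans the file once and keeps only the requested records, so a long list of IDs costs no more than a single ID.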

With the REST API, you can get marginally cleaner, more program-friendly results by requesting JSON rather than HTML. By marginal, I mean that the sequences themselves will still contain HTML tags and whitespace for formatting, so you might need to remove those depending on your needs: http://www.wormbase.org/rest/widget/transcript/K06C4.12/sequences?content-type=application/json
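A rough sketch of the REST option, using only the widget URL above and the Python standard library. The layout of the returned JSON is not described in this thread, so the sketch does not guess which field holds the "spliced + UTR" sequence; inspect the decoded dict yourself. The cleanup step assumes the formatting consists only of HTML tags, HTML entities, and whitespace; if coordinates or other text are mixed in, more filtering would be needed. The service may also have moved since this comment was written.

```python
import html
import json
import re
import urllib.request

# URL pattern taken from the comment above; the host or path may have changed since.
WIDGET_URL = ("http://www.wormbase.org/rest/widget/transcript/{id}/sequences"
              "?content-type=application/json")

TAG_RE = re.compile(r"<[^>]+>")


def fetch_sequences_widget(transcript_id: str) -> dict:
    """Download and decode the JSON for a transcript's 'sequences' widget."""
    url = WIDGET_URL.format(id=transcript_id)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def clean_sequence(raw: str) -> str:
    """Strip HTML tags, decode entities such as &nbsp;, then drop all whitespace."""
    # Remove tags first so that escaped "&lt;" text cannot turn into a fake tag.
    text = html.unescape(TAG_RE.sub("", raw))
    return "".join(text.split())


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Once you have located the raw sequence string in the widget JSON, `to_fasta("K06C4.12 spliced + UTR", clean_sequence(raw))` produces a record in the same form as the one on the transcript page. Letter case is left untouched, in case it carries meaning (for example, UTR versus coding sequence).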

Thanks again for the question. I will leave the issue open as a feature request.