morgannprice / fast.genomics

Genome browser for thousands of representative bacteria and archaea
GNU General Public License v3.0

suggestion: add API to interact with server programmatically #1

Closed m-jahn closed 4 days ago

m-jahn commented 5 days ago

Dear Morgan,

Thank you for developing this fantastic web server and the underlying tools. This is exactly the functionality I was looking for to analyze the genomic context of a limited set of target ORFs. However, one limitation with your server is that one can only manually search for sequences or genes using the web interface.

I was wondering if you are planning to release an API that would allow users to submit queries and fetch results programmatically. In our group, we are developing various bioinformatic pipelines, and this would be a very nice plug-in feature as a final step, e.g. querying new sequences of interest for homology and conservation in other bacteria.

Best, Michael

morgannprice commented 4 days ago

If you're analyzing a moderate number of sequences (maybe under 1,000), you should be able to use HTTP requests to compute the homologs for a sequence, using links like

https://fast.genomics.lbl.gov/cgi/findHomologs.cgi?seqDesc=header&seq=NNNN
https://fast.genomics.lbl.gov/cgi/downloadHomologs.cgi?seqDesc=header&seq=NNNN

(replacing NNNN with the actual protein sequence).

You can fetch information about gene neighbors of homologs with https://fast.genomics.lbl.gov/cgi/neighbors.cgi?seqDesc=header&seq=NNNN&format=tsv

Please do not send more than one HTTP request at a time.
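For pipeline use, the requests described above could be scripted along the following lines. This is only a sketch, not an official client: the script names and the seqDesc, seq, and format parameters come from the links above, while the helper names (build_url, fetch_sequentially), the one-second pause, the timeout, and the assumption that responses decode as UTF-8 text are illustrative choices. Queries are sent strictly one after another so that no two requests are in flight at once.

```python
import time
import urllib.parse
import urllib.request

# Base URL taken from the links quoted in this thread.
BASE = "https://fast.genomics.lbl.gov/cgi"


def build_url(script, seq, seq_desc="header", **extra):
    """Build a query URL for one of the CGI scripts (hypothetical helper)."""
    params = {"seqDesc": seq_desc, "seq": seq}
    params.update(extra)  # e.g. format="tsv" for neighbors.cgi
    return f"{BASE}/{script}?{urllib.parse.urlencode(params)}"


def fetch_sequentially(queries, script="downloadHomologs.cgi", pause=1.0, **extra):
    """Fetch results for (description, protein sequence) pairs, one at a time.

    The pause between requests and the timeout are assumptions, chosen
    only to be gentle on the server.
    """
    results = {}
    for desc, seq in queries:
        url = build_url(script, seq, seq_desc=desc, **extra)
        with urllib.request.urlopen(url, timeout=300) as resp:
            results[desc] = resp.read().decode("utf-8", errors="replace")
        time.sleep(pause)
    return results
```

For example, fetch_sequentially(pairs, script="neighbors.cgi", format="tsv") would request the gene-neighbor table for each pair in turn, where pairs is a list of (description, sequence) tuples.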

If you're analyzing thousands of sequences, then please download the database into the data/ subdirectory and run the analyses locally. You can download the main database (sqlite3) and the corresponding fasta file from the front page. To find homologs, you'll need to put the mmseqs executable into bin/ and then build the mmseqs index, with something like

bin/mmseqs createdb data/neighbor.faa data/mmseqsdb
bin/mmseqs createindex data/mmseqsdb -k 6
bin/mmseqs touchdb data/mmseqsdb

and then you can find homologs with bin/mmseqsParallel.pl (run without any arguments for some documentation) or run cgi/neighbors.cgi (which must be run from the cgi subdirectory) to get the gene neighbors of homologs.
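If the local setup is driven from a Python pipeline, the index-building step could be wrapped roughly as follows. This is a sketch under assumptions: the three mmseqs invocations are copied as quoted above ("something like"), the function names are hypothetical, and the script is assumed to run from the repository root with bin/mmseqs and data/neighbor.faa already in place. Depending on the MMseqs2 version, createindex may also expect a temporary-directory argument, so check the mmseqs usage message before relying on it.

```python
import subprocess


def mmseqs_index_commands(mmseqs="bin/mmseqs",
                          fasta="data/neighbor.faa",
                          db="data/mmseqsdb"):
    """Return the three index-building commands as argument lists."""
    return [
        [mmseqs, "createdb", fasta, db],
        [mmseqs, "createindex", db, "-k", "6"],
        [mmseqs, "touchdb", db],
    ]


def build_index(dry_run=False):
    """Run the commands in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd in mmseqs_index_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling build_index(dry_run=True) first is a cheap way to confirm the paths before anything is run.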

There is also a little bit of documentation in the SETUP file, but that's mostly about how to build your own database.

m-jahn commented 4 days ago

thank you for your response!