m-jahn closed this issue 4 days ago
If you're analyzing a moderate number of sequences (maybe under 1,000), you should be able to use HTTP requests to compute the homologs for a sequence, using links like
https://fast.genomics.lbl.gov/cgi/findHomologs.cgi?seqDesc=header&seq=NNNN
https://fast.genomics.lbl.gov/cgi/downloadHomologs.cgi?seqDesc=header&seq=NNNN
(replacing NNNN with the actual protein sequence).
You can fetch information about gene neighbors of homologs with https://fast.genomics.lbl.gov/cgi/neighbors.cgi?seqDesc=header&seq=NNNN&format=tsv
Please do not send more than one HTTP request at a time.
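For pipeline use, the requests described above could be scripted roughly as follows. This is only a sketch in Python, not an official client: the helper names `build_url` and `fetch_sequentially`, the one-second pause, and the timeout are assumptions made for illustration. Only the script names and the `seqDesc`, `seq`, and `format` parameters come from the links above. The loop issues one request at a time, as requested.

```python
import time
import urllib.parse
import urllib.request

# Base URL taken from the links above.
BASE = "https://fast.genomics.lbl.gov/cgi"


def build_url(script, seq, seq_desc="header", **extra):
    """Build a query URL for one protein sequence (hypothetical helper)."""
    params = {"seqDesc": seq_desc, "seq": seq, **extra}
    return f"{BASE}/{script}?{urllib.parse.urlencode(params)}"


def fetch_sequentially(urls, pause=1.0, timeout=300):
    """Fetch URLs strictly one at a time, never in parallel.

    The pause between requests and the timeout are assumed values,
    not limits stated by the server.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(pause)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            results.append(resp.read().decode("utf-8", errors="replace"))
    return results
```

For example, `build_url("neighbors.cgi", seq, format="tsv")` produces the gene-neighbor link for `seq`, and passing a list of such URLs to `fetch_sequentially` returns the response bodies in the same order. The layout of the returned text is not described in this thread, so inspect one response by hand before writing a parser.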
If you're analyzing thousands of sequences, then please download the database into the data/ subdirectory and run the analyses locally. You can download the main database (sqlite3) and the corresponding fasta file from the front page. To find homologs, you'll need to put the mmseqs executable into bin/ and then build the mmseqs index, with something like
    bin/mmseqs createdb data/neighbor.faa data/mmseqsdb
    bin/mmseqs createindex data/mmseqsdb -k 6
    bin/mmseqs touchdb data/mmseqsdb
and then you can find homologs with bin/mmseqsParallel.pl (run without any arguments for some documentation) or run cgi/neighbors.cgi (which must be run from the cgi subdirectory) to get the gene neighbors of homologs.
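If the local setup is part of an automated pipeline, the three index-building commands could be wrapped in a short Python script. This is a sketch that assumes it is run from the top of the repository with the paths used above. The function names `index_commands` and `build_index` and the `dry_run` switch are hypothetical. The commands are copied from the reply as written; arguments may differ between MMseqs2 versions, so check `bin/mmseqs createindex -h` before relying on them.

```python
import subprocess


def index_commands(fasta="data/neighbor.faa", db="data/mmseqsdb",
                   mmseqs="bin/mmseqs"):
    """Return the three commands from the reply as argument lists."""
    return [
        [mmseqs, "createdb", fasta, db],
        [mmseqs, "createindex", db, "-k", "6"],
        [mmseqs, "touchdb", db],
    ]


def build_index(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in index_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `build_index()` only prints the commands, which is a cheap way to confirm the paths; `build_index(dry_run=False)` actually runs them in order.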
There is also a little bit of documentation in the SETUP file, but that's mostly about how to build your own database.
Thank you for your response!
Dear Morgan,
Thank you for developing this fantastic web server and the underlying tools. This is exactly the functionality I was looking for to analyze the genomic context of a limited set of target ORFs. However, one limitation with your server is that one can only manually search for sequences or genes using the web interface.
I was wondering if you are planning to release an API that would allow users to submit queries and fetch results programmatically. In our group, we are developing various bioinformatic pipelines, and this would be a very nice plug-in feature as a final step, e.g. querying new sequences of interest for homology and conservation in other bacteria.
Best, Michael