DLL for fasta36 - Githubissues

SaierLaboratory commented 1 year ago

Hi, I would like to run ssearch36, glsearch36, etc. from within python/perl without making system calls. Is there a way to generate DLLs from the source code that can be called using a foreign function interface? If so, what function(s) should I call from within my code?

Thanks a lot! We use and cite fasta36 a lot in the Saier lab.

All the best, Arturo

wrpearson commented 1 year ago

I’m sorry, but I have never written code designed to be run directly from Python, and I have no experience with Windows (the only system that uses DLLs), so I cannot help you. If you find someone who is more familiar with this process, I would be happy to work with them.

Bill Pearson

SaierLaboratory commented 1 year ago

Hi, sorry for the confusion. We work on macOS and Linux. For C/C++ code to be loaded in Python or Perl scripts, the C code needs to be compiled into shared libraries (see, for example, https://www.digitalocean.com/community/tutorials/calling-c-functions-from-python). This way, the foreign function interfaces of other languages can run ssearch36 without making system calls.

But even with a shared library available, we would still need to know what function to call and how to use it.
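For readers unfamiliar with the mechanism being requested, here is a minimal sketch of a foreign function call from Python using the standard `ctypes` module. It calls `strlen` from the C library that is already loaded into the Python process, because fasta36 does not currently ship a shared library or a documented entry point. The helper name `c_strlen` is made up for this illustration, and the sketch assumes a POSIX system (Linux or macOS).

```python
import ctypes

# Assumption: POSIX system. CDLL(None) gives access to symbols already
# loaded into the running process, which includes the C library.
# A library you build yourself would be loaded by path instead, e.g.
# ctypes.CDLL("./libexample.so") after compiling with
# `cc -shared -fPIC -o libexample.so example.c` (hypothetical file names).
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"MKTAYIAKQR"))
```

The same pattern (load the library, declare argument and return types, call the function) would apply to a FASTA shared library, but only once a suitable C function exists to be called, which is the open question in this thread.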

wrpearson commented 1 year ago

The FASTA programs do a lot of things, from reading sequence databases to calculating similarity scores to estimating statistical significance to producing alignments. Are you imagining a single FASTA function that took a string in fasta format and a library file name and produced ???

Or would you like separate functions for reading libraries, calculating scores, estimating statistical significance, and providing alignments?

When I think about the things FASTA could do better, I think about post-processing of results, so that taxonomic information, EC numbers, and other kinds of annotations could be merged into the output. But I have moved in that direction by writing scripts that can post-process BLAST tabular format, so that the post processing works with both FASTA and BLAST.

Bill Pearson

SaierLaboratory commented 1 year ago

Thanks for your prompt reply Dr. Pearson.

Yes! I would recommend at least one wrapper function that can perform alignments with several programs. In addition to the program name and alignment-control parameters, the function would need to accept different types of query/subject pairs: (1) two strings holding the query and subject sequences in FASTA format (for aligning a few sequences, say with ssearch36 or glsearch36); (2) one string with the query sequence(s) in FASTA format and a file name for the database; and (3) two input files, as usual, for many-to-many alignments.

The idea of using strings is to avoid reading files from disk every time the function is called. This way scripts in other languages can prepare sequences to be processed by the FASTA suite, avoid performing system calls and obviate the need to write/read more files than strictly necessary. Bioinformatic pipelines can be faster and more efficient this way.

I have another suggestion. The ability to generate BLAST-like tabular output with your programs is very useful. In my opinion, it would be even better if the user could customize the columns presented under the options "-m 8" and "-m 8C". For example, it would be extremely useful if, in addition to including other types of annotations in the output (as you mentioned), the user could also include columns such as query/subject coverages, aligned query/subject sequences, query/subject sequence lengths, etc. (as with blast, diamond, or mmseqs).

Thanks! Arturo

wrpearson commented 1 year ago

I have retired, so the odds of getting the Python interface you outline are slim.

But I agree that it would be useful to have a customized blast-tabular output.

Right now, the information is there for query/subject sequence lengths (-m 8CBl gives you the lengths and BTOP alignment), and that combined with the query-start/end and library start/end can be post-processed to give you the coverage. I'm not sure how I feel about providing the actual sequences, since you only need the query sequence and the BTOP alignment to produce the aligned sequences (which is available with the script scripts/bl_btop_align.py). Including the query sequence with every output line seems inefficient.
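The post-processing described above can be sketched roughly as follows. This is an illustration only, not the code in scripts/bl_btop_align.py. It assumes the standard BLAST BTOP encoding (a number for a run of identical positions, and a two-character pair, query residue then subject residue, for each mismatch or gap, with "-" marking a gap). The function names are made up for this example, and the sequence lengths are passed in explicitly rather than read from a particular column, since the exact column layout of -m 8CBl output is not given in this thread.

```python
import re


def coverage(start: int, end: int, seq_len: int) -> float:
    """Fraction of a sequence spanned by an alignment (1-based, inclusive).

    abs() handles hits reported with start > end (reverse strand).
    """
    return (abs(end - start) + 1) / seq_len


# A BTOP token is either a run length or a query/subject character pair.
_BTOP_TOKEN = re.compile(r"(\d+)|([A-Za-z*-]{2})")


def btop_to_alignment(query_segment: str, btop: str) -> tuple[str, str]:
    """Rebuild the aligned query and subject strings.

    query_segment is the aligned part of the query (qstart..qend),
    without gaps. Assumes standard BLAST BTOP pair order: query, subject.
    """
    q_aln, s_aln = [], []
    pos = 0  # position in query_segment
    for match in _BTOP_TOKEN.finditer(btop):
        if match.group(1):  # run of identical residues
            n = int(match.group(1))
            run = query_segment[pos:pos + n]
            q_aln.append(run)
            s_aln.append(run)
            pos += n
        else:  # mismatch or gap
            q_char, s_char = match.group(2)
            q_aln.append(q_char)
            s_aln.append(s_char)
            if q_char != "-":  # a gap in the query consumes no query residue
                pos += 1
    return "".join(q_aln), "".join(s_aln)


if __name__ == "__main__":
    print(coverage(1, 50, 100))
    print(btop_to_alignment("ACGTACGT", "3TA2-G2"))
```

Because only the query and the BTOP string are needed, the aligned subject never has to be written to the tabular output, which is the efficiency argument made above.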

Unfortunately, the structure of the FASTA output functions does not make it easy to provide a general selectable tabular output. It is more likely that I would produce a BLAST .asn-compatible output, so that the BLAST output formatter would do the work.

Bill Pearson

wrpearson / fasta36

DLL for fasta36 #54