refresh-bio / SPLASH


Protein sequence (with optional alphabets) support / FFI #23

Open olgabot opened 5 months ago

olgabot commented 5 months ago

Hello, hope you are doing well. I have used degenerate protein alphabets with another k-mer-based program, Sourmash, specifically the Dayhoff table and the Hydrophobic-Polar table.
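For readers unfamiliar with degenerate alphabets, here is a minimal sketch of what such a re-encoding looks like. The groupings are the commonly cited Dayhoff 6-class and hydrophobic-polar 2-class tables and are an assumption here; they should be checked against the exact tables used by the tool being compared. The names `build_table`, `encode`, and `kmers` are illustrative only and are not part of SPLASH or Sourmash.

```python
# Assumption: commonly cited groupings; verify against the tool you compare with.
DAYHOFF_GROUPS = {
    "a": "C",       # sulfur-containing
    "b": "AGPST",   # small
    "c": "DENQ",    # acids and amides
    "d": "HKR",     # basic
    "e": "ILMV",    # hydrophobic
    "f": "FWY",     # aromatic
}
HP_GROUPS = {
    "h": "AFGILMPVWY",  # hydrophobic
    "p": "CDEHKNQRST",  # polar
}


def build_table(groups):
    """Build a str.translate table mapping each amino acid to its class code."""
    mapping = {aa: code for code, members in groups.items() for aa in members}
    return str.maketrans(mapping)


DAYHOFF = build_table(DAYHOFF_GROUPS)
HP = build_table(HP_GROUPS)


def encode(seq, table):
    """Re-encode a protein sequence; residues not in the table pass through unchanged."""
    return seq.upper().translate(table)


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    protein = "MKTAYIAK"
    print(encode(protein, DAYHOFF))
    print(kmers(encode(protein, HP), 3))
```

Because many amino acids collapse to the same code, k-mers in the reduced alphabet match across more divergent sequences than k-mers over the full 20-letter alphabet.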

I'd love to test SPLASH against Sourmash on UniProt protein sequences, but I saw that splash.py doesn't use a foreign function interface (FFI) from Python; it's all system calls. Is there a way I can call something like splash.consume_sequence(seq)? I'd like to run on a stream of strings created from REST API calls to UniProt, so FASTA file parsing wouldn't work as well for me.

Thank you so much!

Warmest,
Olga
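As a possible stopgap while there is no in-process API, a stream of sequences can be spooled to a FASTA file that a file-based tool could then read. This is only a sketch under that assumption: `write_fasta` is a hypothetical helper, not part of SPLASH, and how the resulting file would be passed to SPLASH is deliberately left out.

```python
def write_fasta(records, path, width=60):
    """Write an iterable of (name, sequence) pairs to a FASTA file.

    `records` may be a generator, so sequences can be written as they
    arrive (for example, from successive API responses) without holding
    them all in memory. Returns the number of records written.
    """
    count = 0
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            # Wrap sequence lines at a fixed width, as is conventional for FASTA.
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical records; in practice these would come from the sequence stream.
    stream = iter([("seq1", "MKTAYIAK"), ("seq2", "ACDEFGHIK")])
    print(write_fasta(stream, "stream.fasta"))
```

This keeps the round trip through disk that a direct splash.consume_sequence(seq)-style call would avoid, so it is a workaround rather than a substitute for an FFI.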