sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

load_signature should be bytes-compatible #413

Open luizirber opened 6 years ago

luizirber commented 6 years ago

While trying to load data using requests I hit this error:

import requests
import sourmash_lib

data = requests.get('https://wort.oxli.org/view/sra/ERR022107')
sig  = sourmash_lib.load_one_signature(
            data.content,
            ksize=51)
TypeError                                 Traceback (most recent call last)
      4 sig  = sourmash_lib.load_one_signature(
      5            data.content,
----> 6            ksize=51)

sourmash/sourmash_lib/signature.py in load_one_signature(data, ksize, select_moltype, ignore_md5sum)
    235
    236     try:
--> 237         first_sig = next(sigiter)
    238     except StopIteration:
    239         raise ValueError("no signatures to load")

sourmash/sourmash_lib/signature.py in load_signatures(data, ksize, select_moltype, ignore_md5sum, do_raise)
    189
    190     is_fp = False
--> 191     if hasattr(data, 'find') and data.find('sourmash_signature') == -1:   # filename
    192         done = False
    193         try:                                  # is it a file handle?

TypeError: a bytes-like object is required, not 'str'
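
The failure seems to reduce to searching a bytes object for a str pattern, which Python 3 rejects. A minimal sketch that reproduces it without network access follows. The payload and the `find_marker` helper are made up for illustration and only mimic the check shown at line 191 of the traceback. The exact wording of the error message may differ between Python versions.

```python
# Hypothetical stand-in for `data.content`: requests exposes the raw body as bytes.
payload = b'[{"class": "sourmash_signature"}]'


def find_marker(data):
    """Mimic the marker search from the traceback (illustrative only)."""
    return data.find('sourmash_signature')


try:
    find_marker(payload)          # bytes.find() called with a str pattern
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# The same search succeeds once the bytes are decoded to str.
position = find_marker(payload.decode('utf-8'))
print("marker found at index", position)
```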

I ended up doing a .decode('utf-8') to make it work:

import requests
import sourmash_lib

data = requests.get('https://wort.oxli.org/view/sra/ERR022107')
sig  = sourmash_lib.load_one_signature(
            data.content.decode('utf-8'),
            ksize=51)
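
One way to make callers tolerant of both input types is to normalize the data before it reaches the loader. The sketch below assumes the payload is UTF-8 text; `ensure_text` is a hypothetical helper and not part of sourmash.

```python
def ensure_text(data, encoding='utf-8'):
    """Return `data` as str, decoding it first if it is bytes-like.

    Hypothetical helper for illustration; the UTF-8 default is an assumption.
    """
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    return data


# Intended use with the snippet above (not executed here):
#   sig = sourmash_lib.load_one_signature(ensure_text(data.content), ksize=51)
print(ensure_text(b'sourmash_signature'))
print(ensure_text('sourmash_signature'))
```

Values that are already str pass through unchanged, so the helper can wrap any input without the caller checking the type first.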
ctb commented 4 years ago

Is this still an issue, or has the oxidation (the move to Rust) fixed it?

ctb commented 3 years ago

Related? At least it mentions encodings: #1428

JanDeneweth-bmx commented 3 years ago

This should not really be related. This issue is about the input to the sourmash_lib.load_one_signature function (backed by C++/Rust source, it seems). Per the traceback above, the function searches its input with a str pattern, so it needs a str, whereas the content attribute of the response returned by requests.get is bytes. Bytes can be converted to a string using decode. That raises the question of whether the right encoding is used, by requests on one hand and by sourmash_lib.load_one_signature or other downstream processes that might convert to Python unicode strings on the other. If processing stays at the byte level and binary output is written directly to files, this should not be an issue.
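
To make the direction of the conversions and the encoding concern concrete, here is a small sketch. The JSON record is invented for illustration and is not a real signature. It relies on two standard-library behaviours: `str.encode` produces bytes and `bytes.decode` produces str, and `json.loads` accepts either type.

```python
import json

text = '{"name": "café", "ksize": 51}'    # hypothetical record, not a real signature
raw = text.encode('utf-8')                # str -> bytes
roundtrip = raw.decode('utf-8')           # bytes -> str

# json.loads accepts both bytes and str, so parsing can stay at the byte level.
from_bytes = json.loads(raw)
from_text = json.loads(roundtrip)
print(from_bytes == from_text)

# Decoding with the wrong codec does not raise here; it silently changes the text.
mojibake = raw.decode('latin-1')
print(mojibake == text)
```

The last two lines show why the choice of encoding matters: a wrong codec can succeed without error and still corrupt non-ASCII characters.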

I presume the interface has changed since the switch to Rust, so who knows how relevant this still is.