mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Obtain numpy array for sequence #139

Closed alimanfoo closed 6 years ago

alimanfoo commented 6 years ago

Apologies if I've missed this in the documentation. I've been using pyfasta for a long time and often make use of the ability to load a sequence into a numpy array, e.g.:

In [2]: import pyfasta

In [3]: fasta = pyfasta.Fasta('/kwiat/vector/ag1000g/release/phase2.AR1/genome/agamP4/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa')

In [4]: list(fasta)
Out[4]: ['UNKN', '2L', 'X', '2R', '3R', 'Y_unplaced', 'Mt', '3L']

In [5]: import numpy as np

In [6]: seq = np.asarray(fasta['2R'])

In [7]: seq
Out[7]: 
array([b'C', b'T', b'c', ..., b'A', b'C', b'A'],
      dtype='|S1')

Is there an equivalent capability in pyfaidx?

alimanfoo commented 6 years ago

cc @hardingnj

mdshw5 commented 6 years ago

I think I can implement this functionality using the __array_interface__ property similar to how pyfasta FastaRecord objects work:

https://github.com/brentp/pyfasta/blob/c2f0611c5311f1b1466f2d56560447898b4a8b03/pyfasta/records.py#L163-L170

@brentp can you let me know if there's anything else needed for this feature?
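
For readers unfamiliar with the protocol, here is a minimal sketch of how an object can expose itself to numpy through `__array_interface__`. The `ToyRecord` class is hypothetical and is not the pyfaidx or pyfasta implementation; it only assumes that the sequence is ASCII and fits in memory.

```python
import numpy as np


class ToyRecord:
    """Hypothetical stand-in for a FASTA record; not the pyfaidx class."""

    def __init__(self, seq):
        # Keep the sequence as ASCII bytes so each base occupies one byte.
        self._data = seq.encode("ascii")

    def __len__(self):
        return len(self._data)

    @property
    def __array_interface__(self):
        # numpy reads this dict to learn the shape, the element type
        # ('|S1' = one-byte string) and where the underlying buffer lives.
        return {
            "shape": (len(self),),
            "typestr": "|S1",
            "version": 3,
            "data": self._data,  # any object exposing the buffer protocol
        }


record = ToyRecord("CTcACA")
arr = np.asarray(record)  # one-byte elements, case preserved
print(arr)
```

Because the buffer here is an immutable `bytes` object, the resulting array should be treated as read-only; copy it with `arr.copy()` before modifying it.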

alimanfoo commented 6 years ago

If possible that would be great, thank you.

brentp commented 6 years ago

I don't think anything else is needed. thanks for implementing!

mdshw5 commented 6 years ago

No problem. I added support in the current master branch, but I still have to figure out Python 3 buffer interface compatibility. It currently works in Python 2.7, so if that's what you're using you can test it out like this:

pip install -e git+https://github.com/mdshw5/pyfaidx.git#egg=pyfaidx

mdshw5 commented 6 years ago

I've figured out Python 3 compatibility and just pushed a new release. CI should finish in a few minutes and you can then install version 0.5.4, which includes this new feature. Please let me know if it doesn't work as expected and I'll be glad to help further.

alimanfoo commented 6 years ago

Awesome, thank you!

alimanfoo commented 6 years ago

Just to say, works like a charm on my mosquito genomes, thanks again.