jeffdaily / parasail-python

Python bindings for the parasail C library.
Other
87 stars 17 forks source link

Custom matricies #8

Closed LizzieWadsworth closed 7 years ago

LizzieWadsworth commented 7 years ago

Is there a way I can make and use a custom matrix? I've been using the following DNA matrix allowing for G-U base pairing to study RNA interactions: A C G T A 10 -10 10 -10 C -10 10 -10 10
G -10 -10 10 -10 T -10 -10 -10 10 Is there a way I can get parasail to use this matrix? Thanks

jeffdaily commented 7 years ago

There should be a parasail.matrix_create(alphabet, match, mismatch) function. You would use it like so:

import parasail
my_custom_matrix =parasail.matrix_create("ACGT", 10, -10)
# Then pass it to any function expecting a matrix.
seq1 ="AAAAA"
seq2 = "AAGAA"
open = 10
extend = 1
result = parasail.sw(seq1, seq2, open, extend, my_custom_matrix)

Also, I just noticed the documentation in the README.md for the C library was wrong, but the Python library was correct. You specify gap open and extend penalties as >=0.

Let me know if this matrix create function is working for you. For our collective sanity it would be best to try it out on an alignment you know to verify parasail is doing the right thing for you.

LizzieWadsworth commented 7 years ago

matrix_create() takes exactly 3 arguments, however I need to allow for GU base pairing. I've been doing this previously using a score of +10 instead of -10 for AG and CT matches. Is there a way I can change the matrix values after creating the matrix? Also can I print the matrix to verify I've created the correct matrix?

I tried both: print my_custom_matrix.matrix and print parasail.blosum62.matrix and got the same error both times of:

Traceback (most recent call last): File "./parasail_test.py", line 6, in print parasail.blosum62.matrix File "/usr/local/lib/python2.7/dist-packages/parasail/init.py", line 196, in matrix (self.pointer[0].size, self.pointer[0].size)) File "/usr/local/lib/python2.7/dist-packages/parasail/init.py", line 49, in _make_nd_array buffer = buf_from_mem(c_pointer, arr_size) ctypes.ArgumentError: argument 2: <type 'exceptions.TypeError'>: Don't know how to convert parameter 2

Thanks for your help!

jeffdaily commented 7 years ago

I'm sure we can find a way to do this, so bear with me.

Which version of Python are you using? I want to try to recreate the problem exactly.

Nevermind, I can tell you're using Python 2. Read on...

On Mon, Dec 19, 2016 at 9:50 AM, LizzieWadsworth notifications@github.com wrote:

matrix_create() takes exactly 3 arguments, however I need to allow for GU base pairing. I've been doing this previously using a score of +10 instead of -10 for AG and CT matches. Is there a way I can change the matrix values after creating the matrix? Also can I print the matrix to verify I've created the correct matrix?

I tried both: print my_custom_matrix.matrix and print parasail.blosum62.matrix and got the same error both times of:

Traceback (most recent call last): File "./parasail_test.py", line 6, in print parasail.blosum62.matrix File "/usr/local/lib/python2.7/dist-packages/parasail/init.py", line 196, in matrix (self.pointer[0].size, self.pointer[0].size)) File "/usr/local/lib/python2.7/dist-packages/parasail/init.py", line 49, in _make_nd_array buffer = buf_from_mem(c_pointer, arr_size) ctypes.ArgumentError: argument 2: <type 'exceptions.TypeError'>: Don't know how to convert parameter 2

Thanks for your help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeffdaily/parasail-python/issues/8#issuecomment-268030506, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3MONhhVZUYDr5oTp94c2it21QpIK3Jks5rJsPdgaJpZM4LQm7t .

jeffdaily commented 7 years ago

I just tried print parasail.blosum62.matrix, and for that matter, I also did it on a custom matrix. It was working fine for me using python 2.7.8 as well as python 3.4.2.

I'm not sure why it can't convert the 'arr_size' parameter since it should be a numpy.int64 type. I'm assuming your have numpy installed?

Would you mind adding a line of code into parasail/init.py and trying again? Add the following line just above the line 49 where the error occurred. I'll add some context to the following code block so hopefully it's easier to locate where to put the new line.

def _make_nd_array(c_pointer, shape, dtype=numpy.intc, order='C', own_data=True):
    arr_size = numpy.prod(shape[:]) * numpy.dtype(dtype).itemsize
    if sys.version_info.major >= 3:
        buf_from_mem = ctypes.pythonapi.PyMemoryView_FromMemory
        buf_from_mem.restype = ctypes.py_object
        buf_from_mem.argtypes = (ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int)
        buffer = buf_from_mem(c_pointer, arr_size, 0x100)
    else:
        buf_from_mem = ctypes.pythonapi.PyBuffer_FromMemory
        buf_from_mem.restype = ctypes.py_object
        buf_from_mem.argtypes = (ctypes.c_void_p, ctypes.c_ssize_t) # <-- this is the new line 49
        buffer = buf_from_mem(c_pointer, arr_size)
    return numpy.ndarray(tuple(shape[:]), dtype, buffer, order=order)
LizzieWadsworth commented 7 years ago

Hi, hope you're enjoying the holiday season. (I understand that you won't be available to reply until whenever you're back at work obviously) That line of code fixed the issue with printing the matricies, thanks!

I have tried to create a custom matrix to repeat work done with EMBOSS water. In this I have come across two issues so far:

1) Extra column and row of 0's From using: my_custom_matrix = parasail.matrix_create("ACGT", 10, -10) print my_custom_matrix.matrix I output a matrix of: [[ 10 -10 -10 -10 0] [-10 10 -10 -10 0] [-10 -10 10 -10 0] [-10 -10 -10 10 0] [ 0 0 0 0 0]]

This still seems to work regardless for Smith-Waterman alignments, I was just wondering whether there was any purpose of or negative implications from this.

2) Editing the matrix to allow for G-U base pairing Previously using EMBOSS water I was using ACGT matrix: [[ 10 -10 10 -10] [-10 10 -10 10] [-10 -10 10 -10] [-10 -10 -10 10]] (The difference here is the 3rd item of the first row and 4th item of the second row is changed from -10 to +10)

I have found that I was able to edit my_custom_matrix.matrix using the following code: a = np.matrix('10 -10 10 -10; -10 10 -10 10; -10 -10 10 -10; -10 -10 -10 10') my_custom_matrix.matrix = a

and I output the correct matrix from print my_custom_matrix.matrix

However I've found that regardless of this change, the matrix will act the same.

To test this I used: gapopen = 100 gapextend = 10 seq = 'CCCC' db = 'ATCATCCAATTTCGGACGCGTGTGGGGTCTCATCATACCC' result = parasail.sw_stats_striped_sat(seq, db, gapopen, gapextend, my_custom_matrix) sc = float(result.score) eq = int(result.end_query) + 1 er = int(result.end_ref) + 1 l = int(result.length) sr = er - l sq = eq - l ref = [sr, er] print "Db match:", db[sr:er], "Seq match:", seq[sq:eq], "Score: ", sc

Using Parasail the sequence matches to the CCC in db, whereas using EMBOSS the sequence matches the TTTC in db as C-T matches get a score of 10 as well as C-C matches.

To further illustrate my point, I changed the line in the beginning from: my_custom_matrix = parasail.matrix_create("ACGT", 10, -10) To: my_custom_matrix = parasail.matrix_create("ACGT", 10, 10) and my_custom_matrix = parasail.matrix_create("ACGT", 100, -10).

I found that despite my attempt to change my_custom_matrix.matrix, the score would change from 30 with ("ACGT", 10, -10) to 40 with ("ACGT", 10, 10) (matching the first 4 characters of db) and to 300 with ("ACGT", 100, -10).

Therefore the matrix being used in parasail was whatever I had defined in parasail.matrix_create and changing my_custom_matrix.matrix to a defined matrix did not make a difference to the output.

Is there a way I can change a custom matrix after creating it? Or create an unsymmetrical matrix to begin with?

Thanks so much for your help,

Hope you've had a lovely Christmas and have a wonderful new year!

jeffdaily commented 7 years ago

Let's blame the holidays (and three paper deadlines) for my slow response.

The reason I think it's not working for you is because the matrices are read-only. I'm looking into how to make the matrices modifiable. It will likely take a change to the C library as well as this Python wrapper. I'll keep you posted. Your request isn't lost, yet.

jeffdaily commented 7 years ago

I am preparing a new C library release. https://github.com/jeffdaily/parasail/tree/release/1.2 Among other things, it contains new matrix functions for modifying matrices.

I've already wrapped the new functions for python and committed them to the parasail-python project -- a bit out of order since the C library release isn't pushed out yet. I just wanted to let you know I should have this all done soon, perhaps one week or less.

jeffdaily commented 7 years ago

Alright, I pushed out parasail v1.2 today (1/28/2017). It has the functions you need for modifying matrices. I haven't yet pushed a new parasail-python package for pip yet, but if you're eager to grab the new code I'm sure you can figure it out.

There's a very basic test file for editing matrices called "test_matrix.py".

LizzieWadsworth commented 7 years ago

Seems to be working as expected. Thank you!