lixin6135 / pysam

Automatically exported from code.google.com/p/pysam

garbage characters when changing tags #129

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. put the attached test.py and test.sam in the same directory
2. run 'python test.py'
3. run 'samtools view test.bam' to see the test.bam file generated by test.py

What is the expected output?

The test.bam file should contain only one line, with a tag 
YA:Z:XXXXXXXXXXXXXXXXX...XXX

What do you see instead?

The YA tag ends in garbage characters:
YA:Z:XXXXXXXXXXXXX...XXXX^P^A

What version of the product are you using? On what operating system?

pysam 0.7.4 on Linux with Python 2.6.5

Please provide any additional information below.

Whether or not garbage characters are produced depends on the exact number of 
characters (X's) in the YA string.
Pysam 0.6 does not seem to have this problem.

Original issue reported on code.google.com by mjldeh...@gmail.com on 14 Jun 2013 at 4:08

GoogleCodeExporter commented 8 years ago
This script produces garbage characters on two different Linux machines running 
Python 2.6.4 and 2.6.5. The same script does not produce them on Linux running 
Python 2.7.1 or on a Mac running Python 2.7.3, so this may be an issue with 
Python itself. I'll upgrade Python to 2.7 on the two Linux machines to check 
whether the problem persists.

Original comment by mjldeh...@gmail.com on 15 Jun 2013 at 2:13

GoogleCodeExporter commented 8 years ago
It looks like the problem is in the code generated by Cython.

In file pysam/csamtools.c, starting at line 24337, the generated code is

    __pyx_t_18 = PyBytes_AsString(__pyx_t_14); if (unlikely((!__pyx_t_18) && PyErr_Occurred())) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2652; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;
    __pyx_v_temp = __pyx_t_18;

PyBytes_AsString(__pyx_t_14) returns a pointer to the internal buffer of 
__pyx_t_14.
The next line DECREFs __pyx_t_14, and the line after that copies __pyx_t_18 
into __pyx_v_temp. If that DECREF drops the last reference, __pyx_t_14 is 
freed, and __pyx_t_18 is left as a dangling pointer into its released buffer.

Inspecting the bytes that __pyx_t_18 points to confirms this: the content is 
correct before the call to __Pyx_DECREF, but its last bytes are corrupted after 
the call.
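
For readers less familiar with reference counting, here is a rough pure-Python analogy of the lifetime problem described above. It is only a sketch: the `Tracked` class and `demo` function are hypothetical names made up for illustration, not pysam or Cython code, and the immediate cleanup it relies on is CPython behaviour. An object that nothing refers to is released as soon as the expression that created it finishes, while a named reference keeps it alive until the name is dropped.

```python
class Tracked:
    """Hypothetical stand-in for the temporary bytes object."""

    def __init__(self, log, label):
        self.log = log
        self.label = label

    def __del__(self):
        # Runs when the last reference goes away (immediately, in CPython).
        self.log.append(self.label)


def demo():
    log = []

    # Nothing holds the result, so it is released right after this statement.
    Tracked(log, "temporary")
    after_temporary = list(log)

    # A named reference keeps the object alive...
    kept = Tracked(log, "named")
    while_held = list(log)

    # ...until the name is dropped.
    del kept
    after_release = list(log)

    return after_temporary, while_held, after_release
```

In the generated C code, the DECREF of __pyx_t_14 plays the role of the temporary going away, and __pyx_t_18 is a raw pointer that the reference count knows nothing about.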

Original comment by mjldeh...@gmail.com on 16 Jun 2013 at 2:01

GoogleCodeExporter commented 8 years ago
The section "Caveats when using a Python string in a C context" in the Cython 
manual explains what is going on here. In line 2652 of csamtools.pyx, instead of

    temp = buffer.raw
    memcpy( s, temp, total_size )

we should have

    p = buffer.raw
    temp = p
    memcpy( s, temp, total_size )

so that the bytes object returned by buffer.raw stays referenced by p, and 
therefore alive, until memcpy has finished with it.
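
A small Python sketch of why the extra name matters, assuming that `buffer` is a ctypes string buffer (the thread does not show how it is created, so that is a guess). On CPython, each access to `.raw` builds a new bytes object, so a pointer taken from an unnamed `.raw` result has nothing keeping its storage alive. The helper `copy_raw` is a hypothetical name, and `ctypes.memmove` stands in for the `memcpy` call in the .pyx file.

```python
import ctypes

# Assumption: a ctypes string buffer stands in for `buffer` in csamtools.pyx.
buffer = ctypes.create_string_buffer(b"X" * 32)

# Each access to .raw returns a separate bytes object with the same content.
first = buffer.raw
second = buffer.raw
same_content = first == second
same_object = first is second


def copy_raw(buf):
    """Copy buf.raw into a new buffer while a named reference holds the bytes."""
    held = buf.raw  # plays the role of `p` in the fix above
    dest = ctypes.create_string_buffer(len(held))
    ctypes.memmove(dest, held, len(held))  # `held` is alive for the whole copy
    return dest.raw
```

Calling `copy_raw(buffer)` should return the same bytes as `buffer.raw`, because `held` keeps the source alive for the duration of the copy.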

Original comment by mjldeh...@gmail.com on 16 Jun 2013 at 2:53

GoogleCodeExporter commented 8 years ago
Thanks!

I don't seem to be able to replicate the behaviour, but what you say is correct 
and I have added the fix.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 26 Jun 2013 at 9:31

GoogleCodeExporter commented 8 years ago
Issue 132 has been merged into this issue.

Original comment by andreas....@gmail.com on 18 Sep 2013 at 6:50