nilesh-iiita / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

Suggestion: accept lowercase header tags. #61

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Dear pysam developers,

The SAM standard 1.4 specifies:

“Tags containing lowercase letters are reserved for end users.”

How about accepting unkonw header tags when they are not all uppercase ?  I 
tried the following patch, but unfortunately it was not enough:

--- a/pysam/csamtools.pyx
+++ b/pysam/csamtools.pyx
@@ -1005,7 +1005,7 @@ cdef class Samfile:
                     x = {}
                     for field in fields[1:]:
                         key, value = field.split(":",1)
-                        if key not in VALID_HEADER_FIELDS[record]:
+                        if key not in VALID_HEADER_FIELDS[record] and 
key.upper() == key:
                             raise ValueError( "unknown field code '%s' in record '%s'" % (key, record) )
                         x[key] = VALID_HEADER_FIELDS[record][key](value)

When tested against a file that contains the tag ‘bc’ in the header 
‘@RG’, I obtain the following error:

Traceback (most recent call last):
  File "/home/charles/src/PromoterPipeline/level1.py", line 327, in <module>
    main(sys.argv[1:])
  File "/home/charles/src/PromoterPipeline/level1.py", line 324, in main
    run(input_filenames, output_filename, name, threshold, quality)
  File "/home/charles/src/PromoterPipeline/level1.py", line 229, in run
    current_assembly = find_assembly(handle)
  File "/home/charles/src/PromoterPipeline/level1.py", line 16, in find_assembly
    for row in handle.header['SQ']:
  File "csamtools.pyx", line 1010, in csamtools.Samfile.header.__get__ (pysam/csamtools.c:10419)
KeyError: 'bc'

(level1.py reads a BAM file with pysam and converts it in another format after 
collapsing reads that share a 5′ end).

The patch had no impact on BAM files containing a ‘BC’ tag (rejected as 
usual) or no unknown tag.  Accordingly:

>>> key = "SM"
>>> key not in VALID_HEADER_FIELDS["RG"] and key.upper() == key
False
>>> key = "BC"
>>> key not in VALID_HEADER_FIELDS["RG"] and key.upper() == key
True
>>> key = "bc"
>>> key not in VALID_HEADER_FIELDS["RG"] and key.upper() == key
False
>>> key = "Bc"
>>> key not in VALID_HEADER_FIELDS["RG"] and key.upper() == key
False

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

Original issue reported on code.google.com by Charles....@gmail.com on 11 May 2011 at 2:15

GoogleCodeExporter commented 9 years ago
Thanks to the help of my colleague Michiel de Hoon, here is a working patch:

--- a/pysam/csamtools.pyx
+++ b/pysam/csamtools.pyx
@@ -1005,9 +1005,12 @@ cdef class Samfile:
                     x = {}
                     for field in fields[1:]:
                         key, value = field.split(":",1)
-                        if key not in VALID_HEADER_FIELDS[record]:
+                        if key in VALID_HEADER_FIELDS[record]:
+                            x[key] = VALID_HEADER_FIELDS[record][key](value)
+                        elif not key.isupper():
+                            x[key] = value
+                        else:
                             raise ValueError( "unknown field code '%s' in record '%s'" % (key, record) )
-                        x[key] = VALID_HEADER_FIELDS[record][key](value)

                     if VALID_HEADER_TYPES[record] == dict:
                         if record in result:
@@ -1030,6 +1033,9 @@ cdef class Samfile:
             for key in VALID_HEADER_ORDER[record]:
                 if key in fields:
                     line.append( "%s:%s" % (key, str(fields[key])))
+            for key in fields:
+                if not key.isupper():
+                    line.append( "%s:%s" % (key, str(fields[key])))
         return "\t".join( line ) 

     cdef bam_header_t * _buildHeader( self, new_header ):

Original comment by Charles....@gmail.com on 11 May 2011 at 4:34

GoogleCodeExporter commented 9 years ago
Thanks for this, I have added the patches to the source code.

Bw,
Andreas

Original comment by andreas....@gmail.com on 4 Jun 2011 at 8:36