kinverarity1 / lasio

Python library for reading and writing well data using Log ASCII Standard (LAS) files
https://lasio.readthedocs.io/en/latest/
MIT License

Error when LAS data contains text with spaces enclosed in double quotes #271

Closed: ghost closed this issue 3 years ago

ghost commented 5 years ago

I am encountering an error when trying to load a LAS file that has a parameter which sometimes contains spaces. This seems to happen with many LAS files that contain picks: the pick name is a text field, which may contain spaces.

Is it possible to replace these spaces with underscores, or something like that?

Here's an example LAS file:

~Version
 VERS              .         2.0                           : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP              .         NO                            : ONE LINE PER DEPTH STEP
 DLM               .         SPACE                         : DELIMITING CHARACTER(SPACE TAB OR COMMA)
~Well Information
#_______________________________________________________________________________
#
#PARAMETER_NAME    .UNIT     VALUE                         : DESCRIPTION
#_______________________________________________________________________________
STRT               .m        321.16                        : First reference value
STOP               .m        3188.59                       : Last reference value
STEP               .m        0                             : Step increment
NULL               .         -9999                         : Missing value
WELL               .         xxx                           : Well name
~Curve Information
#_______________________________________________________________________________
#
#LOGNAME           .UNIT     LOG_ID                        : DESCRIPTION
#_______________________________________________________________________________
MD                 .m                                      :  
ZONE               .unitless                               :  
~Ascii
321.16     pick_alpha
1753.2     pick_beta    
1953.5     "pick gamma"      
2141.05    "pick delta"    
2185.34    pick_epsilon

Here is what I get from LASIO

import lasio

# Try to load data as pandas table
las = lasio.read('data/test_file.las')
las.df()

The result is badly parsed:

    ZONE
MD  
321.16  pick_alpha
1753.2  pick_beta
1953.5  "pick
gamma"  2141.05
"pick   delta"
2185.34 pick_epsilon
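The misaligned rows are consistent with a reader that splits every data line on whitespace, pools all the tokens into one flat stream, and then regroups that stream into as many columns as there are curves. The sketch below assumes that model to reproduce the output above; it is an illustration, not lasio's actual reader, and `naive_rows` is a hypothetical helper.

```python
# Hypothetical reconstruction of the failure: split on whitespace, flatten
# every line into one token stream, then regroup into n_curves columns.
DATA = '''321.16     pick_alpha
1753.2     pick_beta
1953.5     "pick gamma"
2141.05    "pick delta"
2185.34    pick_epsilon'''

def naive_rows(text, n_curves):
    """Tokenize ignoring quotes, then chop the flat token list into rows."""
    tokens = [tok for line in text.splitlines() for tok in line.split()]
    return [tuple(tokens[i:i + n_curves]) for i in range(0, len(tokens), n_curves)]

for row in naive_rows(DATA, n_curves=2):
    print(row)
```

Each quoted name becomes two tokens, so every row after "pick gamma" is shifted by one item, which is exactly the pattern in the parsed table.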

PS thanks for your work on the library so far, it is proving tremendously useful!

ghost commented 5 years ago

I've found a workaround which seems to be okay, passing a couple of regex substitutions with the null_policy keyword:

import lasio

LAS_CLEANERS = [
    # matches two letter-only words within double quotes, e.g. "pick gamma" -> pick_gamma
    (r'"([a-zA-Z]*) ([a-zA-Z]*)"', r'\1_\2'),

    # matches three letter-only words within double quotes
    (r'"([a-zA-Z]*) ([a-zA-Z]*) ([a-zA-Z]*)"', r'\1_\2_\3')
]

las = lasio.read('data/test_file.las', null_policy=LAS_CLEANERS)

I'm not sure how to generalise this to a larger number of words, but this works for now :)
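One way to generalise to any number of words is a single pattern that captures everything between a pair of double quotes, combined with a replacement function instead of a fixed template (Python's `re.sub` accepts a callable as the replacement). A minimal sketch; `QUOTED`, `underscore_quoted` and `clean_line` are hypothetical names, and unlike the original regexes this also matches digits and punctuation inside the quotes.

```python
import re

# Any run of non-quote characters between two double quotes.
QUOTED = re.compile(r'"([^"]*)"')

def underscore_quoted(match):
    """Drop the quotes and collapse internal whitespace into underscores."""
    return re.sub(r"\s+", "_", match.group(1).strip())

def clean_line(line):
    """Apply the substitution to one data line."""
    return QUOTED.sub(underscore_quoted, line)

print(clean_line('1953.5 "pick gamma"'))          # 1953.5 pick_gamma
print(clean_line('2141.05 "top of  the reef"'))   # 2141.05 top_of_the_reef
```

If lasio passes each `(pattern, replacement)` pair straight to `re.sub` (an assumption, not checked here), the single pair `(QUOTED.pattern, underscore_quoted)` could stand in for both `LAS_CLEANERS` entries. Note that an empty `""` becomes an empty string and would then disappear when the line is split, shifting the columns.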

kinverarity1 commented 5 years ago

Thanks for the report! That's a neat work-around.

Unfortunately lasio is still only written to support numerical data sections (i.e. LAS v2), so I wouldn't expect that to work yet. I'd like to support text data sections but haven't had time to work on it.

ghost commented 5 years ago

OK, fair enough. I'll close the issue for now then, as we have a simple workaround.

kinverarity1 commented 5 years ago

No worries - it's certainly something I'd like to have implemented. Thanks for the example!

ghost commented 3 years ago

Hi @kinverarity1,

I've realised my solution above has a major disadvantage: passing my own null_policy replaces the default NULL_POLICY, so regular null values are no longer substituted!

Is there a way to use some custom regex substitutions, in combination with a NULL_POLICY from lasio.defaults?

Anjum48 commented 3 years ago

At the moment, if one or more of the columns are non-numeric, the NULL_POLICY fails to replace missing values with np.nan.

This is because the array created at https://github.com/kinverarity1/lasio/blob/817fb82914cbe62009f651340ebec51fbb466174/lasio/reader.py#L456-L458 has a string dtype (<U32), so the numeric missing values in lasio.defaults.NULL_SUBS are never matched: a string such as "-999" is being compared against a number such as -999.
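A possible way around the string-dtype problem is to handle each column separately: try to convert it to float, apply the numeric null values only where that succeeds, and leave text columns such as pick names untouched. A minimal NumPy sketch under that assumption; `apply_numeric_nulls` is a hypothetical helper, not lasio's code.

```python
import numpy as np

def apply_numeric_nulls(columns, null_values):
    """Replace null sentinels with NaN in numeric columns only.

    Each column is tried as float; a text column raises ValueError and is
    returned unchanged as an object array.
    """
    out = []
    for col in columns:
        try:
            values = np.array(col, dtype=float)  # copy, so the input is not mutated
        except ValueError:
            out.append(np.array(col, dtype=object))
            continue
        values[np.isin(values, null_values)] = np.nan
        out.append(values)
    return out

# String columns, as they come out of a mixed text/number data section.
md = ["321.16", "-9999", "1953.5"]
zone = ["pick_alpha", "pick_beta", "pick gamma"]
md_clean, zone_clean = apply_numeric_nulls([md, zone], null_values=[-9999])
print(md_clean)
print(zone_clean)
```

Converting per column rather than per array means a single text curve no longer forces every curve to be compared as strings.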

kinverarity1 commented 3 years ago

Thanks @Anjum48 - I have opened a new issue for that.

kinverarity1 commented 3 years ago

@Connossor I have changed the data section code to split into items while respecting quoted strings. Hopefully that fixes the original issue you raised, although obviously other NULL values are still being ignored per @Anjum48's comment. I've opened #422 to deal with that.
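The change itself isn't shown in this thread. The sketch below only illustrates the general idea of splitting a data line while keeping double-quoted text as one item; it is not lasio's implementation, and `TOKEN` and `split_respecting_quotes` are hypothetical names.

```python
import re

# Either a double-quoted string (group 1, quotes dropped) or a bare run of
# non-whitespace characters (group 2).
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def split_respecting_quotes(line):
    """Split on whitespace, but keep "quoted text" together as one item."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(line)]

for line in ['321.16     pick_alpha', '1953.5     "pick gamma"      ']:
    print(split_respecting_quotes(line))
```

`shlex.split` from the standard library gives the same result for these lines, but it also treats single quotes and backslashes specially, so an unquoted apostrophe in a pick name would make it raise `ValueError`.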