sunpy / sunpy

SunPy - Python for Solar Physics
http://www.sunpy.org
BSD 2-Clause "Simplified" License

read_srs incorrectly parses a badly formatted SRS #7126

Open samaloney opened 1 year ago

samaloney commented 1 year ago

Describe the bug

After parsing, the resulting table has unexpected extra columns. Ideally the file would be fixed upstream, but I'm not sure how likely that is; failing that, read_srs could perhaps raise an error or a warning. This might not be worth the effort, but I wanted to record it anyway.

To Reproduce

from sunpy.io.special import read_srs

srs = read_srs('19961209SRS.txt')

srs
<QTable length=2>
 ID  Number Carrington Longitude   Area   Z   Longitudinal Extent Number of Sunspots Mag Type Col1 Col2  Col3  Col4  Col5 Latitude Longitude
                    deg            uSH                deg                                                                   deg       deg
str2 int64        float64        float64 str3       float64             int64          str4   str4 str3  str6  str3 int64 float64   float64
---- ------ -------------------- ------- ---- ------------------- ------------------ -------- ---- ---- ------ ---- ----- -------- ---------
   I   8003                354.0    10.0  BXO                 4.0                  4     BETA   --   --     --   --    --    -30.0     -10.0
  II     --                  ———     ———   --                 ———                 --       -- NMBR  LAT LO7997  N05   242      ———       ———

where the SRS file is:

:Product: 19961209SRS.txt
:Issued: 1996 Dec 09 0030 UTC
# Prepared jointly by the U.S. Dept. of Commerce, NOAA,
# Space Environment Center and the U.S. Air Force.
#
JOINT USAF/NOAA SOLAR REGION SUMMARY
SRS NUMBER 344 ISSUED AT 0030Z ON 09 DEC 1996
REPORT COMPILED FROM DATA RECEIVED AT SWO ON 08 DEC
I.  REGIONS WITH SUNSPOTS.  LOCATIONS VALID AT 08/2400Z
NMBR LOCATION  LO  AREA  Z   LL   NN MAG TYPE
8003 S30E10   354  0010 BXO  04   04 BETA
IA. H-ALPHA PLAGES WITHOUT SPOTS.  LOCATIONS VALID AT 08/2400Z DEC
NMBR  LOCATION  LO
NONE
II. REGIONS DUE TO RETURN 09 DEC TO 11 DEC
NMBR LAT    LO7997 N05    242

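The last line of the SRS file has been concatenated with the previous one: the section II header "NMBR LAT    LO" and the data row "7997 N05    242" have ended up on the same line.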
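One way to detect (and possibly repair) this kind of damage before parsing is to look for a header line whose last label runs straight into a region number. The following is a minimal sketch, not part of sunpy: the helper name `split_fused_header` is hypothetical, and the pattern assumes the damage always looks like the section II line above (header ending in `LO`, immediately followed by digits).

```python
import re

# Hypothetical pattern: the section II header "NMBR LAT    LO" immediately
# followed (no whitespace) by a data row that starts with a region number.
FUSED_HEADER = re.compile(r"^(NMBR\s+LAT\s+LO)(\d+\s.*)$")


def split_fused_header(lines):
    """Return a new list of lines with fused 'header+data' lines split in two.

    Assumes the damage matches the example in 19961209SRS.txt, i.e.
    'NMBR LAT    LO7997 N05    242'. Other lines are passed through unchanged.
    """
    fixed = []
    for line in lines:
        match = FUSED_HEADER.match(line)
        if match:
            # Emit the header and the data row as two separate lines.
            fixed.append(match.group(1))
            fixed.append(match.group(2))
        else:
            fixed.append(line)
    return fixed


lines = [
    "II. REGIONS DUE TO RETURN 09 DEC TO 11 DEC",
    "NMBR LAT    LO7997 N05    242",
]
print(split_fused_header(lines))
```

Silently repairing files only covers damage patterns that are known in advance, so raising an error or warning (as suggested above) is arguably the safer default.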

Screenshots

No response

System Details

General
#######
OS: Mac OS 13.4
Arch: 64bit, (arm)
sunpy: 5.0.0
Installation path: /Users/sm/.virtualenvs/arccnet/lib/python3.9/site-packages/sunpy-5.0.0.dist-info

Required Dependencies
#####################
astropy: 5.3
numpy: 1.24.3
packaging: 23.1
parfive: 2.0.2

Installation method

No response

dstansby commented 1 year ago

Can you provide a link to where this data is available? Can it be fetched through Fido?

samaloney commented 1 year ago

Yeah, sure, this should reproduce the problem:

from sunpy.net import Fido, attrs as a
from sunpy.io.special import read_srs

q = Fido.search(a.Time('1996-12-09', '1996-12-10'), a.Instrument.soon)
f = Fido.fetch(q)
srs = read_srs(f[0])

but the SRS table is not parsed properly, e.g.:

srs
Out[2]:
<QTable length=2>
 ID  Number Carrington Longitude   Area   Z   Longitudinal Extent Number of Sunspots Mag Type Col1 Col2  Col3  Col4  Col5 Latitude Longitude
                    deg            uSH                deg                                                                   deg       deg
str2 int64        float64        float64 str3       float64             int64          str4   str4 str3  str6  str3 int64 float64   float64
---- ------ -------------------- ------- ---- ------------------- ------------------ -------- ---- ---- ------ ---- ----- -------- ---------
   I   8003                354.0    10.0  BXO                 4.0                  4     BETA   --   --     --   --    --    -30.0     -10.0
  II     --                  ———     ———   --                 ———                 --       -- NMBR  LAT LO7997  N05   242      ———       ———

What we ended up doing locally was comparing the column names of the parsed file to the expected column names and raising an error if they didn't match.
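A minimal sketch of that kind of check follows. It works on a plain list of column names (for a parsed table this would be `srs.colnames`). The expected names are copied from the correctly parsed columns in the output above; treating them as the complete set read_srs produces is an assumption, as are the helper name `check_srs_columns` and the `strict` flag that switches between raising and warning. It only flags extra columns, which is the symptom in this issue; an exact-match comparison, as described above, would also catch missing ones.

```python
import warnings

# Column names taken from the correctly parsed part of the table above;
# assumed (not guaranteed) to be the full set read_srs produces.
EXPECTED_SRS_COLUMNS = {
    "ID", "Number", "Carrington Longitude", "Area", "Z",
    "Longitudinal Extent", "Number of Sunspots", "Mag Type",
    "Latitude", "Longitude",
}


def check_srs_columns(colnames, strict=True):
    """Compare parsed column names against the expected SRS columns.

    Returns the set of unexpected column names. If any are found, raises
    ValueError when strict is True, otherwise emits a UserWarning.
    """
    unexpected = set(colnames) - EXPECTED_SRS_COLUMNS
    if unexpected:
        message = f"Unexpected columns in parsed SRS table: {sorted(unexpected)}"
        if strict:
            raise ValueError(message)
        warnings.warn(message, UserWarning)
    return unexpected


# Example using some of the column names from the broken table above.
try:
    check_srs_columns(["ID", "Number", "Mag Type", "Col1", "Col2"])
except ValueError as err:
    print(err)
```

With a real table, this would be called as `check_srs_columns(srs.colnames)` straight after `read_srs`.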