sunpy / sunpy

SunPy - Python for Solar Physics
http://www.sunpy.org
BSD 2-Clause "Simplified" License

Downloading from VSO is broken when using specific providers #3734

Open vit1-irk opened 4 years ago

vit1-irk commented 4 years ago

Description

VSO returns entries with Start Time: None when querying data for SOHO MDI

Expected behavior

No corrupted entries, or a way to filter them out so they are not downloaded
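A minimal sketch of the kind of filtering asked for here, assuming the search results have been copied into plain dictionaries. The `start_time` key and the record layout are hypothetical and are not SunPy's actual response structure:

```python
def drop_entries_without_start_time(entries):
    """Keep only the entries whose (hypothetical) start_time field is set."""
    return [entry for entry in entries if entry.get("start_time") is not None]


# Hypothetical records standing in for rows of a search result.
entries = [
    {"fileid": "file-1", "start_time": "2005-01-01 00:10:00"},
    {"fileid": "file-2", "start_time": None},
    {"fileid": "file-3", "start_time": "2005-01-01 01:36:00"},
]

usable = drop_entries_without_start_time(entries)
print([entry["fileid"] for entry in usable])
```

Only the entries with a populated start time would then be passed on to the download step.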

Actual behavior

Fido fails when trying to download the result UnifiedResponse object

Steps to Reproduce

https://gist.github.com/vit1-irk/a3891d0948c3aff347fd5a82c9ece79f

Also, see this: #2141 #3372

System Details

#######
General
#######
Time : Monday, 27. January 2020 04:54PM UT
System : Linux
Processor : 
Arch : 64bit
SunPy : 1.1.0
OS: Manjaro Linux 18.1.5 Juhraya (Linux 5.4.14-2-MANJARO )

##################
Required Libraries
##################
Python: 3.8.1
NumPy: 1.18.1
SciPy: 1.4.1
matplotlib: 3.1.2
Astropy: 4.0
Pandas: 0.25.3
parfive: 1.0.0

#####################
Recommended Libraries
#####################
beautifulsoup: 4.8.2
PyQt4: NOT INSTALLED
PyQt5: 5.14.1
Zeep: 3.4.0
Sqlalchemy: NOT INSTALLED
drms: 0.5.7

Cadair commented 4 years ago

This seems to be affecting MDI data from the 'SHA' provider, the query (to save a link click) is:

import astropy.units as u
from sunpy.net import Fido, attrs as a

attrs_time = a.Time('2005/01/01 00:10', '2005/01/02 00:15')
result = Fido.search(attrs_time, a.Instrument('mdi'),
                     a.Wavelength(6768*u.angstrom) & a.vso.Physobs('LOS_magnetic_field'))

ejm4567 commented 4 years ago

Found and corrected a typo in the SHA Data Provider for MDI. TimeStart should now be populated for all searches and files are downloadable from the web interface, IDL/SSW and SunPy.

Please test again and let us know if any issues are still present.

vit1-irk commented 4 years ago

@ejm4567 Start/end times are available now, but downloading files is still broken in SunPy.

See the new gist comparing the outputs: https://gist.github.com/vit1-irk/2e518c02c792b5d1109c656431472254

abhijeetmanhas commented 4 years ago

I resolved the issue by sending one fileid per DataRequestItem in VSO's create_getDataRequest. With that change, all 118 files were downloaded when running the same commands as in Viktor's notebook.
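The one-fileid-per-item idea can be sketched roughly as follows. This is not the actual SunPy change: the function name is hypothetical, and the dictionary keys (`provider`, `fileiditem`, `fileid`) are assumptions modelled on the names mentioned in this thread rather than on the real request classes.

```python
def one_fileid_per_item(fileids_by_provider):
    """Build one request item per file ID instead of one item per provider.

    `fileids_by_provider` maps a provider name to a list of file IDs.
    The item layout is an assumed stand-in for a VSO DataRequestItem.
    """
    items = []
    for provider, fileids in fileids_by_provider.items():
        for fileid in fileids:
            # Each item carries exactly one file ID.
            items.append({"provider": provider, "fileiditem": {"fileid": [fileid]}})
    return items


# Hypothetical file IDs for the SHA provider.
request_items = one_fileid_per_item({"SHA": ["file-1", "file-2", "file-3"]})
for item in request_items:
    print(item)
```

Three file IDs produce three request items, so no item ever mixes several files.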

I spent a couple of days on this issue and created two notebooks, one with upstream sunpy installed and one with master. I highly recommend going through both of them and reading all the comments I made there. In particular, look at the XML body of the request after fetch is called, where the datacontainer block containing the fileiditem is shown.

I created a minimal query of three files to keep the debug output easy to interpret, and found that for the SHA provider, whenever multiple files were requested for the mdi instrument, we always got a fetch error from VSO.

A similar issue, https://github.com/sunpy/sunpy/issues/2284 , reported exactly the same error line that appears here, and it was fixed by grouping fileids by the series they belong to.

Server raised fault: 'Element '' can't be allowed in valid XML message. Died. at /opt/vso/lib/perl5/SOAP/Lite.pm line 1480.

The error is the same in both issues. In my notebooks, too, the files that cause the error belong to different series, and if we group them by series, no error occurs.
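Grouping by series, as described above, might look like the sketch below. The record layout (`series` and `fileid` keys) and the function name are hypothetical; how the series is actually derived from a VSO record is not shown in this thread.

```python
from collections import defaultdict


def group_fileids_by_series(records):
    """Collect file IDs into one list per series, preserving input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["series"]].append(record["fileid"])
    return dict(grouped)


# Hypothetical records: two series mixed together in one search result.
records = [
    {"series": "series_a", "fileid": "file-1"},
    {"series": "series_b", "fileid": "file-2"},
    {"series": "series_a", "fileid": "file-3"},
]

batches = group_fileids_by_series(records)
for series, fileids in batches.items():
    print(series, fileids)
```

Each batch could then be sent as its own request item, so that no single item mixes files from different series.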

So I guess, as noted by @Cadair, it is mostly a VSO remote server issue, though it can be worked around on the SunPy side. It has something to do with the way XML data requests are built for VSO: I tested many queries, and whenever one failed, there were always two or more fileids per request item.

So this needs discussion with the VSO team to find out why it fails. PR https://github.com/sunpy/sunpy/pull/2621 fixed a similar failure of the JSOC provider with files from different series, and I would also like to know why that one failed.

wtbarnes commented 2 years ago

Does anyone know if this has been resolved upstream by the VSO?

dstansby commented 2 years ago

Just tried the code posted above:

import astropy.units as u
import sunpy
from sunpy.net import Fido, fido_factory, dataretriever, attrs as a

attrs_time = a.Time('2005/01/01 00:10', '2005/01/02 00:15')
result = Fido.search(attrs_time, a.Instrument('mdi'),
                     a.Wavelength(6768*u.angstrom) & a.Physobs('LOS_magnetic_field'))
print(result)

downloaded_files = Fido.fetch(result, progress=False)

And it's still giving me an error:

Traceback (most recent call last):
  File "/Users/dstansby/github/sunpy/test.py", line 12, in <module>
    downloaded_files = Fido.fetch(result, progress=False)
  File "/Users/dstansby/github/sunpy/sunpy/net/fido_factory.py", line 426, in fetch
    result = block.client.fetch(block, path=path,
  File "/Users/dstansby/github/sunpy/sunpy/net/vso/vso.py", line 404, in fetch
    data_response = VSOGetDataResponse(self.api.service.GetData(data_request))
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/zeep/proxy.py", line 46, in __call__
    return self._proxy._binding.send(
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/zeep/wsdl/bindings/soap.py", line 135, in send
    return self.process_reply(client, operation_obj, response)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/zeep/wsdl/bindings/soap.py", line 229, in process_reply
    return self.process_error(doc, operation)
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/zeep/wsdl/bindings/soap.py", line 329, in process_error
    raise Fault(
zeep.exceptions.Fault: Element '' can't be allowed in valid XML message. Died. at /opt/vso/lib/perl5/SOAP/Lite.pm line 1483.

Cadair commented 1 year ago

I can still reproduce this issue. @sunpy/vso-contacts, any further thoughts on what's going on here?

dstansby commented 1 year ago

Pinging @sunpy/vso-contacts again, anyone got any ideas why the small example in https://github.com/sunpy/sunpy/issues/3734#issuecomment-1284503238 is failing?

AlisdairDavey commented 1 year ago

> Pinging @sunpy/vso-contacts again, anyone got any ideas why the small example in #3734 (comment) is failing?

Noted! VSO will take a look.

Cadair commented 1 year ago

@sunpy/vso-contacts I am still getting this error, any updates?

AlisdairDavey commented 1 year ago

Once again looking at this.