Closed Cadair closed 4 years ago
So do you suggest that the client should be aware of all the operational satellites that have data available for the chosen date(s), and fall back to another satellite if data cannot be found?
Related to this, it is possible to search using the `a.goes.SatelliteNumber` attribute to manually specify which satellite you want, but it does not seem to work as intended, e.g.:
```
In [69]: result
Out[69]:
<sunpy.net.fido_factory.UnifiedResponse object at 0x7f8c38647eb8>
Results from 1 Provider:

2 Results from the XRSClient:
     Start Time           End Time      Source Instrument Wavelength
       str19               str19         str4     str4       str3
------------------- ------------------- ------ ---------- ----------
2010-06-01 00:00:00 2010-06-01 23:59:59   nasa       goes        nan
2010-06-02 00:00:00 2010-06-02 23:59:59   nasa       goes        nan
```
The above search should not return anything, because the dates are before GOES-15 data are available. Yet the search result gives the impression that files were found. However:
```
In [70]: Fido.fetch(result)
Files Downloaded: 0%| | 0/2 [00:00<?, ?file/s]
Out[70]:
<parfive.results.Results object at 0x7f8c3863aeb8>
[]
Errors:
(error(filepath_partial=<function Downloader.enqueue_file.<locals>.filepath at 0x7f8c4c7467b8>, url='https://umbra.nascom.nasa.gov/goes/fits/2010/go1520100602.fits', exception=FailedDownload()
```
I truncated the error message, but you can see that when you try to actually download the GOES data, it does not exist. The same search and retrieval works correctly when GOES-14 is specified, as it should.
The hardcoded dates for the GOES satellite operations also do not account for all the times that data are available. For example, 2016 and 2017 both have data available from GOES-13 and GOES-14 as well as GOES-15.
IIRC the dates are sourced from NOAA and represent in some way which spacecraft is 'primary'.
If multiple satellites have data for the same date, what should we do? Should we add one more attr, such as 'satellite_no' or something similar, with a default satellite number (mentioned in the docs), and, if data is not available, just give an error message or empty results?
Suggestions?
There is already an attr for GOES: `a.goes.SatelliteNumber`
https://github.com/sunpy/sunpy/blob/master/sunpy/net/dataretriever/attrs/goes.py
If none is given, it currently provides the data for the satellite that was operational at the time, and that satellite number is hard-coded.
What would be great to see is if you search a time range then data that is available for that range is provided - for example GOES 15 and GOES 13.
I think NOAA also plans to release the GOES 13 data for the past solar cycle with the already available GOES 15 data.
I found a small hack. Since `filelist` of the scraper already opens the directory (in this case the year), if we pass `satellite_number = r'\d{2}'` I get every FITS file for the GOES satellites that are actually available on that page (which means we don't need to use the `_get_goes_sat_num` function).
So this is intelligent enough and doesn't need hard-coded dates. @hayesla I will open a PR for it, if that is fine.
2) The other solution is to make multiple scrapers (one for every satellite number) and check each of them. But this would be very slow, since it means fetching 14 HTML pages again and again. Which one should I implement?
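To make the wildcard idea concrete, here is a minimal sketch using only the standard library, not the sunpy scraper itself. It assumes the filename layout seen in the failing URL above (`go` + two-digit satellite number + `YYYYMMDD` + `.fits`); the `group_by_satellite` helper and the example listing are hypothetical.

```python
import re
from collections import defaultdict

# Assumed filename layout, inferred from the URL in the error above:
# "go" + two-digit satellite number + YYYYMMDD + ".fits"
GOES_FILE = re.compile(r"go(?P<sat>\d{2})(?P<date>\d{8})\.fits$")


def group_by_satellite(filenames):
    """Map each satellite number to the sorted dates it has files for."""
    found = defaultdict(list)
    for name in filenames:
        match = GOES_FILE.search(name)
        if match:  # skip anything that is not a GOES FITS file
            found[int(match.group("sat"))].append(match.group("date"))
    return {sat: sorted(dates) for sat, dates in found.items()}


# Hypothetical directory listing, for illustration only
listing = [
    "go1420100601.fits",
    "go1420100602.fits",
    "go1320100601.fits",
    "index.html",
]
available = group_by_satellite(listing)
```

With one pass over a single year page, `available` records which satellites really have files, so no hard-coded operational dates are needed for this step.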
> What would be great to see is if you search a time range then data that is available for that range is provided - for example GOES 15 and GOES 13.
If I remember correctly there are two major components to this:
1) Scraper needs to be able to handle matching and returning values in paths which are wildcard types, i.e. "match all the satellite numbers". The values of these fields need to be returned to the caller somehow as categorical data. (Probably along with time and everything else, which relates to a solution to #3715.)
2) We need a way of displaying all this metadata about the URLs to the user of Fido. We don't want to implement this currently, as the user would get duplicate results printed in the results of `search()` with no way to disambiguate them. For this component we need to allow dataretriever classes to specify what they want to display in their results tables, which is #3321 and is also related to #3368.
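The two components above can be sketched roughly as follows. This is a standard-library illustration rather than the Scraper or Fido API: the `url_to_row` helper, the row layout, and the GOES-13 URL are assumptions made for the example, and the URL pattern is inferred from the error message earlier in the thread.

```python
import re
from urllib.parse import urlsplit

# Assumed path layout: .../goNNYYYYMMDD.fits (inferred from the error above)
URL_PATTERN = re.compile(
    r"/go(?P<sat>\d{2})(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.fits$"
)


def url_to_row(url):
    """Return the metadata extracted from a URL, or None if it does not match."""
    match = URL_PATTERN.search(urlsplit(url).path)
    if match is None:
        return None
    return {
        "date": "{year}-{month}-{day}".format(**match.groupdict()),
        "satellite": int(match.group("sat")),  # categorical value from the wildcard
        "url": url,
    }


# Hypothetical URLs: same day, two different satellites
urls = [
    "https://umbra.nascom.nasa.gov/goes/fits/2010/go1420100602.fits",
    "https://umbra.nascom.nasa.gov/goes/fits/2010/go1320100602.fits",
]
rows = [row for row in map(url_to_row, urls) if row is not None]
for row in rows:
    print(row["date"], row["satellite"])
```

Without the `satellite` column the two rows would look identical in a results table, which is the duplicate-results problem described in the second point.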
So in summary, a proper solution to this issue is tightly coupled with a lot of things, many of which I hope are covered by the scope of the GSOC project idea I wrote up.
Currently in `XRSClient` we hard-code the operational dates of the GOES satellites and use the highest satellite number we can for the query. For some days in the operational range of a satellite, the files might not exist on the server. We should be smart about how we choose the satellite number.
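One possible selection rule, sketched under assumptions: given a mapping from satellite number to the dates that actually have files, honour an explicitly requested satellite when it has data and otherwise fall back to the highest-numbered satellite that does. The `choose_satellite` function and the availability data below are hypothetical and are not part of `XRSClient`.

```python
def choose_satellite(date, available, preferred=None):
    """Pick a satellite number that really has a file for `date`.

    available: dict mapping satellite number -> set of 'YYYYMMDD' strings
    preferred: satellite requested by the user, if any
    Returns None when no satellite has data for that date.
    """
    candidates = [sat for sat, dates in available.items() if date in dates]
    if not candidates:
        return None
    if preferred in candidates:
        return preferred
    # Fall back to the highest-numbered satellite with data
    return max(candidates)


# Hypothetical availability, for illustration only
available = {
    15: {"20100907"},
    14: {"20100601", "20100602"},
    13: {"20100601"},
}

print(choose_satellite("20100602", available))                # no preference given
print(choose_satellite("20100602", available, preferred=15))  # 15 has no file that day
print(choose_satellite("20090101", available))                # nothing available
```

Whether a silent fallback or an empty result is the better behaviour when the requested satellite has no data is a design choice the thread leaves open; this sketch falls back.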