spedas / pyspedas

Python-based Space Physics Environment Data Analysis Software
https://pyspedas.readthedocs.io/
MIT License

Excessive server queries from ERG load routines? #745

Closed by jameswilburlewis 6 months ago

jameswilburlewis commented 7 months ago

The ERG code, especially the ground data load routines, seems to be making more requests for directory indices than it needs. Example:

>>> import pyspedas
>>> fv = pyspedas.erg.gmag_isee_fluxgate(trange=['2020-08-01', '2020-08-02'], site='all')

outputs this:

02-Feb-24 15:07:50: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/1min/ktb/2020/
02-Feb-24 15:07:51: File is current: erg_data/ground/geomag/isee/fluxgate/1min/ktb/2020/isee_fluxgate_1min_ktb_20200801_v01.cdf
**************************************************************************
ISEE Ground-Based Fluxgate Magnetometer 1 min Resolution Data
Information about KTB
PI and Host PI(s):
(1) Kazuo Shiokawa (2) Mamat Ruhimat
Affiliations: 
(1) Institute for Space-Earth Environmental Research, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan (2) National Institute of Aeronautics and Space (LAPAN), Jln. Dr. Djundjunan 135, P.O.BOX 26, Bandung, 40173, Indonesia
Rules of the Road for ISEE Fluxgate Data Use:
1. Please contact Kazuo Shiokawa (shiokawa at nagoya-u.jp) before using the data for any publications and/or presentations.
2. For the hdz_1sec variable, channels 1, 2, and 3 correspond to the H, D, and Z components, respectively.
3. For the data from the stations in the northern hemisphere, [H:+ = northward] [D:+ = eastward] [Z:+ = downward], and for the data from the stations in the southern hemisphere, [H:+ = northward] [D:+ = eastward] [Z:+ = upward]. For more information, see http://stdb2.isee.nagoya-u.ac.jp/magne/notes_errors.html and http://stdb2.isee.nagoya-u.ac.jp/mm210/error.html.
4. These data are obtained by averaging sixty 1-sec sampled data (One-min data at 00h01mUT is an average of 00h00m30s-00h01m29sUT), with an exception that the one-min data at 00h00mUT is an average of 00h00m00s-00h00m29sUT.)
For more information, see http://stdb2.isee.nagoya-u.ac.jp/magne/
**************************************************************************
02-Feb-24 15:07:51: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:51: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:51: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:52: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:53: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:54: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:55: Remote index not found: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:07:55: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:08:00: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/64hz/ktb/2020/08/
02-Feb-24 15:08:00: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/1min/ktb/2020/
02-Feb-24 15:08:01: Downloading remote index: https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/isee/fluxgate/1sec/ktb/2020/

This is just for the first site; it goes on like this for pages...

I've noticed that while one of these routines is running, I often get "connection refused" from the server when I try to browse some of these URLs. I think we may be running afoul of some rate-limiting firewall rules. Perhaps our download routine retries if it can't retrieve the directory index the first time? The above call used site='all', yet we only successfully retrieved data from a single site, probably because the server blocked so many of the requests.
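One way to keep repeated lookups from reaching the server is to remember the result of each directory index request, including the "not found" case, for the duration of a load call. This is only a sketch: make_cached_index_fetcher and fake_fetch are hypothetical names for illustration, not pyspedas functions, and the stub stands in for whatever actually performs the HTTP request.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached_index_fetcher(fetch: Callable[[str], Optional[str]]):
    """Wrap `fetch` so each distinct URL is requested at most once.

    `fetch` is assumed to return the index HTML, or None when the
    remote index does not exist. lru_cache stores None results too,
    so a missing index is not requested again.
    """
    @lru_cache(maxsize=None)
    def cached(url: str) -> Optional[str]:
        return fetch(url)

    return cached


# Stub fetcher (hypothetical) that records every request it receives
# and always reports "remote index not found".
calls = []


def fake_fetch(url: str) -> Optional[str]:
    calls.append(url)
    return None


get_index = make_cached_index_fetcher(fake_fetch)
url = ("https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/"
       "isee/fluxgate/64hz/ktb/2020/08/")

# Ask for the same missing index many times, as in the log above.
for _ in range(24):
    get_index(url)

print(len(calls))  # number of requests that actually reached the fetcher
```

With this wrapper, the 64hz index for a given site and month would be requested once rather than once per file, regardless of whether the first request succeeded.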

The erg.load() routine passes 'last_version=True' unconditionally when it calls pyspedas.download(), which forces it to fetch directory indices and parse the available version numbers instead of building a static URL for the file to be downloaded. For much of the ground data, I don't think there are even multiple version numbers. (Annoyingly, I can't check the satellite data right now because I'm blocked from the server...)
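To make the cost concrete, here is roughly what "find the last version" involves once a directory index has been fetched: match the file names in the listing against a wildcard pattern and keep the highest version. This is a sketch under assumptions, not the pyspedas implementation; latest_version is a hypothetical helper, and the v02 entry in the listing is invented purely to show the selection.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def latest_version(links: Iterable[str], pattern: str) -> Optional[str]:
    """Return the highest-versioned file name matching `pattern`.

    Assumes zero-padded two-digit versions (v01, v02, ...), so that
    plain string ordering agrees with version ordering.
    """
    matches = sorted(name for name in links if fnmatchcase(name, pattern))
    return matches[-1] if matches else None


# File names as they might appear in a directory index.
# The v02 entry is hypothetical, added only for illustration.
links = [
    "isee_fluxgate_1min_ktb_20200801_v01.cdf",
    "isee_fluxgate_1min_ktb_20200801_v02.cdf",
    "isee_fluxgate_1min_ktb_20200802_v01.cdf",
]

print(latest_version(links, "isee_fluxgate_1min_ktb_20200801_v??.cdf"))
print(latest_version(links, "isee_fluxgate_64hz_ktb_20200801_v??.cdf"))
```

The matching itself is cheap; the expense is that the listing has to come from the server, so every directory touched by the time range costs at least one extra request.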

So maybe we need to change erg.load() and some of the individual load routines to default to version='v01' rather than using 'last_version=True' all the time.
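With a fixed version, the file URLs can be built directly from the time range with no index request at all. A minimal sketch, assuming one file per day and that the remote layout mirrors the index URL and local path shown in the log above; static_urls and BASE are hypothetical names, not part of pyspedas.

```python
from datetime import date, timedelta
from typing import List

# Base directory taken from the index URLs in the log above.
BASE = ("https://ergsc.isee.nagoya-u.ac.jp/data/ergsc/ground/geomag/"
        "isee/fluxgate/1min")


def static_urls(site: str, start: date, end: date,
                version: str = "v01") -> List[str]:
    """Build one URL per day in [start, end) for a fixed version.

    The file naming scheme is inferred from the local path in the log
    (isee_fluxgate_1min_<site>_<yyyymmdd>_<version>.cdf) and is an
    assumption about the remote layout.
    """
    urls = []
    day = start
    while day < end:
        urls.append(
            f"{BASE}/{site}/{day:%Y}/"
            f"isee_fluxgate_1min_{site}_{day:%Y%m%d}_{version}.cdf"
        )
        day += timedelta(days=1)
    return urls


for u in static_urls("ktb", date(2020, 8, 1), date(2020, 8, 2)):
    print(u)
```

The trade-off is that a hard-coded 'v01' would miss any dataset that later gains a v02, so this probably only makes sense as a default for datasets known to have a single version.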

In the past, I've had to disable the akebono test suite on GitHub because it was failing. That data, I believe, is also served by the ERGSC, so this might explain why.

jameswilburlewis commented 6 months ago

This is also happening for Cluster:

>>> wbd_vars = pyspedas.cluster.wbd(trange=['2003-11-01', '2003-11-02'], probe=['1', '2'])

13-Feb-24 11:31:52: Downloading remote index: https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010000_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010010_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010020_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010030_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010040_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010050_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010100_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010110_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010120_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010130_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010140_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010150_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010200_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010210_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010220_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010230_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010240_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010250_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010300_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010310_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010320_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010330_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010340_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010350_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010400_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010410_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010420_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010430_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010440_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010450_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010500_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010510_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010520_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010530_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010540_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010550_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010600_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010610_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/
13-Feb-24 11:31:53: No links matching pattern c1_waveform_wbd_200311010620_v??.cdf found at remote index https://spdf.gsfc.nasa.gov/pub/data/cluster/c1/wbd/2003/11/

See also: https://github.com/spedas/pyspedas/issues/746