pacificclimate / pdp

The PCIC Data Portal - Server software to run the entire web application
GNU General Public License v3.0

Mangled netcdfs downloadable from dataportal catalogue JSON #65

Closed faronium closed 7 years ago

faronium commented 7 years ago

A user recently reported corrupted netcdf files when downloading PRISM data. I attempted to replicate by downloading using a BC-wide polygon select on the PRISM portal and was delivered data without issue. Upon further pursuit, it was revealed that the user was using links found here:

http://tools.pacificclimate.org/dataportal/docs/raster.html#dataset-listings

That page suggests using the JSON found here:

http://tools.pcic.uvic.ca/dataportal/bc_prism/catalog/catalog.json

The links in that JSON yield corrupted netCDF files with bad time stamps and meaningless data. So either the link to the data catalogue needs to be removed or replaced with a link to valid data, or the files sitting behind the portal are getting corrupted on download. Again, note that data downloaded from the graphical portal are fine.

I'm pasting in the e-mail chain in case there are any useful clues in it. The reported issue of low bandwidth didn't prove to be an issue on our end.

Cheers, Faron

I got it from here:

http://tools.pacificclimate.org/dataportal/docs/raster.html#dataset-listings

Thanks for your help!

Charles


Charles Cuell, Ph.D. Climate Resilience Consulting Ltd Charles.Cuell@crcteam.ca +1 250.353.1732

www.crcteam.ca

On 19 April 2017 at 13:11, Faron Anslow fanslow@uvic.ca wrote:

Hello again.

Okay, the files linked in the json gave me total garbage too. I'll raise this with our developers. Can you tell me how you got to the JSON? Was it through the dataportal documentation?

Thanks,
Faron

On 2017-04-19 08:00 AM, Charles Cuell wrote:
Thanks for the email, Faron.

I used the following link, and downloaded the files from the urls. -Charles
http://tools.pcic.uvic.ca/dataportal/bc_prism/catalog/catalog.json

---
Charles Cuell, Ph.D.
Climate Resilience Consulting Ltd
Charles.Cuell@crcteam.ca
+1 250.353.1732

www.crcteam.ca

On 18 April 2017 at 18:11, Faron Anslow <fanslow@uvic.ca> wrote:

    Hi Charles,

    I've just tried to replicate your issue and cannot. I can tell you a bit more about what you should see.

    The timestamps are for the midpoint of the climate period that you are interested in and for the given month. So, the time stamp should be January 15th, 1985 if you are looking at January from the 1971 -- 2000 normal period. The first 12 slices should be the months in sequence followed by the annual average or total for temperature and precipitation respectively. That annual will have a timestamp in the very middle of the normal period (more or less) and is 30 June, 1985 in the 1971 -- 2000 case.

    If you are seeing integer values for the dates, then those should be days since 1 January, 1970 which is a standard reference.
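The conversion described above can be checked with a few lines of Python. This is only a sketch: it assumes the time units are whole days since 1970-01-01 on the standard Gregorian calendar, and the helper names `days_to_date` and `date_to_days` are made up for illustration. The value 5493 is the first entry of the time array Charles reported.

```python
from datetime import date, timedelta

# Reference date taken from the "days since 1970-01-01" units (assumed Gregorian calendar).
EPOCH = date(1970, 1, 1)


def days_to_date(days):
    """Convert a count of whole days since the epoch to a calendar date."""
    return EPOCH + timedelta(days=int(days))


def date_to_days(d):
    """Convert a calendar date to whole days since the epoch."""
    return (d - EPOCH).days


# First value of the reported time array: the January midpoint of 1971-2000.
print(days_to_date(5493))                # 1985-01-15

# Expected time value for the annual slice (30 June 1985).
print(date_to_days(date(1985, 6, 30)))   # 5659
```

A time axis that decodes to mid-month dates in 1985 followed by 30 June 1985 matches the description; values such as -9999 or NaN in the time variable do not.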

    Can you tell me a little bit about how you downloaded the data so that I may have a chance to replicate what you are running into?

    If a phone call would help you, I'm happy to do that.

    Hope you had a nice weekend.

    --Faron

    On 2017-04-13 10:19 PM, Trevor Murdock wrote:
    we use R 

    Trevor 
    ___________________________
    sent by iPhone

    On Apr 13, 2017, at 8:37 PM, Charles Cuell <charles.cuell@crcteam.ca> wrote:
    Thanks, Faron. 

    My biggest concern is that the first time slice is ok, but the rest are not. 

    What do you guys use for reading netcdf files? 

    Charles

    On Thu, Apr 13, 2017 at 20:35 Faron Anslow <fanslow@uvic.ca> wrote:

        Hi Charles,

        Trevor is correct about the 13th month being the annual. I'm away this week but will look more closely at the other issues when I return Tuesday next week.

        Cheers,
        Faron

        On Apr 13, 2017 1:33 PM, Charles Cuell <charles.cuell@crcteam.ca> wrote:

            Hi, Trevor and Faron.

            The BC_Prism netcdf files have 13 times, but they don't come out as dates or times. The attributes say days since 1970, 1, 1. I understand that these should label months (12) and one other value. Can you tell me what the times are supposed to be?

            array([  5.49300000e+03,  -1.09075706e-19,  -9.99900000e+03,
                     7.85973296e-03,  -9.99900000e+03,   7.85973296e-03,
                    -9.99900000e+03,   7.85973296e-03,  -9.99900000e+03,
                     7.85973296e-03,  -9.99900000e+03,   7.85973296e-03,
                                nan], dtype=float32)

            I opened the file with Panoply, and it shows all 13 time stamps as 1981-01-01.

            Secondly, the first time index shows actual data (time_index_1.png)

            Inline images 1

            But, the other indices give nonsense (time_index_2.png)

            Inline images 2

            The data scale shows that there's a problem with the data.

            Inline images 3

            This is all consistent with what I've been finding from reading the data in with Python and Matlab.

            Any ideas what's going on?

            Thanks!
            Charles

            ---
            Charles Cuell, Ph.D.
            Climate Resilience Consulting Ltd
            Charles.Cuell@crcteam.ca
            +1 250.353.1732

            www.crcteam.ca

            On 27 March 2017 at 13:53, Charles Cuell <charles.cuell@crcteam.ca> wrote:

                No, I don't have a map. I'll see if I can find some time to generate one.

                I have the same issue using matlab to read the data.

                ---
                Charles Cuell, Ph.D.
                Climate Resilience Consulting Ltd
                Charles.Cuell@crcteam.ca
                +1 250.353.1732

                www.crcteam.ca

                On 27 March 2017 at 13:40, Faron Anslow <fanslow@uvic.ca> wrote:

                    Hi Charles,

                    Do you have a map showing where these occur? My guess is that this relates to how your software handles masked regions, but a visual confirmation would be helpful. I'll be in my office tomorrow if you would like to chat about this.

                    Cheers,
                    Faron

                    On Mar 27, 2017 12:26 PM, Trevor Murdock <tmurdock@uvic.ca> wrote:

                        I haven’t encountered this. Can you give an example of where exactly you’re seeing such values?

                        Trevor

                        From: Charles Cuell [mailto:charles.cuell@crcteam.ca]
                        Sent: March 27, 2017 12:12 PM
                        To: Trevor Murdock <tmurdock@uvic.ca>
                        Subject: question about BC Prism

                        Hi, Trevor.

                        A quick question, I hope!

                        The BCPrism climatologies have some unreasonably large values (e.g. 10^38). Are large values to be filtered out, or is there an issue with the data set?

                        Thanks!

                        Charles

                        ---

                        Charles Cuell, Ph.D.

                        Climate Resilience Consulting Ltd

                        Charles.Cuell@crcteam.ca

                        +1 250.353.1732

                        www.crcteam.ca

jameshiebert commented 7 years ago

This is a misunderstanding of how to use the catalog.json. While the URLs in the catalog appear to be files, they are actually dataset IDs and are not intended to be used as a direct download URL.

One should read further down, in the next section of the documentation, for how to construct a proper download URL:

Downloading the actual data values themselves is also done with a DAP request. There are a couple differences, however. First, to download data, the client must be logged in via OpenID. Secondly, the URL template for the request is http://tools.pacificclimate.org/dataportal/[page_id]/data/[dataset_id].[format_extension]?[dap_selection]

Everything after [format_extension] is optional though, so one could just do http://tools.pacificclimate.org/dataportal/data/bc_prism/pr_monClim_PRISM_historical_run1_198101-201012.nc.nc and get the same results. Note the second .nc at the end: the first .nc is part of the dataset ID, and the second is the format extension.
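As a rough illustration of the point about dataset IDs, here is a minimal Python sketch that turns a dataset ID into a download URL. The function name `build_download_url` and its parameters are hypothetical, and the data root is copied from the example URL in this comment rather than from any official API. The sketch only builds the string; it does not log in via OpenID or fetch anything.

```python
def build_download_url(data_root, dataset_id, format_extension="nc", dap_selection=None):
    """Append the format extension (and optional DAP selection) to a dataset ID.

    data_root        -- URL prefix for data requests (assumed, taken from the example above)
    dataset_id       -- ID as listed in catalog.json; it may itself end in ".nc"
    format_extension -- output format requested from the server
    dap_selection    -- optional DAP constraint, appended after "?"
    """
    url = f"{data_root.rstrip('/')}/{dataset_id}.{format_extension}"
    if dap_selection:
        url += f"?{dap_selection}"
    return url


# Data root and dataset ID taken from the example URL in this thread.
DATA_ROOT = "http://tools.pacificclimate.org/dataportal/data/bc_prism"
DATASET_ID = "pr_monClim_PRISM_historical_run1_198101-201012.nc"

print(build_download_url(DATA_ROOT, DATASET_ID))
```

The dataset ID is kept intact and the extension is always appended, which is why the resulting URL ends in `.nc.nc`.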