answerquest commented 2 years ago

Hi, there is a simpler (or you can say, more low-level) requirement that I think many folks will have :

For any one-off .GRD file downloaded from the IMD website, like "Maxtemp_MaxT_1979.GRD" (that's the name the server gives the file when you download it directly from the browser), imdlib should provide a function to open the raw file directly.

Instead of having to supply a containing folder, start/end year and other params, we should have a command like:

imdlib.open_data_file_year('tmax', 'Maxtemp_MaxT_1979.GRD', 1979)

Am trying to make a function and submit a PR.

iamsaswata commented 2 years ago

Hi Nikhil. Absolutely, this is an important feature for sure.

But this feature has already been there since last year, and in a more concise (perhaps more consistent) way.

But as you said, I thought the file name convention may have changed since we are now serving the data from the new server. So, just now I gave it a try.

So, I downloaded the data using my browser (not using imdlib) and kept it in a folder (it can be seen in the screenshot). I opened Python from the same folder and tried to open the file with imdlib as before (which is also documented at https://imdlib.readthedocs.io/en/latest/Usage.html#reading-imd-datasets ). And it worked perfectly. I am attaching a screenshot of the same, so you can try the exact same thing on your setup.

Just to mention, instead of the whole file name, it requires the "variable name" and "year", which is more concise and it maintains consistency in multiple places (within imdlib).

[Screenshot: open_data]

answerquest commented 2 years ago

That's great to know! But for what it's worth, to at least make compatible with any kind of filename (end user might also be doing something like keeping files away somewhere, naming differently), this function could be useful. Just implemented it, here's a demo program:

https://gist.github.com/answerquest/16aac80eab154fd276cabcf4eafbcb33

answerquest commented 2 years ago

In this pattern, if there is a simple direct file opening function for the day-wise data, then I think it would be quite useful and simpler to implement than doing the code to download day-wise data.

iamsaswata commented 2 years ago

I appreciate the work/effort you have put into it. But, there are some major issues with the pull request/direction you presented.

You mentioned "But for what it's worth, to at least make compatible with any kind of filename (end user might also be doing something like keeping files away somewhere, naming differently), this function could be useful.". But except for the edge case of a random filename (which is problematic, I will explain why), the other case ("keeping files away somewhere") is already covered in the current version of imdlib with the "file_dir" option (e.g., data = imd.open_data(variable, start_yr, end_yr,'yearwise', file_dir)).

Now, with the open_data functionality you presented in #19:

"""
data = imdlib.open_data_file_year('tmax','Maxtemp_MaxT_1979.GRD',1979)

1st arg: var_type : 'rain', 'tmax' or 'tmin'

2nd arg: filename / path to file

3rd arg: year for which data is
"""

First, it is a straight duplicate (modified for the filename case) of an existing imdlib function, which goes against a principle stated in Python Enhancement Proposal (PEP) 350: "Information should be almost never duplicated – it should be recorded in a single original format and all other locations should be automatically generated from the original, or simply be referenced. This is famously known as the Single Point Of Truth (SPOT) or Don’t Repeat Yourself (DRY) rule". It can be found at https://peps.python.org/pep-0350/.

Second, the approach you presented carries redundant information: what is the role of the 'year' argument if you are giving the full file name anyway? Can't the function extract the year from it? And if one needs to provide the year anyway, why can't the program work out the full filename from it? Redundancy of this kind should be minimized in any package.
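To illustrate the point about the redundant 'year' argument, here is a minimal sketch of recovering the year from a file name. The helper `year_from_filename` is hypothetical and not part of imdlib, and it assumes the year is the last standalone run of four digits in the base name, as in "Maxtemp_MaxT_1979.GRD".

```python
import re
from pathlib import Path

# Assumption: the year appears as a standalone four-digit run in the base name,
# e.g. "Maxtemp_MaxT_1979.GRD". This helper is hypothetical, not an imdlib API.
_YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def year_from_filename(path):
    """Return the last four-digit number in the file's base name, or None."""
    matches = _YEAR_RE.findall(Path(path).stem)
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    print(year_from_filename("Maxtemp_MaxT_1979.GRD"))
```

A file name with no four-digit run gives None, which is exactly the "random filename" case where the caller would still have to pass the year by hand.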

Third, and the most important caveat to me, is the use of the proposed function for reading multi-year data. Although it is okay for reading one year of data (in the case of archive data, not real-time data), this function does not look optimal for reading multi-year data: one would then need to provide a list/tuple/array/regex of filenames. To avoid that, some sort of pattern is needed. The original IMD file-naming convention and the yearwise convention each provide a pattern that solves this exact issue, and both are already implemented here.

To summarize, random file naming is an edge case for imdlib for the reasons mentioned, and everyone is encouraged to use the original filenames or the year-wise file naming at this point.

Regarding "In this pattern, if there is a simple direct file opening function for the day-wise data, then I think it would be quite useful and simpler to implement than doing the code to download day-wise data.", again, I want to implement it in a way that can find some pattern and is capable of reading not only one day of data but any continuous stretch of data with ease.
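As a rough sketch of why a naming pattern helps with multi-year reads: given a template and a year range, the full list of files can be generated instead of typed out. The function `yearly_paths` and its `template` parameter are hypothetical, not imdlib's implementation; the template "Maxtemp_MaxT_{year}.GRD" is taken only from the tmax file name mentioned earlier in this thread.

```python
from pathlib import Path


def yearly_paths(template, start_yr, end_yr, file_dir="."):
    """Build one path per year from a naming template containing '{year}'.

    Hypothetical helper for illustration only; not an imdlib function.
    """
    if start_yr > end_yr:
        raise ValueError("start_yr must not exceed end_yr")
    return [
        Path(file_dir) / template.format(year=yr)
        for yr in range(start_yr, end_yr + 1)
    ]


if __name__ == "__main__":
    # Template assumed from the browser-downloaded tmax file name in this thread.
    for p in yearly_paths("Maxtemp_MaxT_{year}.GRD", 1979, 1981, "data"):
        print(p)
```

With arbitrary file names there is nothing to put in the template, so the caller would have to supply every path explicitly.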

iamsaswata / imdlib

Feature request: function to directly open a data file #18
