nansencenter / nansat

Scientist friendly Python toolbox for processing 2D satellite Earth observation data.
http://nansat.readthedocs.io
GNU General Public License v3.0

description on the wiki on how to add custom (user/institution specific) mappers #87

Closed: mortenwh closed this 9 years ago

mortenwh commented 10 years ago

Ref #75: a description of how to add custom (user/institution-specific) mappers is needed...

knutfrode commented 10 years ago

A related question: some mappers (e.g. "ncep_wind_online") require internal use of the Nansat class. These mappers have been broken since the introduction of namespace packages, because Nansat can no longer be imported as before:

from nansat import Nansat

For some reason, this works (in all mappers):

from nansat.vrt import VRT

But this does not work:

from nansat.nansat import Nansat
-> ImportError('cannot import name Nansat',)

How is it possible to import Nansat from within a mapper?

mortenwh commented 10 years ago

ncep_wind_online works for me with from nansat.nansat import Nansat in line 121. Doesn't it work for you? Where did you try that line?

The imports work "as expected":

1: from nansat.vrt import VRT imports class VRT from module vrt (vrt.py) in package nansat
2: from nansat.nansat import Nansat imports class Nansat from module nansat (nansat.py) in package nansat

from nansat import * imports Nansat because it is in the __all__ list, but it is better to be explicit...
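A minimal illustration of how __all__ controls a wildcard import, using a throwaway in-memory module. The module name toy_nansat and the Helper class are hypothetical stand-ins, not part of Nansat.

```python
import sys
import types

# Hypothetical throwaway module standing in for a package's __init__.py.
toy = types.ModuleType("toy_nansat")
exec(
    "__all__ = ['Nansat']\n"  # only Nansat is exported by a wildcard import
    "class Nansat: pass\n"
    "class Helper: pass\n",
    toy.__dict__,
)
sys.modules["toy_nansat"] = toy


def wildcard_names(module_name):
    """Return the names that 'from <module_name> import *' binds."""
    namespace = {}
    exec("from %s import *" % module_name, namespace)
    return sorted(name for name in namespace if name != "__builtins__")


def explicit_import_works():
    """An explicit import binds the requested name whether or not it is in __all__."""
    namespace = {}
    exec("from toy_nansat import Helper", namespace)
    return "Helper" in namespace
```

Here wildcard_names("toy_nansat") gives only ["Nansat"], while the explicit import still reaches Helper, which is one reason explicit imports are easier to follow.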

knutfrode commented 10 years ago

Hm, this is quite strange! From a mapper, I am not able to import Nansat with any of the statements I tried, e.g.:

from nansat.nansat import Nansat
from nansat import Nansat
from .. import Nansat
from . import Nansat
import Nansat

I am using the latest, unmodified "develop" version. I installed Nansat with

python setup.py build_ext --inplace

akorosov commented 10 years ago

Any NCEP online URL to use in testing?

knutfrode commented 10 years ago

You can test from commandline:

$ nansatinfo ncep_wind_online:201201011200

or from Python:

s = Nansat('ncep_wind_online:201201011200', mapperName='ncep_wind_online')

The point of this mapper is that you don't need a URL for a specific file; you simply give a keyword+time, and the mapper finds the URL of the file closest in time and downloads it. A similar principle is used for mappers that find files in a local archive or through THREDDS.
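A sketch of the keyword+time idea, under stated assumptions: the helper names parse_request and closest_time and the 6-hourly list of available times are hypothetical, and the real mapper's URL construction and download are not shown.

```python
from datetime import datetime, timedelta


def parse_request(request):
    """Split a 'keyword:YYYYMMDDHHMM' string into (keyword, datetime)."""
    keyword, _, stamp = request.partition(":")
    if not stamp:
        raise ValueError("expected 'keyword:YYYYMMDDHHMM', got %r" % request)
    return keyword, datetime.strptime(stamp, "%Y%m%d%H%M")


def closest_time(target, available):
    """Return the available time with the smallest absolute offset from target."""
    if not available:
        raise LookupError("no files available")
    return min(available, key=lambda t: abs(t - target))


# Hypothetical times for which files exist (here: every 6 hours for two days).
START = datetime(2012, 10, 1, 0, 0)
AVAILABLE = [START + timedelta(hours=6 * i) for i in range(8)]
```

For example, parse_request("ncep_wind_online:201210010700") yields 2012-10-01 07:00, and closest_time picks the 06:00 entry; a mapper would then build the file URL for that time.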

mortenwh commented 10 years ago

sh: /<...>/nansat/mappers/get_inv.pl: Permission denied
sh: /<...>/nansat/mappers/get_grib.pl: Permission denied

This doesn't seem to be import-related, but do you get this too? I added the Perl scripts to package_data in setup.py the other day, but obviously they must be made executable (but how?). Could that be part of the problem? It causes trouble here, at least... Maybe it's not such a good idea to mix Perl and Python?
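One possible answer to "but how?" is to set the execute bits with os.chmod, for example from a post-install step or before the mapper calls the script. This is a sketch, not what Nansat does; make_executable is a hypothetical helper, and get_inv.pl below is just a throwaway file.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add an execute bit for every class (user/group/other) that can read the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Shift the read bits (0o444) down onto the execute bits (0o111).
    exec_bits = (mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)) >> 2
    os.chmod(path, mode | exec_bits)


def demo():
    """Create a throwaway script and return its permission bits before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "get_inv.pl")
        with open(script, "w") as handle:
            handle.write("#!/usr/bin/env perl\n")
        os.chmod(script, 0o644)
        before = stat.S_IMODE(os.stat(script).st_mode)
        make_executable(script)
        after = stat.S_IMODE(os.stat(script).st_mode)
        return oct(before), oct(after)
```

A 0o644 file becomes 0o755. An alternative is to ship the scripts through setuptools' scripts argument, which installs them outside the package directory instead.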

akorosov commented 10 years ago

In [1]: from nansat import *

In [2]: s = Nansat('ncep_wind_online:201201011200', mapperName='ncep_wind_online')
GDAL could not open ncep_wind_online:201201011200, trying to read with Nansat mappers...
2012-01-01 12:00:00
NRT GRIB file not available: ftp://ftp.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.2012010112/gfs.t12z.master.grbf00.10m.uv.grib2
Downloading http://nomads.ncdc.noaa.gov/data/gfs4/201201/20120101/gfs_4_20120101_1200_000.grb2
missing wgrib inventory
No download! No matching grib fields
An exception has occurred, use %tb to see the full traceback.

SystemExit: No NCEP wind files found for requested time

To exit: use 'exit', 'quit', or Ctrl-D.

Evidently the mapper started to work, but it could not locate the file (or something similar).

knutfrode commented 10 years ago

OK, it seems that sample time is no longer available, but this one is:

s = Nansat('ncep_wind_online:201210010600', mapperName='ncep_wind_online')

But in the meantime I found out that

from nansat.nansat import Nansat

works if it is moved from the top of the mapper module into the "class Mapper" declaration in the same file. This solves the problem, but it is still a mystery why it works this way.

mortenwh commented 10 years ago

Aha, I see. It is because the mappers are imported in nansat.py. When you put from nansat.nansat import Nansat at the top of a mapper, you get a recursive (circular) import, and that does not work: nansat.py is still being executed when the mapper asks for Nansat, so the name does not exist yet. See http://blog.notdot.net/2009/11/Python-Gotchas for an explanation. The solution is to import Nansat locally in the Mapper class or its __init__ method :)
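A self-contained reproduction of the recursive import, using a hypothetical package toypkg: core.py plays the role of nansat.py (it imports the mapper module before defining its class), and mapper_demo.py plays a mapper. The top-level import fails; the same import moved into a method works.

```python
import importlib
import sys
import tempfile
from pathlib import Path

CORE = """
from toypkg import mapper_demo   # like importing the mappers at the top of nansat.py
class Core:                      # defined only after the mapper module has run
    pass
"""
MAPPER_TOP = """
from toypkg.core import Core     # top level: toypkg.core is only half-executed here
class Mapper:
    pass
"""
MAPPER_LOCAL = """
class Mapper:
    def __init__(self):
        from toypkg.core import Core   # local: runs later, after core.py has finished
        self.core_cls = Core
"""


def try_layout(mapper_source):
    """Build a throwaway 'toypkg' with the given mapper source and try to use it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "toypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "core.py").write_text(CORE)
        (pkg / "mapper_demo.py").write_text(mapper_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("toypkg.core")
            mapper = sys.modules["toypkg.mapper_demo"].Mapper()
            return "ok: " + mapper.core_cls.__name__
        except ImportError as exc:
            return "ImportError: %s" % exc
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "toypkg"]:
                del sys.modules[name]
```

try_layout(MAPPER_TOP) reports "cannot import name 'Core'", while try_layout(MAPPER_LOCAL) returns "ok: Core". Python 2 only said "cannot import name"; recent Python 3 versions add that the module is partially initialized.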

knutfrode commented 10 years ago

Ok. Not allowing recursive imports makes sense, although the error message was not helpful at all ('cannot import name Nansat').

In the meantime, I will move the imports of Nansat into the VRT class declarations of the relevant mappers. In the long run, it would be good to find a way to allow such imports at the top of the mapper modules/files (the standard location for imports).

mortenwh commented 10 years ago

Well, we can't change the way Python works, but one option is to move the line import mappers in nansat.py inside the method import_mappers. Then the mappers could have from nansat.nansat import Nansat at the top...
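A minimal sketch of a lazy import_mappers. The name comes from this thread, but the body, the default package name nansat.mappers and the prefix parameter are assumptions for illustration, not Nansat's actual code. Because nothing is imported until the function is called (e.g. from a method at run time, after nansat.py has finished executing), the mapper modules may import Nansat at their top level.

```python
import importlib
import pkgutil


def import_mappers(package_name="nansat.mappers", prefix="mapper_"):
    """Import every '<prefix>*' module of a package when called, not at module import time.

    Returns a dict mapping fully qualified module names to module objects.
    """
    package = importlib.import_module(package_name)
    mappers = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith(prefix):
            full_name = package_name + "." + info.name
            mappers[full_name] = importlib.import_module(full_name)
    return mappers
```

For a quick check without Nansat installed, import_mappers("json", prefix="dec") returns {"json.decoder": <module>}.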

knutfrode commented 10 years ago

That sounds OK to me. Better to have the irregularity (an import inside a method rather than at the top of a module) in one place, rather than spread over many mappers and possibly confusing mapper makers.

mortenwh commented 10 years ago

yes, probably better... I'll do it :)

mortenwh commented 10 years ago

Now it works (see 34ba64bef014a9d7cf688a5a65de9952abdbd260), but I had to move the line nansatMappers = import_mappers() inside the method _get_mapper as well.

knutfrode commented 10 years ago

Hm, something seems to have been broken by this. The tutorial now fails with the error message:

File "/home/knutfd/software/nansat/nansat/nansat.py", line 306, in bands
    for iBand in range(self.vrt.dataset.RasterCount):
AttributeError: Mapper instance has no attribute 'dataset'

mortenwh commented 10 years ago

Can you give more info? This at least works for me:

from nansat.nansat import Nansat
s = Nansat('ncep_wind_online:201210010600', mapperName='ncep_wind_online')
s.bands()

knutfrode commented 10 years ago

OK, the problem seems to be related not to your latest update but to the fact that I simultaneously introduced a user-defined mapper (wind_archive_local) outside the nansat/mappers folder: ~/mappers_user/nansat/mappers/mapper_wind_archive.py, with an __init__.py file in the same folder.

But it is not a good sign that introducing user mappers may crash Nansat even when using the built-in mappers. It would be good if someone else could also try adding a user-defined mapper outside the standard Nansat mapper directory, so that we can test this functionality.

letmaik commented 10 years ago

@knutfrode Did you put __init__.py in ~/mappers_user/nansat/mappers and also in ~/mappers_user/nansat, and add the two lines for namespace packages to both files?

knutfrode commented 10 years ago

Yes. The __init__.py files contain

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
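What these two lines do: pkgutil.extend_path looks in every sys.path entry for another directory holding a package of the same name and appends it to the package's __path__, so submodules from all of them become importable. A self-contained sketch with a hypothetical one-level package toyns; Nansat's layout needs the lines in both nansat/ and nansat/mappers/, as asked above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The two lines quoted above, written into every portion's __init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def build_portion(root, module_name):
    """Create <root>/toyns/__init__.py plus one empty module."""
    pkg = Path(root) / "toyns"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / (module_name + ".py")).write_text("")


def demo():
    """Put two portions on sys.path and import one module from each."""
    with tempfile.TemporaryDirectory() as base, tempfile.TemporaryDirectory() as user:
        build_portion(base, "mapper_builtin")
        build_portion(user, "mapper_user")
        sys.path[:0] = [base, user]
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("toyns")
            found = [importlib.import_module("toyns." + name).__name__
                     for name in ("mapper_builtin", "mapper_user")]
            return found, len(pkg.__path__)
        finally:
            sys.path.remove(base)
            sys.path.remove(user)
            for name in [m for m in sys.modules if m.split(".")[0] == "toyns"]:
                del sys.modules[name]
```

Both modules import, and the package's __path__ ends up with two directories.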

Simply having a nansat folder within mappers_user/ (which is in PYTHONPATH) results in:

>>> from nansat import *
>>> s = Nansat()
NameError: name 'Nansat' is not defined

knutfrode commented 10 years ago

The order of items in PYTHONPATH seems to make a difference: moving Nansat before the user mapper folder in PYTHONPATH makes it possible to import Nansat properly. But the user mapper is still not found.

letmaik commented 10 years ago

Well, the wildcard import from nansat import * does not work with namespace packages anyway, at least not without duplicating code. It's strange that the user mapper is not found; I'll test it myself in a few hours.

knutfrode commented 10 years ago

I am finally able to use a user-defined mapper, but then a new problem arises: as soon as there is a mapper in the user-specific folder, mapperName has to be given explicitly to use the built-in mappers in the standard nansat folder. Otherwise only the user-defined mappers are tested.

letmaik commented 10 years ago

I can't reproduce your problem; I now have a user-defined mapper that reacts to the foo: prefix:

>>> nansat.nansat.Nansat('foo:lala', logLevel=10)
GDAL could not open foo:lala, trying to read with Nansat mappers...
02:50:26|10|nansat|_get_mapper|Trying mapper_aapp_l1b...
DEBUG:Nansat:Trying mapper_aapp_l1b...
02:50:26|10|nansat|_get_mapper|Trying mapper_aapp_l1c...
DEBUG:Nansat:Trying mapper_aapp_l1c...
02:50:26|10|nansat|_get_mapper|Trying mapper_amsr2_l3...
DEBUG:Nansat:Trying mapper_amsr2_l3...
02:50:26|10|nansat|_get_mapper|Trying mapper_asar...
DEBUG:Nansat:Trying mapper_asar...
[...]
02:50:26|10|nansat|_get_mapper|Trying mapper_viirs_l1...
DEBUG:Nansat:Trying mapper_viirs_l1...
02:50:26|10|nansat|_get_mapper|Trying mapper_foo...
DEBUG:Nansat:Trying mapper_foo...
ITS ME!
02:50:26|20|nansat|_get_mapper|Mapper mapper_foo - success!
INFO:Nansat:Mapper mapper_foo - success!
02:50:26|10|nansat|__init__|Object created from foo:lala
DEBUG:Nansat:Object created from foo:lala

When I try nansat.nansat.Nansat('ncep_wind_online:201210010600', logLevel=10) it correctly uses the built-in mapper.

letmaik commented 10 years ago

I think some confusion/errors can be eliminated by removing everything from the nansat/__init__.py file except the two lines for namespace packages and the pixel function registration. Defining names there is unreliable anyway, as it depends on the order in which the namespace package portions are imported. In Python 3.3+, namespace packages won't have an __init__.py file anyway (see http://legacy.python.org/dev/peps/pep-0420/). I don't think any of the convenience imports are necessary, and it's probably better to be explicit, which makes it easier for other developers to find the source files.
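For comparison, a sketch of a PEP 420 implicit namespace package (Python 3.3+): same-named directories without any __init__.py on different sys.path entries are merged into one package. The name toyimplicit is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo():
    """Build two portions of 'toyimplicit' (directories with NO __init__.py)
    and import one module from each."""
    with tempfile.TemporaryDirectory() as base, tempfile.TemporaryDirectory() as user:
        for root, module in ((base, "mapper_builtin"), (user, "mapper_user")):
            pkg = Path(root) / "toyimplicit"
            pkg.mkdir()
            (pkg / (module + ".py")).write_text("")
        sys.path[:0] = [base, user]
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("toyimplicit")
            found = [importlib.import_module("toyimplicit." + name).__name__
                     for name in ("mapper_builtin", "mapper_user")]
            # No __init__.py means no __file__ and no place to define names.
            return found, getattr(pkg, "__file__", None), len(pkg.__path__)
        finally:
            sys.path.remove(base)
            sys.path.remove(user)
            for name in [m for m in sys.modules if m.split(".")[0] == "toyimplicit"]:
                del sys.modules[name]
```

Both modules import, the package has no __file__, and its __path__ spans both directories.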

knutfrode commented 10 years ago

Hm, then we get different results. My user mappers are stored in the folder ~/mappers_nansat/nansat/mappers/; both this folder and its parent folder (nansat) contain the __init__.py listed above. The main nansat folder is given in PYTHONPATH before this user-specific nansat folder.

Is your setup different from this? Making a similar test with a foo mapper, I simply do:

>>> import nansat
>>> nansat.nansat.Nansat('foo:lala', logLevel=10)

letmaik commented 10 years ago

I had a different order in PYTHONPATH, but it still works for me. What's the output you get and what do you expect instead?

knutfrode commented 10 years ago

I get this:

>>> import nansat
>>> nansat.nansat.Nansat('foo:lala', logLevel=10)
GDAL could not open foo:lala, trying to read with Nansat mappers...
03:44:59|10|nansat|_get_mapper|Trying mapper_foo...
----------------------------------------
FOO!
----------------------------------------
03:44:59|20|nansat|_get_mapper|Mapper mapper_foo - success!
03:44:59|10|nansat|__init__|Object created from foo:lala 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/knutfd/software/nansat/nansat/nansat.py", line 229, in __repr__
    outString += self.list_bands(False)
  File "/home/knutfd/software/nansat/nansat/nansat.py", line 881, in list_bands
    bands = self.bands()
  File "/home/knutfd/software/nansat/nansat/nansat.py", line 306, in bands
    for iBand in range(self.vrt.dataset.RasterCount):
AttributeError: 'Mapper' object has no attribute 'dataset'

The crash is not important; I did not bother to make the foo mapper work properly. But the difference from your case is that the user mapper is tested first.

letmaik commented 10 years ago

Well, no: after I changed the order in PYTHONPATH, it tested the user mappers first as well. But what's the problem with that? I would say it doesn't matter as long as the URL/file can be opened. Or do you always want the built-in ones to be tried first? That would be tricky without introducing some kind of priority number in each mapper module.
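A minimal sketch of the priority-number idea, assuming a hypothetical optional module-level PRIORITY attribute (not something Nansat mappers define) and using simple stand-in objects instead of real mapper modules.

```python
from types import SimpleNamespace

DEFAULT_PRIORITY = 100  # hypothetical value for modules that define no PRIORITY


def order_mappers(mapper_modules):
    """Sort mapper modules by an optional module-level PRIORITY (lower is tried first).

    sorted() is stable, so modules with equal priority keep their discovery
    order (e.g. the order implied by PYTHONPATH).
    """
    return sorted(mapper_modules,
                  key=lambda module: getattr(module, "PRIORITY", DEFAULT_PRIORITY))


# Stand-ins for imported mapper modules, listed in discovery order.
discovered = [
    SimpleNamespace(__name__="mapper_foo"),                # user mapper, no PRIORITY
    SimpleNamespace(__name__="mapper_asar", PRIORITY=10),  # built-in, tried early
    SimpleNamespace(__name__="mapper_bar"),                # user mapper, no PRIORITY
]
```

Here [m.__name__ for m in order_mappers(discovered)] puts mapper_asar first, followed by mapper_foo and mapper_bar in their original order. The cost is that every mapper author has to know about the convention.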

akorosov commented 10 years ago

If we remove everything from nansat/__init__.py, then users will have to write:

from nansat.nansat import Nansat
from nansat.domain import Domain

which is very awkward in everyday life. Why do you think it leads to confusion? Why is it unreliable? How do these names depend on the order of namespace package import?

letmaik commented 10 years ago

Try it out for yourself with the current code. Put nansat first and openwind second in PYTHONPATH, do a simple import nansat and then access nansat.Nansat; then swap the order and repeat. In one of the two cases it won't work. This is because names cannot be defined within a namespace package, except for the modules themselves. Or rather, only the names from the __init__.py that gets run first (nansat's or openwind's) will be usable, which of course leads to confusion.
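A small sketch reproducing this effect with a hypothetical package toyshared: both portions carry the extend_path lines, and each __init__.py also defines a LABEL. Only the __init__.py found first on sys.path is executed, so the name you get depends on the order.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT_TEMPLATE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
    "LABEL = %r\n"  # a name defined in this portion's __init__.py
)


def first_init_label(first_label, second_label):
    """Put two portions of 'toyshared' on sys.path and return the LABEL it ends up with."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        for root, label in ((first, first_label), (second, second_label)):
            pkg = Path(root) / "toyshared"
            pkg.mkdir()
            (pkg / "__init__.py").write_text(INIT_TEMPLATE % label)
        sys.path[:0] = [first, second]
        importlib.invalidate_caches()
        try:
            return importlib.import_module("toyshared").LABEL
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
            for name in [m for m in sys.modules if m.split(".")[0] == "toyshared"]:
                del sys.modules[name]
```

first_init_label("nansat", "openwind") returns "nansat", and swapping the arguments returns "openwind".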

akorosov commented 10 years ago

But if we remove all declarations from nansat/__init__.py, it won't help either. It will still depend on the order of nansat and openwind in PYTHONPATH. I mean, if openwind is first in PYTHONPATH and nansat/__init__.py doesn't declare any names, then after import nansat the package will be empty (just tested).

letmaik commented 10 years ago

Sure it will be empty; it will only find modules and packages, nothing else. What you could consider is providing something like an official api module, like here: https://github.com/brandon-rhodes/python-skyfield/blob/master/skyfield/api.py

akorosov commented 10 years ago

something like import nansat.api or from nansat.api import * ?

letmaik commented 10 years ago

more like from nansat.api import Nansat, Domain, VRT and whatever else people need. If you want, you could support the wildcard import there; in fact I would support it, because then it's the user's choice, and it may be quite convenient in shell usage.
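A sketch of such an api module in a hypothetical package toynansat: the classes live in their own modules, and api.py only re-exports them and lists them in __all__ so the wildcard import works too. Because api.py is an ordinary module, its names do not depend on which __init__.py ran first.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout; only api.py defines the convenience names.
FILES = {
    "__init__.py": (
        "from pkgutil import extend_path\n"
        "__path__ = extend_path(__path__, __name__)\n"
    ),
    "core.py": "class Nansat:\n    pass\n",
    "domain.py": "class Domain:\n    pass\n",
    "api.py": (
        "from toynansat.core import Nansat\n"
        "from toynansat.domain import Domain\n"
        "__all__ = ['Nansat', 'Domain']\n"
    ),
}


def demo():
    """Return the names a wildcard import of the api module binds, and the module
    in which the explicitly imported Nansat class is really defined."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "toynansat"
        pkg.mkdir()
        for filename, source in FILES.items():
            (pkg / filename).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            explicit, wildcard = {}, {}
            exec("from toynansat.api import Nansat, Domain", explicit)
            exec("from toynansat.api import *", wildcard)
            public = sorted(name for name in wildcard if name != "__builtins__")
            return public, explicit["Nansat"].__module__
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "toynansat"]:
                del sys.modules[name]
```

The wildcard import binds Domain and Nansat, and Nansat.__module__ still points to toynansat.core, so the source file remains easy to find.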

akorosov commented 10 years ago

That works well. However it is a little unusual...

Can't we make only 'mappers' a namespace package? I.e. place from pkgutil import extend_path; __path__ = extend_path(__path__, __name__) only inside ....../mappers/__init__.py?

Or, alternatively, we can make a folder nansat_mappers in parallel to nansat:

/nansat
--nansat
----__init__.py with all imports and names
----nansat.py with Nansat
----...
--nansat_mappers
----__init__.py with from pkgutil...
----mapper_asar.py

letmaik commented 10 years ago

Your first suggestion wouldn't work as all packages in the hierarchy have to be namespace packages, otherwise it stops looking at the first regular package.

The other suggestion is possible. It's similar to what matplotlib does with its mpl_toolkits namespace package (where matplotlib itself is a regular package). Maybe that's indeed a good way to go.

letmaik commented 10 years ago

With the second suggestion you could also easily solve your priority problem, i.e. testing built-in mappers first. For that you would keep nansat.mappers but make nansat and nansat.mappers regular packages, and support nansat_mappers as a namespace for external mappers. EDIT: Of course, the code would then have to be changed slightly to look in both locations.
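A sketch of "look in both locations", assuming the package names nansat.mappers and nansat_mappers from this discussion and a mapper_ filename prefix. Names are collected package by package, so built-in mappers come first; a package that is not installed is skipped. This is an illustration, not the code Nansat adopted.

```python
import importlib
import pkgutil


def mapper_module_names(package_names=("nansat.mappers", "nansat_mappers"),
                        prefix="mapper_"):
    """List '<prefix>*' modules package by package, in the order given."""
    names = []
    for package_name in package_names:
        try:
            package = importlib.import_module(package_name)
        except ModuleNotFoundError as exc:
            # Skip only if the missing module is this package (or a parent of it);
            # re-raise genuine import errors coming from inside the package.
            if exc.name is not None and (package_name == exc.name
                                         or package_name.startswith(exc.name + ".")):
                continue
            raise
        for info in pkgutil.iter_modules(package.__path__):
            if info.name.startswith(prefix):
                names.append(package_name + "." + info.name)
    return names
```

For a check with standard-library packages, mapper_module_names(("no_such_pkg_xyz.mappers", "json", "email"), prefix="enc") skips the missing package and returns ["json.encoder", "email.encoders"], preserving package order.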

akorosov commented 9 years ago

Conclusion

We decided to use a namespace package called nansat_mappers. A description is now on the wiki: https://github.com/nansencenter/nansat/wiki/How-to-create-a-mapper

knutfrode commented 9 years ago

Does this work for anyone?

As soon as I place a mapper in this separate user-mapper folder, Nansat tests only this mapper and not the built-in ones.

knutfrode commented 9 years ago

Ah, my bad - I used an older mapper which did not raise WrongMapperError. After fixing that, it works fine.
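A minimal sketch of the try-each-mapper loop visible in the debug log earlier in the thread ("Trying mapper_..."), assuming a mapper signals "not my file" by raising WrongMapperError. The exception class and both mapper classes are local, hypothetical stand-ins, and letting other exceptions propagate is a choice of this sketch, not a description of Nansat's _get_mapper.

```python
class WrongMapperError(Exception):
    """Local stand-in for the exception a mapper raises for input it cannot read."""


def open_with_mappers(filename, mapper_classes):
    """Try each mapper class in order; return (class name, instance) of the first
    one that accepts the input.

    Only WrongMapperError moves the loop on to the next mapper. In this sketch any
    other exception propagates, so a mapper that fails in some other way ends the
    search early, and a mapper that never raises accepts everything.
    """
    for mapper_class in mapper_classes:
        try:
            return mapper_class.__name__, mapper_class(filename)
        except WrongMapperError:
            continue
    raise WrongMapperError("no mapper could open %r" % filename)


class FooMapper:
    """Hypothetical user mapper that only accepts 'foo:' inputs."""
    def __init__(self, filename):
        if not filename.startswith("foo:"):
            raise WrongMapperError(filename)


class NcepMapper:
    """Hypothetical built-in mapper that only accepts 'ncep_wind_online:' inputs."""
    def __init__(self, filename):
        if not filename.startswith("ncep_wind_online:"):
            raise WrongMapperError(filename)
```

With the user mapper listed first, an NCEP request still reaches NcepMapper because FooMapper raises WrongMapperError; an older mapper that does not raise it would break that chain.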