spectralpython / spectral

Python module for hyperspectral image processing
MIT License

Add support for USGS spectral library #109

Closed kormang closed 4 years ago

kormang commented 4 years ago

Hi, again.

Again, I was working with the USGS spectral library, then decided to adapt my code and propose adding it to this package.

I've tried to keep it as close to AsterDatabase as possible, but there are a few differences. Samples don't have X data associated with them; instead it is stored separately. Also, much of the other metadata is missing, and there is some new metadata. I don't think there is a need for two sample tables in this case. USGS is already complicated enough. It was really difficult to make it work, since it does not seem that the authors cared about automated processing. I haven't included a get_signature method, since in this case its return value would carry only the name as additional useful information. All samples include the chapter and file name, because these are useful for finding additional metadata, descriptions, and GIF plots. The chapter is also generally useful.

kormang commented 4 years ago

I will add create_envi_spectral_library if the initial feedback is generally positive and there is a will to include this in the package. :)

kormang commented 4 years ago

I will reduce the size of the test files (the test data), but not their number.

tboggs commented 4 years ago

Which USGS library data does this code read? Is it different data than what is included in the ECOSTRESS library?

kormang commented 4 years ago

Here is the full story. I downloaded the USGS data from https://crustal.usgs.gov/speclab/QueryAll07a.php and also requested all vegetation data from ECOSTRESS.

I've read that ECOSTRESS includes USGS. But their web site only says that it includes data from USGS; it does not say all data. It says that it was developed as part of the ASTER and ECOSTRESS projects, and asks users to cite two publications, related to ASTER (2009) and ECOSTRESS (2019). The USGS web site, meanwhile, points to a publication related only to the USGS spectral library (2017).

It seems that not everything is included in ECOSTRESS. In particular, _Grassdry.4+.6green AMX27 BECKa AREF is a sample that I could not find in ECOSTRESS. Besides, these libraries include very different metadata and additional data. USGS is much richer in descriptions and metadata, and also includes GIF plots and things like that, although not all of this is part of the ASCII data that this code processes. It is also organized differently. USGS also includes spectra already convolved to different spectrometers and resampled to multispectral bands compatible with, e.g., Landsat. We cannot say that USGS ⊂ ECOSTRESS. But it seems that we can say USGS ∩ ECOSTRESS ≠ ∅.

Pretty sure I'm not the only one who would find this module helpful. :)

tboggs commented 4 years ago

This looks like a reasonable addition to me. A few comments/questions:

kormang commented 4 years ago

  1. Same for me with USGS. Still, the metadata are different, and the libraries are different in general.

  2. The links are correct. Sometimes they fail. Sometimes they work from one computer but not from another. I wrote to them and complained about it, but it seems they haven't fixed it. You can probably open the home page; there is a link to the data there. But that is the same link I have provided, so I doubt it would help, but you can try. As I'm writing this, I'm able to open it from two computers using Firefox, and with Chromium on one computer. But I have also had problems before.

  3. In the publication that I have cited in the source code, there is this sentence:

Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner.

As far as I could understand from reading it, it might refer to specific signatures of specific materials and to data about spectrometers. Anyway, I've changed the test data so that it does not include real data. I've changed library names, chapter names, material names, and measurements (there are also no more than 45 lines per file). Only the names of spectrometers are the same, as that is important for the tests.

  4. I've already changed that. :)

kormang commented 4 years ago

Now I can't open the link. Really strange. Maybe you can now. :)

kormang commented 4 years ago

Added create_envi_spectral_library.

kormang commented 4 years ago

Until now, I had never run it against the whole USGS dataset. There were some problems. Sample files are relatively uniform, but the others are not. So I had to open almost all of them (128) and make manual adjustments to the code so that it can parse all of them properly. Also, if importing one file fails, it won't affect importing the rest of the dataset. Now I'm glad to see how it works.

Imported 71249 sample files and 128 spectrometer files. 0 failed sample files, and 0 failed spectrometer files.

It takes a long time, but it's a one-time job, and it works.

I was also able to export it to an ENVI spectral library in one go.
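
The per-file fault isolation described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `import_files`, `toy_parse`, and the file names are hypothetical stand-ins.

```python
def import_files(paths, parse):
    """Parse each file independently; one bad file does not stop the rest.

    `parse` is a hypothetical callable that raises on malformed input.
    """
    imported, failed = [], []
    for path in paths:
        try:
            imported.append((path, parse(path)))
        except Exception as exc:  # record the failure and keep going
            failed.append((path, exc))
    return imported, failed


def toy_parse(path):
    # Stand-in parser: pretend files ending in '.bad' are malformed.
    if path.endswith('.bad'):
        raise ValueError('unrecognized header in ' + path)
    return {'file': path}


imported, failed = import_files(['a.txt', 'b.bad', 'c.txt'], toy_parse)
print('Imported {0} files. {1} failed.'.format(len(imported), len(failed)))
```

Collecting failures in a list rather than raising makes it possible to print a summary at the end and inspect the problem files afterwards.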

-rw-rw-r-- 1 kormang kormang 2.0M Jun 15 07:54 envilib.hdr
-rw-rw-r-- 1 kormang kormang  60M Jun 15 07:54 envilib.sli
-rw-r--r-- 1 kormang kormang 154M Jun 15 07:28 usgs.db

It also takes a long time, but it works.

Checked sanity with queries like this: db.print_query('SELECT DISTINCT Name FROM SpectrometerData')
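
A similar check can be sketched with the standard sqlite3 module, assuming the database is an SQLite file like the package's other spectral databases. The in-memory table and its rows below are made-up stand-ins, not real USGS data; only the table and column names come from the query above.

```python
import sqlite3


def distinct_spectrometer_names(conn):
    """Return the distinct Name values from SpectrometerData, sorted."""
    rows = conn.execute(
        'SELECT DISTINCT Name FROM SpectrometerData ORDER BY Name')
    return [row[0] for row in rows]


# Stand-in database with hypothetical spectrometer names.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE SpectrometerData (Name TEXT)')
conn.executemany('INSERT INTO SpectrometerData (Name) VALUES (?)',
                 [('SPEC_A',), ('SPEC_B',), ('SPEC_A',)])

names = distinct_spectrometer_names(conn)
print(names)
```

Against the real file, the connection would be opened with `sqlite3.connect('usgs.db')` instead of the in-memory stand-in.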

Also, the description now includes everything after the record. That is because it is impossible to tell what should be included in the description for spectrometer data. For samples we could tell, but for consistency, and because the user still might find it useful, everything is included, although some data is redundant (repeated in the Purity, Spectrometer and MeasurementType columns).

kormang commented 4 years ago

I'm sorry for making so many updates. Now I'm happy with it. PTAL.

tboggs commented 4 years ago

No worries. This all looks fine to me. Let me know when you're ready to merge.

kormang commented 4 years ago

Hi.

I've been testing it for a few days. As I said, I was able to import literally all of the USGS data and export it to an ENVI spectral library. The data looks OK, and it is quite comfortable to work with.

I'm ready for the merge.

tboggs commented 4 years ago

@kormang It appears this PR is failing tests with python 2.6. Can you take a look at this?

https://travis-ci.org/github/spectralpython/spectral/jobs/699899538

kormang commented 4 years ago

Sorry for the late reply. I will take a look. I've never run it with Python 2.6, only 2.7.

kormang commented 4 years ago

Oh, Python 2.6 requires indices in format strings. I will fix it immediately.
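
For reference, a small illustration of the difference (the variable names and message are made up, not taken from the PR). Auto-numbered `{}` fields were added in Python 2.7 and 3.1; under 2.6 they raise `ValueError: zero length field name in format`, so explicit indices are the portable form.

```python
count, source = 3, 'sample'

# Portable form: explicit positional indices work on Python 2.6 and later.
explicit = 'Imported {0} files from {1}'.format(count, source)

# Auto-numbered form: fine on Python 2.7 / 3.1+, but raises ValueError on 2.6.
auto = 'Imported {} files from {}'.format(count, source)

print(explicit)
```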