shawnlaffan / biodiverse

A tool for the spatial analysis of diversity
http://shawnlaffan.github.io/biodiverse/
GNU General Public License v3.0

Basedata - allow import from raster and shapefile formats #408

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The system currently only imports data using delimited text files.  However, 
much of the data used in Biodiverse are originally provided in raster formats.  

It would be helpful to directly import such data, starting with asciigrid 
rasters.  

Original issue reported on code.google.com by shawnlaffan on 29 Oct 2013 at 10:48

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1921.

Add more tests for the basedata import process.  
Abbreviate some of the code to be more perlish.
Also croak when the numbers of elements in the cell_size and group_cols arguments 
(or parameters) don't match.  cell_origins are automatically set to 0 if not 
specified, and we croak if their element count also doesn't match that of cell_sizes.

Original comment by shawnlaffan on 30 Oct 2013 at 3:40
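
The argument checks described in r1921 can be sketched as follows.  Biodiverse is written in Perl, where croak plays the role of raise.  This is a Python sketch under that assumption, and the helper name check_cell_args is hypothetical, not the project's code.

```python
def check_cell_args(cell_sizes, group_cols, cell_origins=None):
    """Validate per-axis group arguments (hypothetical helper, not Biodiverse's code)."""
    if len(cell_sizes) != len(group_cols):
        raise ValueError(
            f"cell_sizes has {len(cell_sizes)} elements "
            f"but group_cols has {len(group_cols)}"
        )
    if cell_origins is None:
        # Default every axis origin to 0 when none are given
        cell_origins = [0] * len(cell_sizes)
    elif len(cell_origins) != len(cell_sizes):
        raise ValueError(
            f"cell_origins has {len(cell_origins)} elements "
            f"but cell_sizes has {len(cell_sizes)}"
        )
    return list(cell_sizes), list(group_cols), list(cell_origins)
```

For example, check_cell_args([100000, 100000], [1, 2]) returns the origins as [0, 0].  A mismatched call such as check_cell_args([1], [1, 2]) raises immediately instead of failing later in the import.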

GoogleCodeExporter commented 9 years ago
Branch for this issue was created in r1920.

Original comment by shawnlaffan on 30 Oct 2013 at 3:47

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1922.

Factor out the post-import processing steps into their own sub.  This means we 
can leave the delimited text imports alone for now.  

Original comment by shawnlaffan on 30 Oct 2013 at 4:48

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1923.

Cell sizes must now be specified when the basedata object is created.  This 
avoids ambiguity later on.

Original comment by shawnlaffan on 30 Oct 2013 at 5:44

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1924.

Refactor more of the basedata import checking code.  Push it into the new() sub 
where it is best utilised.  

Original comment by shawnlaffan on 30 Oct 2013 at 6:14

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1926.

Update the GUI code to handle the revised importation process.  
The system is almost ready for the raster import code to be implemented.

Original comment by shawnlaffan on 30 Oct 2013 at 11:58

GoogleCodeExporter commented 9 years ago
Add target milestone

Original comment by shawnlaffan on 31 Oct 2013 at 1:58

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1936.

Push changes across to the trunk, as they are generic to this point.

Original comment by shawnlaffan on 31 Oct 2013 at 5:16

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1955.

Rebase on trunk

Original comment by shawnlaffan on 9 Nov 2013 at 9:54

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1960.

Add round-trip tests for three of the four delimited text structures.  This can 
be used as a model for the raster imports as making them round-trip safe is a 
good test of them working.
BaseStruct exports now also accept the sub name as the format argument.
NOTE:  Labels also now have their outer quotes trimmed if there is only one 
label column used.  This is needed to ensure the round-trip gives the same 
labels on import.

Original comment by shawnlaffan on 10 Nov 2013 at 4:04
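
Here is a minimal Python sketch of why the outer-quote trimming in r1960 matters for a round trip.  It assumes a single quoted label column followed by group and count columns, with groups and counts never containing commas.  The layout and all function names are hypothetical.

```python
def export_lines(basedata, quote='"'):
    """Write {group: {label: count}} as quoted-label,group,count lines."""
    lines = []
    for group in sorted(basedata):
        for label in sorted(basedata[group]):
            lines.append(f'{quote}{label}{quote},{group},{basedata[group][label]}')
    return lines


def trim_outer_quotes(text, quote='"'):
    """Strip one pair of enclosing quotes, if present."""
    if len(text) >= 2 and text[0] == quote and text[-1] == quote:
        return text[1:-1]
    return text


def import_lines(lines):
    """Read lines back; group and count never contain commas, so split from the right."""
    basedata = {}
    for line in lines:
        label, group, count = line.rsplit(',', 2)
        label = trim_outer_quotes(label)  # without this, labels come back as '"sp1"'
        basedata.setdefault(group, {})[label] = int(count)
    return basedata
```

A round-trip test then only needs to assert that import_lines(export_lines(bd)) equals bd.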

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1962.

Add example raster exports and the basedata from whence they came

Original comment by shawnlaffan on 12 Nov 2013 at 1:10

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1972.

Added methods for importing using the GDAL library.  I've tested it on Linux (Gentoo) 
using gdal-1.9.2, built with Perl bindings.

Added import_data_raster in BaseData.pm, based on import_data.  This reads data 
from GDAL files correctly; however, the processes for reading and setting 
parameters still need checking.

Changed the import dialog GUI to allow the file type (text vs raster) to be chosen 
when loading, and added parameters to the parameter dialog for setting cell 
origins and sizes.  Whether labels are taken from bands or from cell values is 
set as a parameter in the dialog.

Original comment by anthony....@gmail.com on 14 Nov 2013 at 5:06

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1973.

Add notes page to the wiki for installation of GDAL

Original comment by shawnlaffan on 14 Nov 2013 at 6:11

GoogleCodeExporter commented 9 years ago

Original comment by shawnlaffan on 14 Nov 2013 at 6:19

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1974.

Add --no-version-check option, since CPAN uses 1.9.0 and the Ubuntu version is 
1.9.2.

Original comment by shawnlaffan on 14 Nov 2013 at 6:21

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2076.

Adding import of shapefiles.  The shapefile format is identified according to the 
available fields and integrated with the column identification used by the text 
column import.  (Shapefile import is still incomplete.)

Original comment by anthony....@gmail.com on 22 Jan 2014 at 1:03

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2078.

Added methods to read shapefiles; these appear to work on the examples tested.  
There is a bit of an issue with setting CELL_SIZES: the value appears to be 
stored redundantly under Groups, so when the value in the basedata_ref of the 
shapefile data is updated, the value used for display is not (it remains at 
the default).  I changed the default to 100000 just to test the coastline 
shapefile.

Labels are handled a bit differently from column data.  Although the code is 
reused, I've made it lenient about choosing a label column, to give the option 
of using a constant label (1), e.g. to use data with only X,Y 'columns'.  It 
should handle Z and M, and allow arbitrary groups/columns.

Added a PPD for GDAL on Windows, compiled using MinGW 64 and built against 
Strawberry Perl (5.16).  It was extremely difficult to get it to build; however, 
a rebuild on the 32-bit toolchain (and 32-bit Strawberry Perl) should be 
possible, and it builds with 5.18 as well.  It's probably worth adding the 
modified source to the repository to allow rebuilding.

Original comment by anthony....@gmail.com on 24 Jan 2014 at 10:23

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2101.

Style changes to conform to the rest of the project (we need to document the style).  
Change tabs to spaces, avoid cuddled else blocks, use single quotes for 
non-interpolated strings.

Original comment by shawnlaffan on 1 Feb 2014 at 3:38

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2102.

Comment out the call to $params, which was causing exceptions and was redundant 
in any case.

Original comment by shawnlaffan on 1 Feb 2014 at 4:24

GoogleCodeExporter commented 9 years ago
This is progressing well.  One point to add is: 

We need to add an option for empty groups.  Currently all groups in the raster are 
added to the basedata.  

The default for rasters should be to skip nodata cells unless told otherwise. 
This will match text file import which only imports valid records.  

Original comment by shawnlaffan on 1 Feb 2014 at 4:27

GoogleCodeExporter commented 9 years ago
Also need to add tests for raster and shapefile imports.  

Original comment by shawnlaffan on 1 Feb 2014 at 4:28

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2103.

More stylistic changes, as per previous commits on this branch.  

Original comment by shawnlaffan on 1 Feb 2014 at 7:11

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2104.

Skip cells with nodata - these can be enabled later when we allow empty groups.
Also clear up more tabs and other style points.  

Original comment by shawnlaffan on 1 Feb 2014 at 10:52

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2106.

Tidying up, platform-independent paths, etc.

Original comment by anthony....@gmail.com on 3 Feb 2014 at 8:03

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2107.

Use hash to handle category names. 

Original comment by shawnlaffan on 4 Feb 2014 at 1:40

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2108.

Category names are zero based, and use a zero length string if not set.  Adjust 
code to match this.
Also a few style edits.

Original comment by shawnlaffan on 5 Feb 2014 at 2:34
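
The category-name handling in r2107 and r2108 can be sketched in Python.  GDAL's raster category name list is indexed by pixel value starting at zero, and unset entries are empty strings.  Falling back to the raw cell value when no name is set is an assumption, and both function names are hypothetical.

```python
def category_lookup(category_names):
    """Map raster cell values to category names (list index == cell value)."""
    # Unset entries are empty strings, so leave them out of the hash
    return {i: name for i, name in enumerate(category_names) if name != ''}


def label_for(value, lookup):
    """Use the category name if one is set, otherwise the value itself (assumed fallback)."""
    return lookup.get(value, str(value))
```

Building the hash once per band avoids scanning the list for every cell.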

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2109.

Clear out the redundant swapped="no" text which glade puts in.  

Original comment by shawnlaffan on 5 Feb 2014 at 3:01

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2110.

Merge trunk changes across.  
Had to do plenty of conflict resolution, so hopefully the testing picked 
everything up.

Original comment by shawnlaffan on 5 Feb 2014 at 3:51

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2111.

Add shapefile version of the example data.

Original comment by shawnlaffan on 5 Feb 2014 at 3:53

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2112.

Open GDAL files in read mode, as ArcInfo rasters don't work in update mode.  
Also adjust the progress dialogue to update more effectively (the y-axis repeats 
for blocks, so it wasn't being updated).

Original comment by shawnlaffan on 5 Feb 2014 at 10:04

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2115.

Merge latest trunk revisions.  

Original comment by shawnlaffan on 6 Feb 2014 at 3:10

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2116.

Round-trip tests of the ASCII raster format.  Fixed the import so it reads cell 
positions appropriately; it now passes the tests.  Labels for each band are taken 
from the file when the parameter given_label=>1 is used.

Original comment by anthony....@gmail.com on 6 Feb 2014 at 4:59
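
To illustrate how cell positions relate to an ASCII grid, here is a standalone Python sketch.  It reads an ESRI ASCII grid, computes cell centres from the lower-left corner, and skips nodata cells as described in r2104.  Biodiverse itself reads rasters through GDAL, so this is not its import code.  The sketch assumes an xllcorner/yllcorner header; the xllcenter variant is not handled.

```python
def read_asciigrid(text):
    """Return (x_centre, y_centre, value) for each non-nodata cell of an ESRI ASCII grid."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = {}
    # Header lines start with a keyword (ncols, nrows, ..., NODATA_value)
    while lines and lines[0].split()[0][0].isalpha():
        key, value = lines.pop(0).split()
        header[key.lower()] = float(value)
    ncols, nrows = int(header['ncols']), int(header['nrows'])
    size = header['cellsize']
    nodata = header.get('nodata_value')
    values = [float(v) for line in lines for v in line.split()]
    if len(values) != ncols * nrows:
        raise ValueError('cell count does not match ncols * nrows')
    cells = []
    for row in range(nrows):
        for col in range(ncols):
            value = values[row * ncols + col]
            if nodata is not None and value == nodata:
                continue  # skip nodata cells rather than creating empty groups
            # Rows run north to south, so row 0 is the top of the grid
            x = header['xllcorner'] + (col + 0.5) * size
            y = header['yllcorner'] + (nrows - row - 0.5) * size
            cells.append((x, y, value))
    return cells
```

The key detail is the y term: rows are stored from the top, so the row index has to be flipped before adding half a cell.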

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2119.

Get the shapefile field names in file order.  
Give the shapefile coords distinct names so they don't clash with any field 
names.  The leading colon is an invalid field character so ensures 
distinctness.  

Other issues remaining:
The shapefile import seems to be using column names, whereas the text import 
uses column indexes.  Names are useful, but we should map column names to 
column ids to keep things consistent.  In the shapefile case this gets a bit 
hairy, as x, y, z & m will be at the front of the array, so any scripting 
interfaces will need to check for the z & m fields.  Alternatively, we could 
enable them only when asked.

Original comment by shawnlaffan on 6 Feb 2014 at 10:49
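
The name-to-index mapping suggested in r2119 could look something like this Python sketch.  Coordinate pseudo-fields come first, then the dbf fields in file order.  The ':shape_x' style names and the function name are hypothetical; the leading colon mirrors the point that it cannot clash with a real field name.

```python
def shapefile_columns(field_names, has_z=False, has_m=False):
    """Combine coordinate pseudo-fields with dbf field names, and index them."""
    columns = [':shape_x', ':shape_y']
    if has_z:
        columns.append(':shape_z')
    if has_m:
        columns.append(':shape_m')
    columns.extend(field_names)  # dbf fields keep their file order
    # Zero-based indexes, so name-based and index-based interfaces agree
    index_of = {name: i for i, name in enumerate(columns)}
    return columns, index_of
```

This shows the "hairy" part: enabling z and m shifts every dbf field index by up to two, so scripts that rely on indexes must know which coordinate fields were enabled.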

GoogleCodeExporter commented 9 years ago
Add Anthony to the CC list.  

Original comment by shawnlaffan on 6 Feb 2014 at 10:50

GoogleCodeExporter commented 9 years ago
Add shapefile to the branch name, as all that development has been on this 
branch.  

Original comment by shawnlaffan on 6 Feb 2014 at 10:54

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2120.

Simplify the arguments hash usage.  
Clear out some more tabs.  

Original comment by shawnlaffan on 6 Feb 2014 at 11:01

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2121.

Add handling of round-trips for shapefile export and import.  Export uses a 
shapetype argument to choose polygon or point export (2D) via 
shapefile::writer.  Label and count are exported using dbf fields, and a new 
shape is created for each label in the bin.

Import takes a use_dbf_label argument to use the label and count symbols in the 
dbf fields.  It currently uses a new shape for each label.  I had written code 
to package multiple labels in a shape, but this can be removed if the 
multiple-shape approach works fine.

It's been tested on the standard round-trip test, but I believe that only uses 
one label per bin, so some additional tests would be useful.

Original comment by anthony....@gmail.com on 10 Feb 2014 at 6:51
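
The one-shape-per-label layout in r2121 can be sketched in plain Python without any shapefile library.  Each record stands for one point shape plus its dbf attributes.  The field names LABEL and COUNT and both function names are hypothetical.  The test uses two labels in one bin, which is the case noted above as needing extra coverage.

```python
def to_records(basedata):
    """Flatten {(x, y): {label: count}} into one point record per label in each bin."""
    records = []
    for (x, y), labels in sorted(basedata.items()):
        for label, count in sorted(labels.items()):
            # One shape per label; label and count go into dbf attribute fields
            records.append({'x': x, 'y': y, 'LABEL': label, 'COUNT': count})
    return records


def from_records(records):
    """Rebuild the bins, summing counts if a label appears in several shapes for one bin."""
    basedata = {}
    for rec in records:
        bin_labels = basedata.setdefault((rec['x'], rec['y']), {})
        bin_labels[rec['LABEL']] = bin_labels.get(rec['LABEL'], 0) + rec['COUNT']
    return basedata
```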

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2122.

Rename the gdal win folders to align with the naming scheme for the gtk 
folders.  
This also simplifies the path search code in Biodiverse::GUI::GUIManager.pm.  

Original comment by shawnlaffan on 13 Feb 2014 at 1:02

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2123.

Search for the GDAL bin paths under Windows and add them to the path, in the same 
way as for the Gtk paths.  

Original comment by shawnlaffan on 13 Feb 2014 at 1:22
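
A rough Python equivalent of searching for bin folders and putting them on the PATH is shown below.  The folder names (for example gdal_win64) are hypothetical, since the thread does not give the actual names.  The project's own search code is Perl and lives in Biodiverse::Config.

```python
import os


def add_bin_paths(base_dir, subdirs, env=None):
    """Prepend each existing <base_dir>/<subdir>/bin folder to PATH; return those found."""
    env = os.environ if env is None else env
    candidates = [os.path.join(base_dir, sub, 'bin') for sub in subdirs]
    found = [path for path in candidates if os.path.isdir(path)]
    if found:
        existing = env.get('PATH', '')
        env['PATH'] = os.pathsep.join(found + ([existing] if existing else []))
    return found
```

Prepending rather than appending means the bundled DLLs are found before any other copies already on the system PATH.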

GoogleCodeExporter commented 9 years ago
Need to sort this warning out.  It triggers when the import dialogue is called.  
The txt/raster/shp combo is probably the root cause.  Putting it higher in the 
window would be a good approach, and easier for the user as well.  

Gtk-WARNING **: Only 'activatable' widgets can be packed into the action area of a GtkDialog at C:\issue_408_import_rasters\lib/Biodiverse/GUI/BasedataImport.pm line 1017.
Gtk-WARNING **: Only 'activatable' widgets can be packed into the action area of a GtkDialog at C:\issue_408_import_rasters\lib/Biodiverse/GUI/BasedataImport.pm line 1017.

Original comment by shawnlaffan on 13 Feb 2014 at 1:26

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2124.

Shift the gdal and gtk search into Biodiverse::Config, as we need gdal for 
Biodiverse::BaseData

Original comment by shawnlaffan on 13 Feb 2014 at 3:12

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2125.

Clean up and refactor some code.  
Shapefiles now use the column names to determine which columns are used.  This 
makes it easier to use the shape coords and fields in any combination. 
Shapefile imports now support sample counts, in the same way as the text 
imports do.
Set Geo::ShapeFile minimum version to 2.54.
Basedata::run_import_post_processes now determines the number of label axes 
from the first label.  This makes the post-processing much simpler and less 
argument dependent.

Shapefile and raster imports still need to handle group and label property 
setting (labels and groups can be renamed at import).
Shapefile imports also need to handle include and exclude columns.  

Original comment by shawnlaffan on 13 Feb 2014 at 3:34

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2126.

Modify test arguments so they pass once more.  

The label and group import handling needs a major refactor, as there is 
repetition throughout.  

Original comment by shawnlaffan on 13 Feb 2014 at 3:44

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2144.

Rearrange the import routine: all import methods now run through the full length 
of the run() subroutine and handle the properties import.

Original comment by anthony....@gmail.com on 18 Feb 2014 at 5:37

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2145.

Fix some test failures.  
The refactoring in r2125 did not correctly calculate the label cell sizes.
The shapefile exports now need to be given a list argument if the user wants a 
list (e.g. SUBELEMENTS) to be exported. It needs to be generic.  The GUI needs 
to pass this onwards.
Also clean up some tabs in the test files.  

Original comment by shawnlaffan on 20 Feb 2014 at 11:08

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2146.

Fix some further test failures.

The refactoring in r2125 did not check for basedatas with no labels.  In these 
cases assume we have one axis.  This avoids some uninitialised value warnings.

Original comment by shawnlaffan on 23 Feb 2014 at 10:56
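
The label-axis inference from r2125, with the empty-basedata fix from r2146, amounts to something like this Python sketch.  The ':' axis separator and the function name are assumptions for illustration.

```python
def label_axis_count(labels, sep=':'):
    """Infer the number of label axes from the first label; assume 1 when there are none."""
    for label in labels:
        return len(label.split(sep))
    return 1  # no labels: treat as one axis, per the r2146 fix
```

Relying on the first label assumes every label has the same number of axes, which is what lets the post-processing step drop its extra arguments.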

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2147.

Let Biodiverse::Config get the Gtk and GDAL bin paths.  

Original comment by shawnlaffan on 23 Feb 2014 at 10:57

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2148.

Merge recent trunk revisions across.  

Original comment by shawnlaffan on 23 Feb 2014 at 11:14

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2149.

Add nodata option to the shapefile exports.  
Add __no_list__ option to the shapefile exports for GUI purposes.
Change field names to KEY and VALUE to be more generic.  
Add set_cached_values sub.  This is just a wrapper around set_cached_value.  

Original comment by shawnlaffan on 24 Feb 2014 at 10:16

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2150.

Make it clear in the metadata that we do not support export of array lists to 
shapefiles.  

Original comment by shawnlaffan on 24 Feb 2014 at 9:42
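
The KEY/VALUE export with a nodata option (r2149) and the restriction to hash lists (r2150) can be sketched in Python as follows.  The ELEMENT field, the -9999 default and the function name are hypothetical.  Each row stands for the dbf attributes of one exported shape.

```python
def list_to_rows(element, values, nodata=-9999):
    """Flatten one element's hash list into KEY/VALUE rows, replacing None with nodata."""
    if not isinstance(values, dict):
        # Array lists are not supported for shapefile export, so reject them
        raise TypeError('only hash (key/value) lists can be exported')
    return [
        {'ELEMENT': element, 'KEY': key,
         'VALUE': nodata if values[key] is None else values[key]}
        for key in sorted(values)
    ]
```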