ucd-cws / ca-naip

Indexes and Additional Information for California's National Agriculture Imagery Program (NAIP)

Build a Common index file #2

Open qjhart opened 9 years ago

qjhart commented 9 years ago

We need to have a nice index file that people can use to get at the data. We have had some good luck with KML, and I think that's what we need to move forward.

ghost commented 9 years ago

I believe you already have a vector dataset of the DOQQ outlines with quadnames. I was thinking we could use gdaltindex to build an index of the NAIP TIFF files and look for missing images (by visual inspection or possibly automated). However, for the boundary in the actual index vector file, you will want the standard DOQQ boundary, not the actual bounds of the image, which overlap the neighboring images.

I cannot currently download the KML files (permissions) but, as you said, we can rebuild these. However, it would be helpful to have a copy of the vector dataset of the DOQQ outlines with quadnames in some form. Also, on what web server are we hosting the pages that users will access? For right now, I'll work on a simple test on my laptop.

qjhart commented 9 years ago

@gwatprg yes, actually it's nice to have both, so you can sometimes decide you only need to download one image, not two. However, if you only need one, then it should be the regular gridded one. In the past, that was difficult as some were non-standard, but that's not the case anymore, so you can very easily make the boundary from the filename.
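
As a rough illustration of making the boundary from the filename, here is a minimal Python sketch. It rests on assumptions not stated in this thread: that the seven digits encode the latitude and longitude of the SE corner of a 1-degree block plus a quad number from 01 to 64, and that quads are numbered row by row starting at the NW corner of that block. `NAME_RE` and `doqq_bounds` are hypothetical names, and the numbering should be checked against the existing DOQQ outlines before relying on it.

```python
import re

# Assumed layout: <prefix>_<LLOOOQQ>_<quarter>_<zone>_<res>_<date>[_<date>].tif
NAME_RE = re.compile(
    r"^[a-z]_(\d{2})(\d{3})(\d{2})_(ne|nw|se|sw)_\d+_\d+_\d{8}(?:_\d{8})?\.tif$"
)


def doqq_bounds(filename):
    """Return (west, south, east, north) in degrees for a quarter-quad filename."""
    m = NAME_RE.match(filename.lower())
    if m is None:
        raise ValueError("not a recognised quarter-quad filename: " + filename)
    lat, lon, quad = int(m.group(1)), int(m.group(2)), int(m.group(3))
    quarter = m.group(4)
    if not 1 <= quad <= 64:
        raise ValueError("quad number out of range: %d" % quad)

    # Assumption: 8 x 8 grid of 7.5-minute quads, numbered from the NW corner,
    # west to east, then north to south. Western longitudes are negative.
    row, col = divmod(quad - 1, 8)
    north = lat + 1 - row * 0.125
    west = -(lon + 1) + col * 0.125
    south = north - 0.125
    east = west + 0.125

    # Keep only the requested quarter of the 7.5-minute quad.
    mid_lat = (north + south) / 2
    mid_lon = (west + east) / 2
    if quarter[0] == "n":
        south = mid_lat
    else:
        north = mid_lat
    if quarter[1] == "e":
        west = mid_lon
    else:
        east = mid_lon
    return (west, south, east, north)
```

The result is the nominal 3.75-minute cell, without the overlap that the image itself carries.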

I have no problem downloading the KML from https://www.google.com/fusiontables/DataSource?docid=1cUrVwDup1lNBQ1_ljI3eHEOThi2lB3aITgm2ueQ#rows:id=1 as any user.

ghost commented 9 years ago

@qjhart By the way, I used ogr2ogr -f GeoJSON CA_NAIP2012.geojson "GFT:tables=1cUrVwDup1lNBQ1_ljI3eHEOThi2lB3aITgm2ueQ" to convert the Fusion table to GeoJSON.

ghost commented 9 years ago

@qjhart Just an update. I found where I read that the boundaries overlap: there is a 300-meter buffer around each DOQQ (see http://www.fsa.usda.gov/FSA/apfoapp?area=home&subject=prog&topic=nai). I ran gdaltindex on just one directory, and the image files do indeed overlap.

ghost commented 9 years ago

@qjhart I did a preliminary test building an index shapefile with attributes from a sample list of filenames. I will modify this. However, there are many ways I could envision creating the index file. For this test, I simply create a CSV and use ogr2ogr to convert it to a shapefile. I have also worked on a script to identify missing files. What is your preferred format for the index files? Is a Python script OK for building the geometries, or would you prefer something else?
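
A minimal sketch of the CSV step, for illustration only: it writes one row per file with the footprint as a WKT polygon, which ogr2ogr can then be pointed at. The column names (`filename`, `WKT`) and the function names are hypothetical, and the bounds are assumed to be already known for each file. How ogr2ogr is told which column holds the geometry depends on the GDAL version, so that part is left out.

```python
import csv
import io


def bounds_to_wkt(west, south, east, north):
    """Return a closed rectangle as a WKT POLYGON string."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON((" + ",".join("{} {}".format(x, y) for x, y in ring) + "))"


def write_index_csv(records, out):
    """Write (filename, (west, south, east, north)) records to an open text file."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["filename", "WKT"])  # hypothetical column names
    for filename, bounds in records:
        writer.writerow([filename, bounds_to_wkt(*bounds)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_index_csv([("example.tif", (0.0, 0.0, 1.0, 1.0))], buf)
    print(buf.getvalue())
```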

ghost commented 9 years ago

@qjhart While working on a script with the 2005 sample data, I noticed that some of the dates in the filenames differ from those downloaded from the NAIP index. For instance, from our 2005 files, we have n_3211419_ne_11_1_20050529.tif, n_3211419_nw_11_1_20050529.tif, n_3211420_ne_11_1_20050529.tif ... whereas the NAIP index lists n_3211419_nw_11_1_20050626_20060127.tif, n_3211419_ne_11_1_20050626_20060127.tif, n_3211420_ne_11_1_20050626_20060127.tif. Ignoring the second date, which is the verification date, the first date should be the same, and it is on most files. For some, like the ones above, it differs (e.g. 5/29/2005 in ours vs. 6/26/2005 in the NAIP index). I can ignore this when listing missing files and go by the rest of the attributes, but does it matter?
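
One way to list these discrepancies, sketched in Python under the assumption that both sources are available as plain lists of filenames: split each name into everything before the first date and the first date itself, then report the keys whose dates disagree. `split_name` and `date_mismatches` are hypothetical names, not part of the script discussed here.

```python
import re

# Group 1: everything before the first date; group 2: the first (acquisition) date.
# An optional second date (the verification date) is ignored.
NAME_RE = re.compile(
    r"^([a-z]_\d{7}_(?:ne|nw|se|sw)_\d+_\d+)_(\d{8})(?:_\d{8})?\.tif$"
)


def split_name(filename):
    """Return (key_without_dates, first_date) for a quarter-quad filename."""
    m = NAME_RE.match(filename.lower())
    if m is None:
        raise ValueError("unexpected filename: " + filename)
    return m.group(1), m.group(2)


def date_mismatches(ours, reference):
    """Return sorted (key, our_date, reference_date) where the first dates differ."""
    ref_dates = dict(split_name(f) for f in reference)
    found = []
    for f in ours:
        key, date = split_name(f)
        if key in ref_dates and ref_dates[key] != date:
            found.append((key, date, ref_dates[key]))
    return sorted(found)
```

Applied to the three pairs of filenames quoted above, this reports all three quarter quads with 20050529 on our side and 20050626 on the NAIP index side.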

ghost commented 9 years ago

@qjhart I realize we have both dates in our UCD files. My missing-files script was picking up the date that is absent from the NAIP index, as the index only has one date. So my next question: I assume you want the download URL to use the later date, which I hope will always match the NAIP index date. I will proceed on that assumption.

qjhart commented 9 years ago

I envision a table with columns similar to https://www.google.com/fusiontables/DataSource?docid=1cUrVwDup1lNBQ1_ljI3eHEOThi2lB3aITgm2ueQ but with columns for every NAIP image that exists. In that framework, yes, you have missing data if an image exists in one year but not another; however, if it exists in NO year, it's not included.

ghost commented 9 years ago

@qjhart OK. What I am actually doing now is an additional step that we had talked about earlier, though perhaps I misunderstood what you said. It is already in the script. I have downloaded the FSA NAIP indexes from http://www.fsa.usda.gov/FSA/apfoapp?area=home&subject=prog&topic=landing. While they are not useful for our purposes as index files, I can query the shapefiles and retrieve a list of files that we should have for that year. I then compare that list with a list I generate from what we have, to see whether we are missing any files for that particular year.
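
The comparison described above could look roughly like the following Python sketch. It assumes the expected and on-hand filenames have already been extracted into plain lists, and it matches on the filename with its date part removed, since the dates are known to differ between the two sources. `quad_key` and `missing_files` are hypothetical names.

```python
import re

# One or two trailing 8-digit dates followed by the extension.
DATE_RE = re.compile(r"(?:_\d{8}){1,2}\.tif$")


def quad_key(filename):
    """Strip the trailing date(s) and extension so names compare across sources."""
    return DATE_RE.sub("", filename.lower())


def missing_files(expected, on_hand):
    """Return the expected filenames that have no dateless match in on_hand."""
    have = {quad_key(f) for f in on_hand}
    return sorted(f for f in expected if quad_key(f) not in have)
```

Matching on the dateless key means a file with a different acquisition date still counts as present, which is the behaviour proposed earlier in this thread for listing missing files.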

qjhart commented 9 years ago

Even better. You need to capture that process in the project. I often use Makefiles for that.

ghost commented 9 years ago

@qjhart I have a shell script. I haven't put it in the repository yet, as I am still working on it, but that part is working. I will put a preliminary (raw) version in a little later today.

ghost commented 9 years ago

@qjhart I pushed an intermediate version to the repository with the script file, BuildQQ.sh, and the Python code that builds part of the index shapefile. I will work on it more tomorrow. I need to drop the date part and add each year's TIFF files. Note that the color and date fields will be per year, so they will need to be named accordingly.

ghost commented 9 years ago

@qjhart I have it working for DOQQs but will examine the files and test some more. It builds an index shapefile of a superset of all the years' DOQQs, then adds each year's individual information (color, date, filename) in a loop. Field names are constrained by the 10-character limit of shapefiles, but these can either be renamed or we can certainly use a different display name (both for popup information and in the underlying shapefile). I will need to test some more once we have more than the 2014 data on the drive. The final shapefile for this run (with only one year's data, 2014) was NAIP_index1.shp under the indexes directory. I'll do some further checking that it is building the correct values; I only did a cursory check. Also, I need to change this around to accommodate building an index file for the CCM, assuming you want those. I also assume that we can simply use a county shapefile for the boundary, then build the list of files and filenames by year and add the fields to the shapefile.
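
To make the per-year loop concrete, here is a small Python sketch of the attribute table only, without any geometry or shapefile I/O. It is not the code in the repository. It assumes the leading letter of the filename stands in for the color field, which this thread does not confirm, and the field stems (`file`, `date`, `color`) and function names are hypothetical. The year suffix keeps every field name within the 10-character limit.

```python
import re

# Groups: prefix letter, quad plus quarter, first date.
NAME_RE = re.compile(
    r"^([a-z])_(\d{7}_(?:ne|nw|se|sw))_\d+_\d+_(\d{8})(?:_\d{8})?\.tif$"
)


def field_name(stem, year):
    """Build a per-year field name and enforce the 10-character shapefile limit."""
    name = "{}_{}".format(stem, year)
    if len(name) > 10:
        raise ValueError("field name too long for a shapefile: " + name)
    return name


def build_index(files_by_year):
    """Return {quarter_quad: {field: value}} covering every quad seen in any year."""
    index = {}
    for year in sorted(files_by_year):
        for filename in files_by_year[year]:
            m = NAME_RE.match(filename.lower())
            if m is None:
                continue  # skip names that do not follow the assumed pattern
            color, quad, date = m.groups()
            row = index.setdefault(quad, {})
            row[field_name("file", year)] = filename
            row[field_name("date", year)] = date
            row[field_name("color", year)] = color  # assumption: prefix letter
    return index
```

A quad that is absent in a given year simply has no fields for that year, which matches the idea of a superset of DOQQs with gaps where an image is missing.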

ghost commented 9 years ago

@qjhart Added a file for building the county mosaic index and slightly tweaked the other file (under the test subdirectory). Will need to test again, once we have data from all the years.

qjhart commented 9 years ago

@gwatprg Don't we have all the data on some Windows drives? I thought George said they are all recovered. Also, let's talk with @gjscheer about how he wants a service set up for that.

ghost commented 9 years ago

@qjhart Right now the data is just residing on Windows servers. Do you want me to mount the drives on your Mac for read-only access?

ghost commented 9 years ago

@qjhart Do you have a lookup table for the DOQQ names? Otherwise, I'll write a script to create something for the DOQQs that we have in 2014.

ghost commented 9 years ago

@qjhart I had problems using the SQLite dialect on the Mac. Any time I do anything with the OGR SQLite dialect (even ask its version), it gives a segmentation fault. There are many threads on this, such as http://osgeo-org.1560.x6.nabble.com/gdal-dev-ogr2ogr-segfault-td5206348.html. The same commands run on my Windows OGR, so I'll just put them in the code repository but run them manually on my machine for now. Ultimately we'll be running them on a new machine anyway, and the URL may need to be tweaked.

ghost commented 9 years ago

@qjhart First Fusion table. I set the info popup to show only some of the fields. I don't have time to look at this right now. Note that the files are still copying; I think only some of the TIFFs near Tijuana have copied. I sent you the link.