ngds / ckanext-geoserver-bku03232018


WFS validation errors, testing instance. #7

Closed smrgeoinfo closed 9 years ago

smrgeoinfo commented 9 years ago

new NGDS CKAN/GeoServer WFS for testing at http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=BoreholeTemperature:BoreholeTemperature&maxFeatures=10 the WFS response causes a validator 'unhandled exception' with the wfs validator at http://schemas.usgin.org/validate/wfs.

(with schema locations: xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.0.0/wfs.xsd http://stategeothermaldata.org/uri-gin/aasg/xmlschema/boreholetemperature/1.5 http://schemas.usgin.org/files/borehole-temperature-observation/1.5/BoreholeTemperature.xsd "

Validation problems depend on the validation engine used. I'm not sure what engine the online WFS validation tool uses.

CRITICAL: Validating with Saxon-EE, the only error that gets called out is that BoreholeTemperature:wellname_ is an invalid element name; it should be BoreholeTemperature:WellName. This suggests a problem in the CSV-to-PostGIS data loading.

Validating with the Oxygen XML editor v14.2, using the Xerces validation engine...

Validation errors are thrown on empty values in elements with data type xs:dateTime (e.g. SpudDate, ReleaseDate) or xs:double. Not sure why the empty xs:string elements are OK but not the empty xs:dateTime elements. Is there a way to have GeoServer not insert the empty optional elements?
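One workaround, pending a GeoServer-side fix, would be to strip empty optional elements from the GetFeature response before validating it. A rough sketch using only the Python standard library (element names are illustrative and namespaces are omitted for brevity):

```python
import xml.etree.ElementTree as ET

def strip_empty_elements(root):
    """Recursively remove leaf elements that carry no text, no children,
    and no attributes -- e.g. an empty <SpudDate/> that would fail
    xs:dateTime validation while an empty xs:string passes."""
    for child in list(root):
        strip_empty_elements(child)
        if len(child) == 0 and not (child.text and child.text.strip()) and not child.attrib:
            root.remove(child)

# Toy feature fragment; real responses are namespaced GML.
doc = ET.fromstring(
    "<BoreholeTemperature>"
    "<WellName>Test 1</WellName>"
    "<SpudDate></SpudDate>"  # empty xs:dateTime element: invalid per schema
    "<MeasuredTemperature>98.5</MeasuredTemperature>"
    "</BoreholeTemperature>"
)
strip_empty_elements(doc)
print(ET.tostring(doc, encoding="unicode"))
```

After stripping, the empty SpudDate element is gone while populated elements survive, so an `minOccurs="0"` schema no longer rejects the document on that field.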

Schema problems (relative to the normative schema for BoreholeTemperature at http://schemas.usgin.org/files/borehole-temperature-observation/1.5/BoreholeTemperature.xsd; this is what the validator uses, I think. @jessica-azgs can you verify?). The exception may be due to validation problems. Here they are: in gml:boundedBy, gml:null is an invalid element name and should be gml:Null (it's case-sensitive). This may be a GeoServer problem? Note that the Saxon parser thinks gml:null (lowercase) is OK. Go figure...

The 'fid' attribute on BoreholeTemperature:BoreholeTemperature is not allowed. This might be a GeoServer problem (some argument in the calls that deploy the service?).

BoreholeTemperature:wellname_ is an invalid element name; it should be BoreholeTemperature:WellName. This suggests a problem in the CSV-to-PostGIS data loading.

jessicagood commented 9 years ago

When using http://schemas.usgin.org/validate/wfs, one problem is that it constructs the wrong WFS request based on what it pulls out of the onlineResource attribute on the Get element in the GetCapabilities doc. It constructs the URL as

http://uat-ngds.reisys.com:8080/geoserver/BoreholeTemperature/wfs?&service=WFS&version=1.0.0&request=GetFeature&typename=BoreholeTemperature:BoreholeTemperature&maxfeatures=1

but it should be

http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=BoreholeTemperature:BoreholeTemperature&maxFeatures=1
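For reference, a validator along these lines would need to read the onlineResource attribute out of the capabilities document rather than guessing the host. A minimal sketch, assuming a trimmed-down WFS 1.0.0 capabilities fragment (the real document has many more elements):

```python
import urllib.parse
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"

def getfeature_url(capabilities_xml, type_name, max_features=1):
    """Pull the GetFeature onlineResource out of a WFS 1.0.0 capabilities
    document (Capability/Request/GetFeature/DCPType/HTTP/Get/@onlineResource)
    and append the query parameters a validator needs."""
    root = ET.fromstring(capabilities_xml)
    get = root.find(
        f".//{{{WFS_NS}}}GetFeature/{{{WFS_NS}}}DCPType/{{{WFS_NS}}}HTTP/{{{WFS_NS}}}Get"
    )
    base = get.attrib["onlineResource"].rstrip("?&")
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": type_name,       # note the camelCase parameter names
        "maxFeatures": str(max_features),
    }
    return base + "?" + urllib.parse.urlencode(params)

caps = """<WFS_Capabilities xmlns="http://www.opengis.net/wfs">
  <Capability><Request><GetFeature><DCPType><HTTP>
    <Get onlineResource="http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows"/>
  </HTTP></DCPType></GetFeature></Request></Capability>
</WFS_Capabilities>"""

print(getfeature_url(caps, "BoreholeTemperature:BoreholeTemperature"))
```

This yields a request against the geoserver-srv host advertised in the capabilities document, not a hard-coded :8080 host.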

smrgeoinfo commented 9 years ago

In the capabilities document for the service (e.g. http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows?service=WFS&version=1.0.0&request=GetCapabilities), the URLs for the online resource attributes for each request (xpath is /WFS_Capabilities/Capability/Request//DCPType/HTTP/Get/@onlineResource) need to have the http://uat-ngds.reisys.com/geoserver-srv/ host URL, not the http://uat-ngds.reisys.com:8080/geoserver host URL.

ALSO, the validation component needs to access the service URLs from the capabilities document. @jessica-azgs can you check whether this is actually the case?

smrgeoinfo commented 9 years ago

POSSIBLE FIXES:

  1. If GeoServer has a way to be configured to use a reverse proxy, this is the best option. The install script would have to configure the GeoServer instance on the node to use the reverse proxy, and then when services are deployed the reverse-proxy URL would automatically be inserted in the capabilities document.
  2. The service deployment process would have to write a custom capabilities document with the correct request URLs and link it to the deployed service. Ideally this could be supported through the GeoServer API; if not, it becomes a complex problem.
JihadMotii-REISys commented 9 years ago

@smrazgs @jessica-azgs I've found a way to fix it. GeoServer provides a Proxy Base URL setting for exactly this purpose.
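For reference, the same setting can also be changed outside the admin UI. Assuming a stock GeoServer install, the global settings REST resource should accept a payload along these lines (the exact endpoint path and element nesting vary by GeoServer version, so treat this as a sketch, not a definitive recipe):

```xml
<!-- PUT http://<host>:8080/geoserver/rest/settings (admin credentials required) -->
<global>
  <settings>
    <proxyBaseUrl>http://uat-ngds.reisys.com/geoserver-srv</proxyBaseUrl>
  </settings>
</global>
```

With the proxy base URL set, the onlineResource attributes in generated capabilities documents advertise the proxied host instead of the internal :8080 one.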

[Screenshot, 2015-01-02: GeoServer global settings page showing the Proxy Base URL field]

Can you please re-test WFS validation?

Thanks

Lbookman commented 9 years ago

@JihadMotii-REISys the WFS doesn't validate,

Error: Element 'BoreholeTemperature:BoreholeTemperature', attribute 'fid': The attribute 'fid' is not allowed.
Error: Element 'BoreholeTemperature:wellname_': This element is not expected. Expected is one of ( BoreholeTemperature:WellName, BoreholeTemperature:APINo, BoreholeTemperature:HeaderURI ).

JihadMotii-REISys commented 9 years ago

@smrazgs @jessica-azgs Regarding "Error Element 'BoreholeTemperature:wellname_': This element is not expected. Expected is one of ( BoreholeTemperature:WellName, BoreholeTemperature:APINo, BoreholeTemperature:HeaderURI )." => The problem was that the file I used as the resource has column names "ObservationURI,WellName ,APINo,HeaderURI, ..." containing a trailing space, and when I upload this file to the datastore, that space is converted to "_" (hence wellname_). http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=ckan:BoreholeTemperature&maxFeatures=10

jessicagood commented 9 years ago

@JihadMotii-REISys Can you manually remove the trailing spaces in Well Name and Production for now, and continue on with the other issues in getting the WFS to validate? I think I'll need to make a tweak to usginmodels so that a file with extra spaces in field names won't validate on upload. This is an outlier case.
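The header cleanup described here could also be automated at upload time; a minimal sketch using only the Python standard library (the sample header is the one quoted above):

```python
import csv
import io

def clean_header(csv_text):
    """Strip leading/trailing whitespace from header fields so that a
    column like 'WellName ' doesn't end up as 'wellname_' in PostGIS."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [name.strip() for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

dirty = "ObservationURI,WellName ,APINo,HeaderURI\nuri:1,Test 1,123,uri:h1\n"
print(clean_header(dirty))
```

Only the header row is touched; data rows pass through unchanged.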

JihadMotii-REISys commented 9 years ago

@jessica-azgs Sure Jessica, I've removed the trailing spaces and re-ran WFS validation, and it seems there are more issues, such as: Error Element 'BoreholeTemperature:ElevationDF': '' is not a valid value of the atomic type 'xs:double'. Error Element 'BoreholeTemperature:pH': '' is not a valid value of the atomic type 'xs:double'. Error Element 'BoreholeTemperature:CirculationDuration': '' is not a valid value of the atomic type 'xs:double'. ..... Are these validation issues? It looks like they're coming from the content of the file, right?

@jessica-azgs @smrazgs @Lbookman Regarding the FID: it's an important field, and it seems I can't get rid of it in GeoServer (http://osgeo-org.1560.x6.nabble.com/hide-the-field-FID-to-Openlayers-td3793997.html). However, I noticed one thing: when I changed the WFS version from 1.0.0 to 1.1.0, the fid changed to gml:id and other structures changed as well, e.g. http://uat-ngds.reisys.com/geoserver-srv/BoreholeTemperature/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=ckan:BoreholeTemperature&maxFeatures=10 But when I tried to validate the WFS with version 1.1.0, the WFS application validator throws an exception. Also, the other sources are using 1.1.0, e.g. http://geothermal.isgs.illinois.edu/ArcGIS/services/aasggeothermal/NJBoreholeLithIntervals/MapServer/WFSServer?request=GetFeature&service=WFS&TypeName=BoreholeLithInterval When I validated this one, http://geothermal.isgs.illinois.edu/ArcGIS/services/aasggeothermal/NJBoreholeLithIntervals/MapServer/WFSServer?request=GetCapabilities&service=WFS the validation succeeded.

jessicagood commented 9 years ago

@JihadMotii-REISys The fields you're getting as not having a valid type are all empty, so PostgreSQL is likely typing these fields as string. GeoServer then does not know that these fields (even though they are empty) must have a specific type (as stated in the schema http://schemas.usgin.org/files/borehole-temperature-observation/1.5/BoreholeTemperature.xsd), so the WFS won't validate. We need to find a way around this.

jessicagood commented 9 years ago

@JihadMotii-REISys This was addressed in the spring in https://github.com/ngds/ckanext-ngds/issues/377 but I don't think it was ever solved.

JihadMotii-REISys commented 9 years ago

@smrazgs @jessica-azgs, regarding the fid error: I ran a validation of an existing WFS using 1.0.0 instead of 1.1.0 (http://geothermal.isgs.illinois.edu/ArcGIS/services/aasggeothermal/NJBoreholeLithIntervals/MapServer/WFSServer?request=GetCapabilities&service=WFS&version=1.0.0) and the output for v1.0.0 showed the FID error too... but when I ran the validation again for the same WFS link with version=1.1.0, the validation succeeded. Is v1.1.0 compatible with the CKAN dependencies/external APIs? If yes, shouldn't we use WFS v1.1.0 instead of 1.0.0?

jessicagood commented 9 years ago

@JihadMotii-REISys Yes, I think we should be using 1.1.0, since all the current WFS services use 1.1.0. I'm not sure how we got started with 1.0.0.

JihadMotii-REISys commented 9 years ago

@jessica-azgs @smrazgs I'll change the version in the code to 1.1.0. However, this change won't fix the other empty-field errors mentioned in ngds/ckanext-ngds#377; we still need to find a way around those separately.

JihadMotii-REISys commented 9 years ago

@smrazgs @jessica-azgs @Lbookman I have deployed the new code for using WFS 1.1.0 in UAT server. http://uat-ngds.reisys.com/dataset/indiana-borehole-temperatures-wfs-1-1-0 http://uat-ngds.reisys.com/dataset/well-log-ar-wfs-1-1-0-0

I ran a WFS validation only for WellLog and it succeeded. Here is the link: http://uat-ngds.reisys.com/geoserver-srv/WellLog/ows?service=WFS&version=1.1.0&request=GetCapabilities&typeName=WellLog:WellLog I guess the file used for WellLog has no empty fields; however, I ran this just to make sure that version 1.1.0 is the correct one.

JihadMotii-REISys commented 9 years ago

same as #12

ccaudill commented 9 years ago

1/20 email: Jihad, Attached is a valid Borehole Temperature file, and the WFS service associated with it is http://services.azgs.az.gov/arcgis/services/aasggeothermal/AZBoreholeTemperatures/MapServer/WFSServer?request=GetCapabilities&service=WFS. But again, if the data types are not being populated correctly in PostGIS, the WFS services are NOT going to validate (unless the field types are guessed correctly, which can occasionally happen). The data types can be found in the schemas for the given models here: http://schemas.usgin.org/models/ and the BoreholeTemperatures model is here: http://schemas.usgin.org/files/borehole-temperature-observation/1.5/BoreholeTemperature.xsd where the data type for each field is given by the xs type on the element:

        <xs:element name="BoreholeName" type="xs:string" minOccurs="0">
            <xs:annotation>
                <xs:documentation>The human-intelligible name of the borehole identified by the HeaderURI.</xs:documentation>
            </xs:annotation>
        </xs:element>

These are translated as string=text, double=decimal, dateTime=calendarDate in PostGIS.

As you requested, here is a free way to validate WFS services alongside the WFS validator at http://schemas.usgin.org/validate/wfs. Free versions of XML Explorer and Notepad++ are available here: http://xmlexplorer.codeplex.com/, http://notepad-plus-plus.org/

a. Create a GetFeature request: in the browser, change the WFS GetCapabilities URL by deleting "Capabilities" and replacing it with "Feature", and add the layer name to the end of the URL, as in the second URL below:
http://services.azgs.az.gov/ArcGIS/services/aasggeothermal/AZActiveFaults/MapServer/WFSServer?request=GetCapabilities&service=WFS
http://services.azgs.az.gov/ArcGIS/services/aasggeothermal/AZActiveFaults/MapServer/WFSServer?request=GetFeature&service=WFS&TypeName=ActiveFault&MaxFeatures=2

c. Copy this URL. In XML Explorer, choose "Open Url …" and paste in the URL. Once loaded, save the file. Open it in Notepad++.

d. From http://schemas.usgin.org/models/ save the schema (.xsd) to validate the file against to a local file location.

e. In Notepad++:

  1. In the first element, in the "xsi:schemaLocation=" attribute, find the 2nd URL that appears (it will end with the layer name) and delete it. In its place, write the name of the .xsd saved in step d above (i.e., ActiveFault1.1.xsd).
  2. Save the document as .xml, with the same name and in the same file location as the .xsd saved in step d (i.e., ActiveFault1.1.xml). Close.

f. Open XML Explorer; click Open and navigate to the .xml file that was just edited in step e. Upon opening this .xml file, the program will look at the indicated schema and automatically validate the file.

Any return with a red X means a change in the data, field type, or field heading is needed, and likely a re-import of the feature class into a staging or deployment database. A return of no errors indicates that the .xml file (and hence the GetFeature request and the service) was successfully validated. -Christy
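The URL rewrite in step a can also be scripted. A small sketch using only the Python standard library, with the layer name and URL taken from the example above:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def capabilities_to_getfeature(caps_url, type_name, max_features=2):
    """Rewrite a WFS GetCapabilities URL into the matching GetFeature
    request by swapping the request parameter and appending the layer."""
    parts = urlparse(caps_url)
    params = dict(parse_qsl(parts.query))
    params["request"] = "GetFeature"
    params["TypeName"] = type_name
    params["MaxFeatures"] = str(max_features)
    return urlunparse(parts._replace(query=urlencode(params)))

url = ("http://services.azgs.az.gov/ArcGIS/services/aasggeothermal/"
       "AZActiveFaults/MapServer/WFSServer?request=GetCapabilities&service=WFS")
print(capabilities_to_getfeature(url, "ActiveFault"))
```

The output is the GetFeature URL shown in step a, ready to open in XML Explorer.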

email 1/23: Maybe http://schemas.usgin.org/contentmodels.json is useful. It gives all of the models and their URIs; this is what usginmodels uses. The Readme for usginmodels explains how to use the functions and gives example URIs as well. For example, to get an object with all the info about a particular model, use usginmodels.get_model(uri). Does that help? -Jessica

@JihadMotii-REISys can you tell me where we are on this? Is there anything further ready for us to test? (Let's try to keep comments on GitHub if possible instead of emails.)

ccaudill commented 9 years ago

Email 2/5: Hi Christy, I was able to upload and validate a custom Borehole CSV file (see attached). UAT link: http://uat-ngds.reisys.com/dataset/test-borehole Essentially I updated the missing fields with new test values based on the schema (http://schemas.usgin.org/models/#boreholetemperature). Now that this file validates, it confirms my theory about the Postgres column types. There are a few ways to move forward on this:

  1. Hook into the datapusher code and set the correct column type. But this can be time consuming, especially since we have about 35 models with plenty of columns.
  2. Add a dummy row with the correct data types at the top of every uploaded CSV, which ensures the datastore column types.
  3. Use our first validator to update the first row of the uploaded CSV in such a way that it is compliant with the schema.

To me it seems like the 3rd option is most appropriate, as we are already correcting the CSV file there. Regards, Yatin Khadilkar
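The idea behind options 2 and 3 can be sketched roughly as follows; the field-to-type mapping and sentinel values here are hypothetical stand-ins (in practice they would come from the usginmodels schema for the layer):

```python
import csv
import io

# Hypothetical schema-derived field types for illustration only.
FIELD_TYPES = {
    "WellName": "string",
    "SpudDate": "dateTime",
    "MeasuredTemperature": "double",
}

# Placeholder values whose lexical form matches the target type.
SENTINELS = {"string": "unknown", "double": "0.0", "dateTime": "1900-01-01T00:00:00"}

def fill_first_row(csv_text):
    """Fill empty cells in the first data row with type-appropriate
    placeholders so the datastore infers the right column types."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, first = rows[0], rows[1]
    for i, name in enumerate(header):
        if not first[i].strip():
            first[i] = SENTINELS[FIELD_TYPES.get(name, "string")]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

data = "WellName,SpudDate,MeasuredTemperature\nTest 1,,\n"
print(fill_first_row(data))
```

As the thread notes, the trade-off is that this placeholder data lands in PostGIS and would be published by GeoServer unless removed later.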

Great, thanks Yatin. Might the last two options be problematic? At what point does that dummy data get removed? If it exists in the PostGIS database, won't it be pushed on to GeoServer for publication? For the 3rd option, is this something you need help from Jessica to work on, or do you feel like you can take it on? My preference is the first option, as it seems like the most solid way to ensure that PostGIS always has the correct data types, aside from any changes that the pre-validator might undergo. What might be your estimate for how much longer this option would take? Thank you, Christy

For the 3rd option I was thinking of adding data only in the first row of empty columns, hopefully inconsequential data; for example, BoreHoleHeight can be 0.0000001. The advantage of this approach over the first one is that we won't be customizing CKAN's DataPusher extension, so we can keep getting support from upstream changes. For the 3rd option, since Jessica did most of the development for the validator, I was hoping she could work on that. Having said that, the first option is definitely more robust; I will explore whether we can extend the datapusher extension without customizing it heavily. Regards, Yatin Khadilkar

@ykhadilkar-rei Great, let me know. It's looking like option 1 might still be the best. Be aware that in our NGDS specs, we replace null number (double) values with "-9999" instead of "0.0000001". @smrazgs could you comment on which of the three possible solutions Yatin indicates makes the most sense to you?

ccaudill commented 9 years ago

@ykhadilkar-rei Steve and I spoke and agreed that option 1 would be the best; perhaps we can chat further about this on our weekly telecon tomorrow.

ccaudill commented 9 years ago

Notes from 20150210: WFS. Extending the datapusher approach (option 1) would mean changing the stock CKAN extension. It is actually sloppy to hard-code that in; it can and will change a lot, and it's not sustainable. The CSV files are being modified anyway, and that is probably the best choice: at first validation, check the fields and update the first row of the number (double) and date fields. Will look into it and let us know.

@ykhadilkar-rei Jessica's code in this tool, https://github.com/usgin/ExcelToNGDSServiceTool, already does this to some extent; perhaps it would be helpful? @jessica-azgs

ykhadilkar commented 9 years ago

@ccaudill thanks for the tool URL ... I was thinking the same. I will start working on it next week.

ccaudill commented 9 years ago

Any update on this task? Thank you.

ykhadilkar commented 9 years ago

@ccaudill It looks like we need ArcGIS to install the ExceltoNGDSServiceTool. @jessica-azgs is that correct? Is there any other way to develop and test?

ccaudill commented 9 years ago

I would contact @jessica-azgs, yes.

jessicagood commented 9 years ago

@ykhadilkar-rei The ExceltoNGDSServiceTool uses the usginmodels API and works with ArcGIS. I don't see why you'd need to look at this tool at all. What you need is in usginmodels. If you use the get_layer method with a specified schema it will return an object with all of the layer information, including the field type for each field. Once you know the field type you should be able to set default values. get_layer returns the information for a single layer but get_models will return all layers for all models.

layer = usginmodels.get_layer("http://stategeothermaldata.org/uri-gin/aasg/xmlschema/activefault/1.1")

[Screenshot: console output of get_layer showing the layer's field list with field types]
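Assuming get_layer returns per-field type information roughly like the structure below (the attribute names here are hypothetical stand-ins; check the usginmodels README for the real ones), defaults could be derived per field type. The -9999 sentinel for null doubles follows the NGDS convention mentioned earlier in this thread:

```python
# Hypothetical shape of the layer info returned by usginmodels.get_layer;
# the real attribute names may differ -- see the usginmodels README.
layer_fields = [
    {"name": "WellName", "type": "string"},
    {"name": "SpudDate", "type": "dateTime"},
    {"name": "ElevationDF", "type": "double"},
]

# NGDS spec: null doubles become "-9999"; the other sentinels are guesses.
DEFAULTS = {"string": "unknown", "double": "-9999", "dateTime": "1900-01-01T00:00:00"}

defaults = {f["name"]: DEFAULTS[f["type"]] for f in layer_fields}
print(defaults)
```

A map like this is what the dummy-row approach would consult when filling empty cells before the CSV reaches the datastore.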

ykhadilkar commented 9 years ago

@jessica-azgs thanks for the information. I will check out usginmodels.

ccaudill commented 9 years ago

@ykhadilkar-rei Can you please give us an update here? Can you tell us what the hold-up is on finishing up this issue?

smrgeoinfo commented 9 years ago

Decision is to update usginmodels.validate_file to insert a dummy row of data as the first row, to force the correct data-type inference when CKAN sends the CSV to Postgres to create a table.
See https://github.com/usgin/usginmodels/issues/5 @jessica-azgs @dan-olaru-reisys @FuhuXia

ccaudill commented 9 years ago

http://uat-ngds.reisys.com/geoserver/get-ogc-services?url=http%3A%2F%2Fuat-ngds.reisys.com%2Fgeoserver-srv%2FALWellLog%2Fows%3Fservice%3DWFS%26version%3D1.1.0%26request%3DGetCapabilities%26typeName%3DALWellLog%3AWellLog&workspace=ALWellLog Published services finally validated using XMLSpy!! I think we're finally done with this issue.