Closed smrgeoinfo closed 9 years ago
As far as I know, harvested datasets go through the validation process if they are tagged with a USGIN keyword. @ydave-reisys correct me if I am wrong.
@smrazgs That's correct. We check whether the dataset is tagged with a 'usgincm:' keyword from the provided list of keywords for all content models (https://github.com/usgin-models/exchangecatalog/blob/master/keywordsNamespacesContentModel.csv) and then proceed with validation.
The ISO XML metadata might have multiple DigitalTransferOptions for different 'distributions', probably at least an Excel spreadsheet and a CSV version of the same file. How does the code identify the correct 'distribution' link?
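A minimal sketch of the keyword gate described above, assuming the keywords have already been extracted from the metadata record as plain strings. The function name and the `known_models` list are hypothetical; the real list would come from keywordsNamespacesContentModel.csv, and the actual harvester code may work differently.

```python
USGIN_PREFIX = "usgincm:"


def find_content_model_keyword(keywords, known_models):
    """Return the first keyword that has the usgincm: prefix and matches a
    known content model keyword (case-insensitive), or None if none does."""
    known = {m.strip().lower() for m in known_models}
    for kw in keywords:
        norm = kw.strip().lower()
        if norm.startswith(USGIN_PREFIX) and norm in known:
            return kw.strip()
    return None


# Hypothetical stand-in for the list of content model keywords.
KNOWN_MODELS = ["usgincm:well log observation"]

match = find_content_model_keyword(
    ["geothermal", "usgincm:well log observation"], KNOWN_MODELS
)
```

A record with no matching keyword returns `None`, which under this sketch would mean the dataset skips validation.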
@smrazgs It looks for csv format.
@ccaudill I looked at all datasets in http://services.stategeothermaldata.org/geoAL/csw, which has only 2 datasets with CSV files. As per the requirement, if a dataset conforms to one of the NGDS models, then only CSV resources are accepted. Both datasets, Nevada Power Plant Facilities test and Alabama Well Logs test, have an xls resource along with the csv, hence neither dataset is harvested.
@smrazgs Perhaps you should comment on this too, but a valid CSV file should be harvested and published, regardless of what other distributions (file types) are in the harvested metadata. @ydave-reisys @dano-reisys
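The behavior requested here, picking the CSV distribution and ignoring the others rather than rejecting the whole record, could look roughly like the sketch below. It assumes each distribution has already been reduced to a dict with `url` and `name` keys; that shape, the function name, and the example URLs are illustrative only and not taken from the harvester.

```python
from urllib.parse import urlparse


def pick_csv_distribution(distributions):
    """Return the first distribution that looks like a CSV download, or None.

    Other distributions (xls, pdf, ...) are skipped instead of causing the
    record to be rejected.
    """
    for dist in distributions:
        path = urlparse(dist.get("url", "")).path.lower()
        name = dist.get("name", "").lower()
        if path.endswith(".csv") or "csv format" in name:
            return dist
    return None


# Hypothetical record with both an Excel and a CSV distribution.
distributions = [
    {"url": "http://example.org/data/welllogs.xls", "name": "Excel workbook"},
    {"url": "http://example.org/data/welllogs.csv",
     "name": "NGDS Tier 3 Data, csv format: welllogs.csv"},
]
chosen = pick_csv_distribution(distributions)
```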
http://repository.stategeothermaldata.org/metadata/record/eaf12e0c53a4222440a8b343a21546f6.iso.xml This is an example of a metadata record that we're trying to harvest, publishing its CSV.
@dano-reisys The CSW at http://services.stategeothermaldata.org/geoAL/csw now has 12 records, including 2 test resources with unpublished CSV files (the metadata for those includes the elements specified at https://github.com/ngds/documents/blob/master/Tier3-csv-DistributionLink_inISO19139.docx). @smrazgs
This CSW is ready for your harvesting testing.
New rpm today. This comment is to document the first test of harvesting in CSV files for automatic publishing. The CSV files evidently did not conform to the schema, and were thus not published. Errors were given, good specific ones, so that's what we'd want to happen. I'll give a screenshot below, correct the files, then try for another test harvest.
It looks as though the automatic publishing did not work. This record did not error, and is valid, but the harvest only brought in the CSV and did not publish the service: http://test.geothermaldata.org/dataset/nevada-power-plant-facilities-test This is from the harvest: http://test.geothermaldata.org/harvest/ala @dano-reisys @smrazgs @kvuppala
Is GeoServer running? Has it been configured?
After all the steps, please update two config files. Update the "ckan.hostname" and "ngds.aggregator_url" (no trailing slash) variables with the correct URLs in /etc/ckan/production.ini, and update proxyBaseUrl in /var/lib/tomcat6/webapps/geoserver/data/global.xml, replacing 127.0.0.1 with the correct URL. Restart the server after the update.
I can't seem to get to: http://test.geothermaldata.org/geoserver-srv/web/
Thanks @dano-reisys Yes, this has been done on test.geothermaldata.org/geoserver. Here is the URL in the GeoServer global.xml file:
@ccaudill, I get a 404 error when I try that link...
Can we try to deploy the CSV manually to see if that works?
Yes, it does: http://test.geothermaldata.org/dataset/test-well-logs-publish I just did this one.
I tried another harvest after checking the metadata records, which were custom-made to make sure they had the elements that Steve outlined as needed for this task: USGIN keyword, URL to the csv, applicationProfile string (content model namespace), and the name of the file:
<gmd:descriptiveKeywords>
  <gmd:MD_Keywords>
    <gmd:keyword>
      <gco:CharacterString>usgincm:well log observation</gco:CharacterString>
    </gmd:keyword>
    <gmd:type>
      <gmd:MD_KeywordTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#MD_KeywordTypeCode" codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
    </gmd:type>
  </gmd:MD_Keywords>
</gmd:descriptiveKeywords>
<gmd:MD_DigitalTransferOptions>
  <gmd:onLine>
    <gmd:CI_OnlineResource>
      <gmd:linkage>
        <gmd:URL>http://url to get csv file</gmd:URL>
      </gmd:linkage>
      <gmd:applicationProfile>
        <gco:CharacterString>http://stategeothermaldata.org/uri-gin/aasg/xmlschema/welllog/0.8</gco:CharacterString>
      </gmd:applicationProfile>
      <gmd:name>
        <gco:CharacterString>NGDS Tier 3 Data, csv format: nmwelllog.csv</gco:CharacterString>
      </gmd:name>
      <gmd:function>
        <gmd:CI_OnLineFunctionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_OnlineFunctionCode" codeListValue="download">download</gmd:CI_OnLineFunctionCode>
      </gmd:function>
    </gmd:CI_OnlineResource>
  </gmd:onLine>
</gmd:MD_DigitalTransferOptions>
These still did not get published. Note that the metadata records, after being harvested into CKAN, did NOT inherit the gmd:applicationProfile element: http://test.geothermaldata.org/metadata/iso-19139/71beeb4c-d551-4ccd-b2ee-ed41279fd5ef.xml But the element is definitely in the metadata records that were harvested from a Geoportal: ftp://AZGS:sharefiles@secureftp.azgs.az.gov/AZGS/ccaudill/NVPowerPlantFacilities-testMetadata.xml ftp://AZGS:sharefiles@secureftp.azgs.az.gov/AZGS/ccaudill/NMWellLogs-testMetadata.xml @smrazgs @dano-reisys @kvuppala
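One way to check whether a record kept its gmd:applicationProfile is to parse the ISO 19139 XML before and after harvesting and compare. The sketch below uses only Python's standard library. The function names and the example.org URL are assumptions for illustration; the gmd and gco namespace URIs are the standard ISO 19139 ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def online_resources(xml_text):
    """Return url, application_profile and name for every CI_OnlineResource."""
    root = ET.fromstring(xml_text)
    results = []
    for res in root.iter("{%s}CI_OnlineResource" % NS["gmd"]):
        results.append({
            "url": res.findtext("gmd:linkage/gmd:URL", "", NS).strip(),
            "application_profile": res.findtext(
                "gmd:applicationProfile/gco:CharacterString", "", NS).strip(),
            "name": res.findtext("gmd:name/gco:CharacterString", "", NS).strip(),
        })
    return results


def missing_application_profile(xml_text):
    """Return the URLs of online resources that carry no applicationProfile."""
    return [r["url"] for r in online_resources(xml_text)
            if not r["application_profile"]]


# Small stand-in record; the URL is a placeholder, not a real endpoint.
SAMPLE = """<gmd:MD_DigitalTransferOptions
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:onLine>
    <gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>http://example.org/nmwelllog.csv</gmd:URL></gmd:linkage>
      <gmd:applicationProfile>
        <gco:CharacterString>http://stategeothermaldata.org/uri-gin/aasg/xmlschema/welllog/0.8</gco:CharacterString>
      </gmd:applicationProfile>
      <gmd:name>
        <gco:CharacterString>NGDS Tier 3 Data, csv format: nmwelllog.csv</gco:CharacterString>
      </gmd:name>
    </gmd:CI_OnlineResource>
  </gmd:onLine>
</gmd:MD_DigitalTransferOptions>"""

resources = online_resources(SAMPLE)
```

Running `missing_application_profile` on both the source record and the harvested copy would show whether the element was dropped somewhere in the harvest.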
Thank you @dano-reisys - great work. Looks like the auto publishing is working and finished up:
http://test.geothermaldata.org/dataset/nevada-power-plant-facilities-test http://test.geothermaldata.org/harvest/ala
See #610 for background. When a harvested record indicates that a Tier3 CSV data set is available for a resource, get the csv, validate it, and if valid, deploy an NGDS web service. The technical requirements are described in a document that is in the ngds/documents repository: https://github.com/ngds/documents/blob/master/GDR_integrationProjectRequirements.docx
The requirements for identifying the correct distribution don't account for multiple distribution options, and they should have stated that a usgin: content model keyword would be present. We need to review what the application is looking for in the metadata record to identify the correct distribution link to get the csv file. @dano-reisys can you get that info?
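As a rough illustration of the "validate it" step, the sketch below checks only that a CSV's header row contains a set of required field names. This is an assumption-laden simplification: the real validation runs against the content model schema, and the field names used here are hypothetical placeholders rather than the actual well log content model fields.

```python
import csv
import io


def missing_header_fields(csv_text, required_fields):
    """Return the required field names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {h.strip() for h in header}
    return [f for f in required_fields if f not in present]


# Hypothetical field names, for illustration only.
REQUIRED = ["WellName", "LatDegree", "LongDegree"]
csv_text = "WellName,LatDegree\nTest Well 1,34.5\n"

missing = missing_header_fields(csv_text, REQUIRED)
should_publish = not missing
```

In this sketch a non-empty `missing` list would block publishing and could be reported back as a specific error, similar to the errors seen in the first test harvest.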