Open robragsdale opened 9 years ago
The ncISO portion of THREDDS can output ISO 19115-2 metadata files. Have you tried to use these in pycsw? If they work, you could use https://github.com/kwilcox/thredds_crawler to download the ISO files you want to ingest into pycsw. A direct connector in pycsw would also be a good idea...
Kyle
I saw there was the command-line Java piece, so I am trying that and it works. Let me back up: my definition of "works" is that I see an entry in the database after I created the XML metadata file and then pointed pycsw to ingest it. I am not sure yet whether all the fields are populated in a useful manner, as I am learning the pycsw schema at the same time.
Dan
Also discussed here: https://github.com/geopython/pycsw/issues/155
Dan,
Wasn't sure you were referring to Java ncISO in your original post. NERACOOS runs this nightly against 2 of our TDS catalogs and produces a WAF. http://www.neracoos.org/WAF/
The ISO files are at: http://www.neracoos.org/WAF/UMaine/iso/ http://www.neracoos.org/WAF/BIO/iso/
AFAIK the ncISO crawler produces valid ISO files via the ncISO TDS plugin and the NGDC Geoportal ingests it successfully.
I posted a simple python 2.7 script we use to run this via cron.
https://github.com/neracoos-open/neracoos_catalog/tree/master/src/MetadataWAF
It has a rename option since we are running ncSOS, which requires an ncml extension to overcome a TDS aggregation cache issue, but that is optional.
The TDS catalog URLs are: http://www.neracoos.org/thredds/UMO_SOS_historical_realtime_agg.html and http://www.neracoos.org/thredds/catalog/WW3/catalog.html
Hope this helps.
Eric
Hi Dan, PyCSW imports ISO XML files directly using the pycsw-admin.py utility that comes with it. You just point it to your WAF. And maintaining a WAF is as simple as wget'ting all your ncISO end points from TDS (as Eric expounded on). Not a lot of extra glue needed, but here is my PyCSW loading script to give you an idea: https://www.dropbox.com/s/w36vmgttbn64t7s/update_pycsw.py?dl=0. PyCSW drives our data search page here: http://pacioos.org/search/. Other details regarding our metadata and WAFs here: http://pacioos.org/metadata/ Cheers, John Maurer, PacIOOS
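A minimal Python 3 sketch of the "wget your ISO files into a local folder" step described above, using only the standard library. It assumes the WAF is a plain HTML directory listing whose `<a href>` entries point at `.xml` files; the names `xml_links` and `mirror_waf` are hypothetical and are not taken from the scripts linked in this thread.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def xml_links(index_html, base_url):
    """Return absolute URLs of all .xml links found in a WAF index page."""
    parser = _HrefCollector()
    parser.feed(index_html)
    return [urljoin(base_url, h) for h in parser.hrefs
            if h.lower().endswith(".xml")]


def mirror_waf(base_url, target_dir):
    """Download every .xml file listed at base_url into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with urlopen(base_url) as resp:
        index_html = resp.read().decode("utf-8", errors="replace")
    saved = []
    for url in xml_links(index_html, base_url):
        path = os.path.join(target_dir, os.path.basename(url))
        with urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        saved.append(path)
    return saved
```

Calling something like `mirror_waf("http://www.neracoos.org/WAF/UMaine/iso/", "iso_records")` would leave a local folder of ISO records. In the pycsw releases current at the time of this thread, a folder like that could then be loaded with `pycsw-admin.py -c load_records -f default.cfg -p iso_records -r`; check `pycsw-admin.py -h` for the options in your version.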
John,
Thanks for the info. I was actually playing around on your search page before I started working with pycsw. It's good to see someone using ncISO in production; that gave me confidence it provides useful metadata to prime the catalog with. In your results, you provide the access methods. Where is that coming from? I ran ncISO against a THREDDS endpoint and expected to see pycsw populate the links column, but nothing was there.
Hi Dan, The access methods should be captured by pycsw. Originally pycsw was missing the XML elements that ncISO uses to populate these, but Rich Signell and I submitted the issue to pycsw and Tom Kralidis added support. See https://github.com/geopython/pycsw/issues/238 for details. Are you sure that you have the latest pycsw, and that your ncISO output contains gmd:distributorTransferOptions and/or srv:SV_ServiceIdentification elements? Cheers, John
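One quick way to answer that last question is to count the two elements in an ncISO record. This is only a sketch using the standard library; it assumes the usual ISO 19139 namespace URIs for `gmd` and `srv`, and the function name `count_link_elements` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to be the standard ISO 19139 ones.
GMD = "http://www.isotc211.org/2005/gmd"
SRV = "http://www.isotc211.org/2005/srv"

# Elements that carry access links in ncISO output, per the post above.
TAGS = {
    "gmd:distributorTransferOptions": "{%s}distributorTransferOptions" % GMD,
    "srv:SV_ServiceIdentification": "{%s}SV_ServiceIdentification" % SRV,
}


def count_link_elements(xml_text):
    """Count link-bearing elements anywhere in an ISO record."""
    root = ET.fromstring(xml_text)
    return {label: sum(1 for _ in root.iter(tag))
            for label, tag in TAGS.items()}
```

A record in which both counts are zero has nothing for pycsw to copy into its links column, whatever the pycsw version.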
John,
Those seem to be populated. Is there a specific column (or columns) they go into in the pycsw schema, or are you parsing them out of the XML?
Dan
Within the pycsw database, they go in the "links" column as you stated earlier. If you are running CSW commands to query your database, where you parse the XML response depends on which outputSchema you are requesting. In the CSW schema you look at dct:references, in the gmd schema (ISO) you look for gmd:URL under gmd:distributorTransferOptions and srv:SV_ServiceIdentification. (I do the latter so I can also pull gmd:Name.) John
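For the CSW-schema case, pulling dct:references out of a response might look like the sketch below. It assumes a full csw:Record response (elementSetName of "full") and the standard CSW 2.0.2 and Dublin Core namespace URIs. The `scheme` attribute on dct:references is also an assumption about how the server labels each link's protocol; the code returns `None` when it is absent. The function name `references_by_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to be the standard CSW 2.0.2 / Dublin Core ones.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}


def references_by_record(csw_response_text):
    """Map each record identifier to a list of (scheme, url) tuples."""
    root = ET.fromstring(csw_response_text)
    result = {}
    for record in root.iter("{%s}Record" % NS["csw"]):
        identifier = record.findtext(
            "dc:identifier", default="", namespaces=NS).strip()
        links = []
        for ref in record.findall("dct:references", NS):
            url = (ref.text or "").strip()
            if url:
                # scheme may be missing; ref.get then returns None
                links.append((ref.get("scheme"), url))
        result[identifier] = links
    return result
```

A record that maps to an empty list here corresponds to an empty links column on the database side.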
Please be aware that distribution links in the ISO metadata can be available from two different XPaths in the distribution section of ISO. ncISO always outputs the links at this XPath:
gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource
But you may stumble across ISO records that also have links at this XPath:
gmd:distributionInfo/gmd:MD_Distribution/gmd:transferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource
Anna
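A small sketch of reading links from both locations with the standard library. It assumes the record's root element is the metadata element itself (gmd:MD_Metadata or gmi:MI_Metadata), that each gmd:CI_OnlineResource holds its address in gmd:linkage/gmd:URL, and the standard ISO 19139 `gmd` namespace URI. The function name `distribution_links` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed to be the standard ISO 19139 one.
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

_DISTRIBUTION = "gmd:distributionInfo/gmd:MD_Distribution/"
_ONLINE = "gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource"

# The two XPaths described above, relative to the root metadata element.
LINK_PATHS = (
    _DISTRIBUTION
    + "gmd:distributor/gmd:MD_Distributor/gmd:distributorTransferOptions/"
    + _ONLINE,
    _DISTRIBUTION + "gmd:transferOptions/" + _ONLINE,
)


def distribution_links(xml_text):
    """Return URLs from both distribution XPaths, distributor path first."""
    root = ET.fromstring(xml_text)
    urls = []
    for path in LINK_PATHS:
        for resource in root.findall(path, NS):
            url = resource.findtext(
                "gmd:linkage/gmd:URL", default="", namespaces=NS).strip()
            if url:
                urls.append(url)
    return urls
```

Checking both paths means a parser tuned for one producer's layout does not silently return nothing for records written the other way.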
Anna, thanks for the info. I think this might turn out to be the real issue. Searching through the XML, while I do see OGC-WMS for instance, it is wrapped in a gmd:identificationInfo hierarchy.
Dan
Hi Dan,
Maybe I’m not entirely following this thread, but figure I’ll chime in as I’ve been working on issues like what you are talking about for quite a while.
This is normal and intended behavior. ‘service’ endpoints that are not meant for humans to navigate to are in srv:serviceIdentification elements. ‘links’ are in the gmd:distributionInfo element.
I think pyCSW is built on OWSLib? We worked with the OWSLib developers a while back to make sure that srv:serviceIdentification, as generated by ncISO, was parsed into their ISO metadata Python object. If it's currently not supported by pyCSW, it shouldn’t be too difficult to add.
Dave
Dave,
I think it is in pyCSW now, however I'm wondering if something is missing at the THREDDS endpoint I am testing with. I may have the wrong branch, as I pulled from the master at GitHub. The xml file I am testing with: http://gsaaportal.org/media/metadata/xml/SABGOM_Forecast_Model_Run_Collection_best_ISO.xml does have a gmd:distributionInfo element, but the links column is not being populated from it.
Dan
Glancing at this record, it looks like a normal ncISO record.
DistributionInfo includes a link to the OPeNDAP service .html page and the weather and climate toolkit link to view the dataset with that.
If you have a look here: https://github.com/geopython/OWSLib/blob/master/owslib/iso.py#L464
It appears that the code is looking in:
gmd:distributionInfo/gmd:MD_Distribution/gmd:transferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource
and not in:
gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource
as it should be.
Shouldn’t be a hard fix to get in there.
Dave
My issue was all on my end. The Python virtual environment I was using on the server was the culprit. Not exactly sure what the issue was. I built a new 2.7.8 environment and added the requirements, and now I can import links.
Thanks for the feedback.
Dan
@danramage and @pacioos Dan Ramage is implementing a PyCSW server and asked if anyone had any python tools that would make it less painful to go from the THREDDS metadata to one of the formats PyCSW supports for importing. There is a thread in the PyCSW issues page about this, but Dan didn't see a conclusion to it.
Dan