Closed. smrgeoinfo closed this issue 9 years ago.
I got it set up on my local machine with the sample database from Adrian; all seems to be working with no errors. Then I ran the command "ckan-pycsw load", which loads datasets from the CKAN API into the CSW database. Then I ran "datastore-pycsw load", which immediately deletes all records from the CSW database. Can you let me know whether this is the correct behavior, or let me know the best way to verify that pycsw is set up correctly? Thanks. Fuhu
@FuhuXia We need to be sure that harvested data is actually using the USGIN standard, and that we're getting USGIN ISO metadata from a harvest. This was actually in the code here: https://github.com/ngds/ckanext-metadata where Adrian had said that he used Steve's custom federated standard for publish/harvest of metadata, but it needs testing. @smrazgs Can you confirm that this custom standard that you apparently provided Adrian with is the USGIN profile? If so, and the code from https://github.com/ngds/ckanext-metadata is being used, then we're likely good with USGIN ISO metadata being harvested in.
Pycsw has been integrated into the RPM and can be demoed on the UAT server at http://uat-ngds.reisys.com. I will publish the rpm once you confirm that everything is working as expected. Thanks, Fuhu
@FuhuXia I tried to harvest it in at http://demo.geothermaldata.org/harvest/test-rei-pycsw-test/job and it seems to have the same issue as before. It's not harvesting in. The message indicates that the job never starts.
A job not starting indicates there is some problem with the harvester. Without shell access it is hard to debug; the easiest solution is to reboot the server. To harvest from the UAT server, make sure you use http://uat-ngds.reisys.com/csw as the harvest source. Without /csw it is not a valid source.
I see the same thing when testing at our live site, which we know harvests correctly. Any thoughts? http://www.geothermaldata.org/harvest/test-rei-pycsw-test/job
@FuhuXia Can you please tell us where you are on this task? Is there anything we can test for you? I believe I remember you saying you'd tested and it was ready, and that my failed test was a problem with the current CKAN harvester ... shall we consider this issue fixed?
@ccaudill Please install the new version of the rpm on a fresh CentOS 6.6 box, then test pycsw harvesting using our UAT as the source. The instructions should be the same as described in the "Deployment on Your Own Server" part of the readme file. I updated it with a few steps to change the external URL in the files /etc/ckan/production.ini and /var/lib/tomcat6/webapps/geoserver/data/global.xml.
Excellent, will do. Thanks @FuhuXia
@FuhuXia @smrazgs I'm having some issues harvesting in the new-build CSW, and am still unable to get it to harvest into another instance. We have 2 servers with the new rpm, running CentOS 6.6. The 'GINstack' test server has successfully harvested in something: http://159.87.39.4/harvest/admin/michigan-geological-survey-ngds-node But when I try to harvest from that server into another instance, the 'production' server, I cannot get records harvested from the previously listed server: http://159.87.39.5/harvest/azgs-ginstack-test-harvest/job It says it's started, but not finished, and it harvests in no records. The CSW GetCapabilities request is responding from the installation I'm trying to harvest: http://159.87.39.4/csw?request=GetCapabilities&Service=CSW&Version=2.0.2
I also tried harvesting in from the REI UAT: http://159.87.39.5/harvest/test-rei-pycsw-test It hasn't finished yet, but doesn't seem to be harvesting in either.
http://159.87.39.4/ looks fine to me. Can you modify the file /etc/cron.d/ckan-pycsw and delete everything after && so it reads:
0 1 * * * root /usr/lib/ckan/bin/paster --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw.cfg >> /var/log/ckan-pycsw-loader.log 2>&1
This way we can get the harvested records into the CSW db. Without this change the CSW db will remain empty, because the second command deletes everything.
For http://159.87.39.5, please post the last 50 lines of /var/log/harvester_run.log, /var/log/gather-consumer.log, and /var/log/fetch-consumer.log, so I can see what is wrong with the harvester.
@FuhuXia Here are the last 50 lines from the harvester log - should I try to run it again, then grab the log?
2015-02-06 10:15:37,305 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-06 10:15:37,308 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-06 10:15:37,363 INFO [ckanext.harvest.logic.action.update] Harvest job run: {}
2015-02-06 10:15:37,399 INFO [ckanext.harvest.logic.action.update] No new harvest jobs.
Traceback (most recent call last):
File "/usr/bin/ckan", line 59, in
@FuhuXia I can modify the file /etc/cron.d/ckan-pycsw but I don't understand why that needs to be done. Could you explain? Thank you
@FuhuXia Here are the last 50 lines from the gather-consumer log:
[root@localhost log]# tail -n 50 gather-consumer.log
2015-02-05 15:34:45,079 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:45,092 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:48,839 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:48,881 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:48,899 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:48,906 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:49,029 DEBUG [ckanext.harvest.queue] Gather queue consumer registered
2015-02-05 20:30:22,947 DEBUG [ckanext.harvest.queue] Received harvest job id: 17fd913a-7b07-46e1-9654-8ee752cd6515
2015-02-05 20:30:22,966 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=17fd913a-7b07-46e1-9654-8ee752cd6515 created=2015-02-06 03:22:55.371402 gather_started=2015-02-06 03:30:22.965920 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-05 20:30:26,080 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-05 20:30:26,088 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-05 20:45:21,650 DEBUG [ckanext.harvest.queue] Received harvest job id: ec1661ea-a6da-4b28-9400-f30171877c1a
2015-02-05 20:45:21,663 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=ec1661ea-a6da-4b28-9400-f30171877c1a created=2015-02-06 03:41:15.012250 gather_started=2015-02-06 03:45:21.663245 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-05 20:45:24,684 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-05 20:45:24,690 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-05 21:15:22,766 DEBUG [ckanext.harvest.queue] Received harvest job id: 1bc05784-4c21-4a09-8779-e3e919eebf13
2015-02-05 21:15:22,780 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=1bc05784-4c21-4a09-8779-e3e919eebf13 created=2015-02-06 04:07:33.343631 gather_started=2015-02-06 04:15:22.780019 gather_finished=None finished=None source_id=33a553f8-354d-4b17-b1d6-3d95c006aabf status=Running>
2015-02-05 21:15:33,082 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server:
@FuhuXia and the fetch-consumer:
[root@localhost log]# tail -n 50 fetch-consumer.log
2015-02-05 15:34:30,580 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:30,619 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:30,639 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:30,646 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:30,899 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:31,088 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:31,134 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:31,157 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:31,163 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:31,285 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:32,123 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:32,139 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:36,909 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:36,939 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:36,961 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:36,970 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:37,088 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:39,886 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:39,898 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:39,906 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:39,956 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:39,983 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:39,997 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:40,144 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:41,628 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:41,640 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:42,298 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:42,310 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:44,296 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:44,310 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:45,242 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:45,283 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:45,307 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:45,313 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:45,444 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:46,444 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:46,499 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:46,525 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:46,547 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:46,710 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:46,963 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:47,007 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:47,045 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:47,055 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:47,213 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
2015-02-05 15:34:48,316 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:48,415 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:48,481 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:48,501 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:48,758 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered
@FuhuXia This doesn't seem to be a problem with the harvester, I suppose, because I am able to get a harvest from a source that's not a CKAN GINstack (rpm install). http://159.87.39.5/harvest/test-usgin-catalog-harvest/job I hope that helps...
the reason for the /etc/cron.d/ckan-pycsw change is explained in my previous comment:
I got it set up on my local with the sample database from Adrian,
seems all is working with no errors. Then I run the command "ckan-pycsw load”,
which loads datasets from ckan api into csw database. Then I run “datastore-pycsw load”,
which immediately deletes all records from csw database. Can you let me know
whether this is the correct behavior, or, let me know the best way to verify that
pycsw is set up correctly?
So let us stop the 2nd command, datastore-pycsw load, from running on 159.87.39.4 for now. The 1st command, ckan-pycsw load, will add all harvested records into csw; then your 159.87.39.5 will be able to harvest from 159.87.39.4.
The reason for 159.87.39.5 not harvesting from our UAT, according to the log message "Error contacting the CSW server", is that it can't connect to our UAT server. Please run a ping on it to see what kind of network issue it is. Please show me the result of this command: ping uat-ngds.reisys.com
[root@localhost ~]# ping uat-ngds.reisys.com
PING uat-ngds.reisys.com (65.242.96.26) 56(84) bytes of data.
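As an aside, the "No route to host" (Errno 113) failure in the gather log can also be probed directly from Python, which separates routing problems from timeouts and refused connections. This is a generic diagnostic sketch, not part of the NGDS stack; the host and port are placeholders:

```python
import errno
import socket

def probe(host, port=80, timeout=5):
    """Attempt a TCP connection and classify the common failure
    modes seen in the harvester logs in this thread."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return "ok"
    except socket.timeout:
        return "timed out (a firewall may be silently dropping packets)"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host (Errno 113): routing or firewall issue"
        return "connection failed: %s" % exc

# Example usage (placeholder host):
# print(probe("uat-ngds.reisys.com"))
```

If this reports "no route to host" while ping succeeds, a port-level firewall rule is the likely culprit.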
@FuhuXia Is there a way to get more granularity in the logs? For instance, looking at each http request to the server you're attempting to harvest from?
The log is combined for all harvest jobs. But since you only have three sources, it is easy to put a watch on one harvest source. Here is what you can do:
You can watch the progress, showing when the harvest job is sent to the gather stage, when it is sent to the fetch stage, and when datasets are being created. If there is an error, it will show up in the log, and you can tell at which stage it failed.
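For convenience, a small (hypothetical) helper can scan gather-consumer log lines for the stage messages quoted in this thread and report how far each run got. The patterns are taken from the log excerpts above; the script itself is not part of the stack:

```python
import re

# Patterns based on the ckanext-harvest log lines quoted in this thread.
STAGES = [
    (re.compile(r"Received harvest job id: (\S+)"), "gather: job received"),
    (re.compile(r"Sent (\d+) objects to the fetch queue"), "gather: handed off to fetch"),
    (re.compile(r"ERROR .*(Gather stage failed|Error contacting the CSW server"
                r"|No records received from the CSW server)"), "FAILED"),
]

def summarize(log_lines):
    """Return (label, raw line) for each stage-related event found."""
    events = []
    for line in log_lines:
        for pattern, label in STAGES:
            if pattern.search(line):
                events.append((label, line.strip()))
                break
    return events

sample = [
    "2015-02-05 20:30:22,947 DEBUG [ckanext.harvest.queue] Received harvest job id: 17fd913a",
    "2015-02-05 20:30:26,080 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>",
    "2015-02-05 20:30:26,088 ERROR [ckanext.harvest.queue] Gather stage failed",
]
for label, line in summarize(sample):
    print(label)
```

Pointing it at `tail -n 200 /var/log/gather-consumer.log` output shows at a glance whether a job died contacting the source or during the gather stage.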
If we are using a fresh install, why are we having these issues and you are not?
I made the same change on my UAT server. Otherwise my csw would also be nearly empty: records harvested from an external source would be removed from the csw db, and only manually created datasets would stay in the csw db.
So is the problem that Christy is harvesting records into CKAN, but they're not sticking in the CSW database, only going into the CKAN dataStore? I thought Adrian had fixed that problem so that harvested records would be visible for harvesting out. Did that get lost?
@FuhuXia I edited /etc/cron.d/ckan-pycsw on 159.87.39.4 and then tried harvesting back into 159.87.39.5, with the same results as before: http://159.87.39.5/harvest/admin/azgs-ginstack-test-harvest
@smrazgs That is right. The current ckanext-datastorecsw code will delete most csw records, if not all.
@ccaudill The csw cron job will start at 1am. You can manually run it via this command:
/usr/lib/ckan/bin/paster --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw.cfg
After it's done, try reharvesting from 159.87.39.5.
@FuhuXia Okay, will do - thanks!
It sounds like the correct thing to do is to ensure that the records are available for harvest, correct?
@asonnenschein -- didn't you fix the harvester so the harvested-in records are accessible to pyCSW to be harvested out?
I was able to successfully harvest from the REI UAT: http://159.87.39.5/harvest/admin/test-rei-pycsw-test but not ours at 159.87.39.4 - I think we've narrowed down what's going on here though.
As a side note, we are very anxious to get this and https://github.com/ngds/ckanext-geoserver/issues/7 resolved; here is a note from one of our partners who installed the GINstack: "How is the node-in-a-box bug removal going? We have some upcoming deliveries to make for our play-fairway grant coming up and it would be good to know the status so that we can plan work-arounds"
@smrazgs What you're describing should be the default behavior of the CKAN harvester; there shouldn't have been anything to fix. Harvested-in data gets inserted into the PyCSW DB via a command-line tool that should be running regularly in a cron job.
@FuhuXia What exactly got lost? Ckanext-datastorecsw hasn't been altered since June. The correct way to set it up would be just like the stock harvester: run paster datastore-pycsw load -p src/ckan/pycsw.cfg -u http://ckan.instance.org in its own cron job.
Here is the only place in the entire codebase where sqlalchemy's delete command is called.
I am running the two load commands (ckan-pycsw load and datastore-pycsw load) side by side in https://github.com/ngds/install-and-run/blob/master/rpm_install/etc/cron.d/ckan-pycsw#L2
The default behavior of a csw load command is to add/update/delete csw datasets to make them consistent with the ckan db, per its own query. If the queries in the two load commands are different, we can expect them to delete each other's datasets. And that is what we observed: ckan-pycsw load added harvested datasets into csw, but the following datastore-pycsw load deleted them right away.
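As a toy illustration of this conflict (not the actual extension code): if each load pass both adds the records its own query returns and deletes everything that query does not return, then two passes driven by different queries will undo each other's work:

```python
def sync(csw_db, source_ids):
    """Make the CSW record set consistent with one source query:
    add what's missing, delete whatever that query doesn't know about."""
    for record_id in source_ids:
        csw_db.add(record_id)
    for record_id in list(csw_db):
        if record_id not in source_ids:
            csw_db.discard(record_id)

csw_db = set()

# "ckan-pycsw load": its query sees harvested and manual datasets.
sync(csw_db, {"harvested-1", "harvested-2", "manual-1"})

# "datastore-pycsw load": its query only sees datastore-backed datasets,
# so it deletes the harvested records the first pass just added.
sync(csw_db, {"manual-1"})

print(sorted(csw_db))  # only the manually created dataset survives
```

The dataset names are made up; the point is only that two "make it consistent with my query" passes are destructive when the queries differ.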
Why are both load commands getting run? Doesn't datastore-pycsw synchronize the pycsw db with the datastore db?
Both commands are in the original upstart script, therefore they both went into the cron jobs. If only one is supposed to run, I guess it is datastore-pycsw load. I can do this on the UAT server, but then, out of the 9000+ datasets, only a handful of them will go into the csw database. Please confirm this is the expected behavior.
Notes from 20150210 meeting: everything needs to go into CSW (harvested and published), so he'll modify the second cron job. We're on the same page now. - We'll make those modifications now.
This is fixed in https://github.com/REI-Systems/ckanext-datastorecsw/commit/016ec77163dfc84a024302db09fa545be81746bf. A new rpm is in progress to include this change.
Excellent! Please let us know when we can update our install. Thanks
rpm version 306 is ready. You can update an existing instance with the command: yum clean metadata && yum update ngds.ckan
I've updated and restarted apache on both of our test machines. I get the exact same result when attempting to harvest from one into another: no records. Still getting the error "Error contacting the CSW server: <urlopen error [Errno 113] No route to host>" @FuhuXia
The issue you are having is a network issue: one of your servers can't reach the other. Do a ping request on the command line to confirm the issue.
Thanks @FuhuXia I'm looking into that problem now and then I'll retry the harvests and let you know.
@FuhuXia We figured out that, due to state ports, I needed to use the internal IP address http://10.208.3.122/csw. Now the error message has changed to: "No records received from the CSW server"
Looks like it failed at the gather consumer:
[root@localhost ~]# tail -n 50 /var/log/gather-consumer.log
2015-02-08 19:24:02,549 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97fc9ce2 from the CSW
2015-02-08 19:24:02,549 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier df0dbda5ca192cb6d9df00e41729c8b0 from the CSW
2015-02-08 19:24:02,550 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 9e15e1a59b768b330d029e86dc1bd988 from the CSW
2015-02-08 19:24:02,550 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 9140, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []}
2015-02-08 19:24:03,004 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 90e5aa8a743c04160c055efb02649806 from the CSW
2015-02-08 19:24:03,005 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier F0455F1651B541A8824B3838472B674D from the CSW
2015-02-08 19:24:03,005 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier b11baabdb451ae7c5c51c9bce906c39d from the CSW
2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 199351133AE34D7C82C3FC1D75574128 from the CSW
2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 632BB66AA10D4D13961F9C151FEF1FBF from the CSW
2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier a748ce233a25e3e0dd00c9865d028d7e from the CSW
2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97ff2624 from the CSW
2015-02-08 19:24:03,007 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97ca84f6 from the CSW
2015-02-08 19:24:03,007 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 0B4E16F937124B48A26917612C52B4E3 from the CSW
2015-02-08 19:24:03,007 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 3592f7bc37ea27adea06455fbf17f15c from the CSW
2015-02-08 19:24:03,007 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 9150, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []}
2015-02-08 19:24:03,460 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50ec3aefb656b70647f32e38bcce696c from the CSW
2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier b6ccceeadfc9e39594724d14322c1150 from the CSW
2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 98ddf901b9782a25982e01af3b0d0a47 from the CSW
2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50ec3aefb656b70647f32e38bcef47e1 from the CSW
2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 168566464e3d5f8f3cde3b9fc002eb81 from the CSW
2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 96C41670E91D4E8DAB444220A03FAAE3 from the CSW
2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50b3a9b3bec98d3d491e2187c5116fa5 from the CSW
2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 5b892059f36080b5b0b5196414bcdf8a from the CSW
2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 3592f7bc37ea27adea06455fbf48dd08 from the CSW
2015-02-08 19:24:03,463 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 5b892059f36080b5b0b5196414a28642 from the CSW
2015-02-08 19:24:36,081 DEBUG [ckanext.harvest.queue] Received from plugin gather_stage: 5740 objects (first: [u'c837b886-b2f4-449e-8b77-b041ffeaa3a7'] last: [u'444f38c5-5f9e-49bb-99cd-e09c687c5ed8'])
2015-02-08 19:24:41,865 DEBUG [ckanext.harvest.queue] Sent 5740 objects to the fetch queue
2015-02-13 09:00:22,380 DEBUG [ckanext.harvest.queue] Received harvest job id: 4a16d192-d3fb-4a3f-bdd4-d801e0e5ebfc
2015-02-13 09:00:22,397 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=4a16d192-d3fb-4a3f-bdd4-d801e0e5ebfc created=2015-02-13 15:54:39.822609 gather_started=2015-02-13 16:00:22.397240 gather_finished=None finished=None source_id=3572f003-9c6b-4794-a88b-8631ea80d93c status=Running>
2015-02-13 09:00:23,106 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://undgeoportal.und.edu:8080/geoportal/csw
2015-02-13 09:00:23,107 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []}
2015-02-13 09:00:23,792 ERROR [ckanext.harvest.harvesters.base] No records received from the CSW server
2015-02-13 09:00:23,798 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-18 15:21:19,522 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-18 15:21:19,531 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-18 15:21:29,272 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-18 15:21:29,299 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-18 15:21:29,317 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-18 15:21:29,323 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-18 15:21:29,437 DEBUG [ckanext.harvest.queue] Gather queue consumer registered
2015-02-18 15:45:25,642 DEBUG [ckanext.harvest.queue] Received harvest job id: 60d15ea6-ab23-4153-9a44-0fb0150aa2f5
2015-02-18 15:45:25,669 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=60d15ea6-ab23-4153-9a44-0fb0150aa2f5 created=2015-02-18 22:38:33.196010 gather_started=2015-02-18 22:45:25.669373 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-18 15:45:28,711 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-18 15:45:28,719 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-23 12:30:24,837 DEBUG [ckanext.harvest.queue] Received harvest job id: 4b2da748-38cf-4f18-bd72-d528428e9ac3
2015-02-23 12:30:24,858 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=4b2da748-38cf-4f18-bd72-d528428e9ac3 created=2015-02-23 19:26:05.946538 gather_started=2015-02-23 19:30:24.858392 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-23 12:30:26,149 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://10.208.3.122/csw
2015-02-23 12:30:26,151 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []}
2015-02-23 12:30:27,370 ERROR [ckanext.harvest.harvesters.base] No records received from the CSW server
2015-02-23 12:30:27,376 ERROR [ckanext.harvest.queue] Gather stage failed
[root@localhost ~]#
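For reference, "No records received from the CSW server" means the GetRecords response parsed correctly but matched zero records, which is consistent with an empty pycsw database on the source. As a rough, self-contained sketch (the sample XML below is illustrative, not an actual capture), a client can check the counts in a CSW 2.0.2 GetRecords response like this:

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"

def record_counts(response_xml):
    """Return (matched, returned) from a CSW 2.0.2 GetRecords response."""
    root = ET.fromstring(response_xml)
    results = root.find("{%s}SearchResults" % CSW_NS)
    if results is None:
        raise ValueError("not a GetRecords response: no SearchResults element")
    return (int(results.get("numberOfRecordsMatched", "0")),
            int(results.get("numberOfRecordsReturned", "0")))

# An illustrative empty response, such as an out-of-sync pycsw db might send.
empty = """<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
  <csw:SearchStatus timestamp="2015-02-23T19:30:27Z"/>
  <csw:SearchResults numberOfRecordsMatched="0" numberOfRecordsReturned="0"/>
</csw:GetRecordsResponse>"""

print(record_counts(empty))  # (0, 0) -> harvester reports no records
```

Fetching the source's real GetRecords URL and running it through a check like this distinguishes "server unreachable" from "server reachable but empty".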
We had a new node test the updated rpm so that we could test harvesting from them, using: yum clean metadata && yum update ngds.ckan. Now, when I hit http://mbmggin.mtech.edu/csw or http://mbmggin.mtech.edu/csw?request=GetCapabilities&Service=CSW&Version=2.0.2 I get the error message "Could not load repository (local): (OperationalError) FATAL: database "pycsw" does not exist None None" Thoughts?
The rpm update path won't make any database changes, and the pycsw db was only introduced in the last two or three rpm versions. So, if you are updating from an older version, you will need to manually add the pycsw db:
sudo -u postgres createdb -O ckan_default pycsw -E utf-8
sudo -u postgres psql -d pycsw -f /usr/pgsql-9.1/share/contrib/postgis-1.5/postgis.sql > /dev/null
sudo -u postgres psql -d pycsw -f /usr/pgsql-9.1/share/contrib/postgis-1.5/spatial_ref_sys.sql > /dev/null
sudo -u postgres psql -d pycsw -c 'GRANT SELECT, UPDATE, INSERT, DELETE ON spatial_ref_sys TO ckan_default' > /dev/null
sudo -u postgres psql -d pycsw -c 'GRANT SELECT, UPDATE, INSERT, DELETE ON geometry_columns TO ckan_default' > /dev/null
cd /usr/lib/ckan/src/ckanext-spatial
../../bin/paster --plugin=ckanext-spatial ckan-pycsw setup -p /etc/ckan/pycsw.cfg
thanks - this is really important as all of our new nodes installed late last year and are waiting for these updates.
From the user: "when I try to run cmd#1 (in the postgis-1.5 directory) now get Created: database creation failed: ERROR: database "pycsw" already exists"
And from http://mbmggin.mtech.edu/csw I now see: "<ows:ExceptionText>Could not load repository (local): records</ows:ExceptionText>"
@FuhuXia Could you give us some direction on what needs to be done with this install? I believe someone was going to update the ReadMe at https://github.com/ngds/install-and-run to include information on what users who installed in the Nov timeframe of last year need to do. Here's what this node provider said: "We actually installed mid-late November when it first came out I thought. It said updating for .203 to .306"
Readme file updated. Instructions for updating from rpms prior to 300 have been added: https://github.com/ngds/install-and-run#updating-ngds
@lukejbuckley https://github.com/ngds/install-and-run#updating-ngds Looks like pretty great instructions (if a lot of manual work). If you have the time, it would be great if you could do this update so we can continue testing. Thanks for your support!
@FuhuXia Thank you. Might there be a way around the user having to do anything with the production.ini file? Or is this just going to affect earlier versions?
http://ckanext-spatial.readthedocs.org/en/latest/csw.html
Or is the pycsw install buried somewhere?
The v1 installer from last May (https://github.com/ngds/install-and-run/archive/v1.zip) has shell script procedures for installing pycsw; not sure how this would translate to rpm.