ngds / install-and-run

Use this repository's issue tracker to post comments, bug reports, and help questions on installing and running NGDS CKAN.

pycsw install seems to be missing #22

Closed smrgeoinfo closed 9 years ago

smrgeoinfo commented 9 years ago

http://ckanext-spatial.readthedocs.org/en/latest/csw.html

Or is the pycsw install buried somewhere?

the v1 installer from last May (https://github.com/ngds/install-and-run/archive/v1.zip) has shell script procedures for installing pycsw; not sure how this would translate to rpm

ccaudill commented 9 years ago

I got it set up on my local with the sample database from Adrian, seems all is working with no errors. Then I run the command "ckan-pycsw load", which loads datasets from ckan api into csw database. Then I run "datastore-pycsw load", which immediately deletes all records from csw database. Can you let me know whether this is the correct behavior, or, let me know the best way to verify that pycsw is set up correctly? Thanks. Fuhu

ccaudill commented 9 years ago

@FuhuXia We need to be sure that harvested data is actually using the USGIN standard, and that we're getting USGIN ISO metadata from a harvest. This was actually in the code here: https://github.com/ngds/ckanext-metadata where Adrian had said that he used Steve's custom federated standard for publish/harvest of metadata, but it needs testing. @smrazgs Can you confirm that this custom standard that you apparently provided Adrian with is the USGIN profile? If so, and the code from https://github.com/ngds/ckanext-metadata is being used, then we're likely good with USGIN ISO metadata being harvested in.

ccaudill commented 9 years ago

Pycsw has been integrated into RPM and can be demo’ed at uat server at http://uat-ngds.reisys.com. I will publish the rpm once you confirm that everything is working as expected. Thanks, Fuhu

ccaudill commented 9 years ago

@FuhuXia I tried to harvest it in at http://demo.geothermaldata.org/harvest/test-rei-pycsw-test/job and it seems to have the same issue as before. It's not harvesting in. The message indicates that the job never starts.

FuhuXia commented 9 years ago

Job not starting indicates there is some problem with the harvester. Without shell access it is hard to debug. The easiest solution is to reboot the server. To harvest from the UAT server, make sure you use http://uat-ngds.reisys.com/csw as the harvest source. Without /csw it is not a valid source.

ccaudill commented 9 years ago

I see the same thing when testing at our live site, which we know harvests correctly. Any thoughts? http://www.geothermaldata.org/harvest/test-rei-pycsw-test/job

ccaudill commented 9 years ago

@FuhuXia Can you please tell us where you are on this task? Is there anything we can test for you? I believe I remember you saying you'd tested it and it was ready, and that my failed test was a problem with the current CKAN harvester ... shall we consider this issue fixed?

FuhuXia commented 9 years ago

@ccaudill Please install the new version of the rpm on a fresh CentOS 6.6 box, then test pycsw harvesting using our UAT as the source. The instructions should be the same as described in the Deployment on Your Own Server part of the readme file. I updated it with a few steps to change the external url in the files /etc/ckan/production.ini and /var/lib/tomcat6/webapps/geoserver/data/global.xml.

ccaudill commented 9 years ago

Excellent, will do. Thanks @FuhuXia

ccaudill commented 9 years ago

@FuhuXia @smrazgs I'm having some issues harvesting from the new-build CSW, and am still unable to get it to harvest into another instance. We have 2 servers with the new rpm, running CentOS 6.6. The 'GINstack' test server has successfully harvested in something: http://159.87.39.4/harvest/admin/michigan-geological-survey-ngds-node

But when I try to harvest from that server into another instance, the 'production' server, I cannot get any records: http://159.87.39.5/harvest/azgs-ginstack-test-harvest/job

It says it's started, but not finished, and it harvests in no records. The CSW GetCapabilities request is responding from the installation I'm trying to harvest: http://159.87.39.4/csw?request=GetCapabilities&Service=CSW&Version=2.0.2
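For anyone repeating this check on another node, here is a minimal sketch of how the GetCapabilities URL above can be built from a harvest source URL. The function name `capabilities_url` is made up for illustration, and the rule that the source must end in /csw is taken from the earlier comment about the UAT server, so treat it as an assumption for other installs.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def capabilities_url(base_url):
    """Return a CSW 2.0.2 GetCapabilities URL for a harvest source.

    Hypothetical helper: the '/csw' suffix rule is an assumption based on
    the advice given earlier in this thread, not on any CKAN API.
    """
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/csw"):
        raise ValueError("harvest source should end in /csw: %r" % base_url)
    # Same three parameters as the request quoted in the comment above.
    query = urlencode(
        {"service": "CSW", "version": "2.0.2", "request": "GetCapabilities"}
    )
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen` from the harvesting server itself, which also shows whether that server can reach the source at all.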

ccaudill commented 9 years ago

I also tried harvesting in from the REI UAT: http://159.87.39.5/harvest/test-rei-pycsw-test It hasn't finished yet, but doesn't seem to be harvesting in either.

FuhuXia commented 9 years ago

http://159.87.39.4/ looks fine to me. Can you modify the file /etc/cron.d/ckan-pycsw and delete the && and everything after it, so it reads:

0 1 * * * root /usr/lib/ckan/bin/paster --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw.cfg >> /var/log/ckan-pycsw-loader.log 2>&1

This way we can get the csw records into the db. Without this change the csw db will remain empty, because the second command deletes everything.
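The cron edit can also be expressed as a small text transformation. This is only a sketch: `keep_first_command` is a made-up name, and the command after && is not shown in this thread, so the usage below stands in a placeholder for it.

```python
def keep_first_command(cron_line):
    """Drop the first '&&' and everything after it from one crontab line.

    Hypothetical helper for illustration. A single '&' (as in '2>&1') is
    left alone because only the two-character '&&' is used as the separator.
    """
    head, separator, _rest = cron_line.partition("&&")
    return head.rstrip() if separator else cron_line.rstrip()
```

Called on the original line with a placeholder second command, for example `keep_first_command("0 1 * * * root ... 2>&1 && SECOND_COMMAND")`, it returns the line up to and including `2>&1`. Back up /etc/cron.d/ckan-pycsw before writing any result into it.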

For http://159.87.39.5, please post the last 50 lines for /var/log/harvester_run.log, /var/log/gather-consumer.log, /var/log/fetch-consumer.log, so I can see what is with the harvester.

ccaudill commented 9 years ago

@FuhuXia Here are the last 50 lines from the harvester log - should I try to run it again, then grab the log?

2015-02-06 10:15:37,305 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-06 10:15:37,308 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-06 10:15:37,363 INFO [ckanext.harvest.logic.action.update] Harvest job run: {}
2015-02-06 10:15:37,399 INFO [ckanext.harvest.logic.action.update] No new harvest jobs.

Traceback (most recent call last):
  File "/usr/bin/ckan", line 59, in <module>
    load_entry_point('PasteScript', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 122, in command
    self.run_harvester()
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 287, in run_harvester
    jobs = get_action('harvest_jobs_run')(context,{})
  File "/usr/lib/ckan/src/ckan/ckan/logic/__init__.py", line 419, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/logic/action/update.py", line 338, in harvest_jobs_run
    raise Exception('There are no new harvesting jobs')
Exception: There are no new harvesting jobs
2015-02-06 10:30:17,203 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-06 10:30:17,208 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-06 10:30:20,475 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-06 10:30:20,494 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-06 10:30:20,510 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-06 10:30:20,513 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-06 10:30:20,566 INFO [ckanext.harvest.logic.action.update] Harvest job run: {}
2015-02-06 10:30:20,602 INFO [ckanext.harvest.logic.action.update] No new harvest jobs.

Traceback (most recent call last):
  File "/usr/bin/ckan", line 59, in <module>
    load_entry_point('PasteScript', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/lib/python2.6/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 122, in command
    self.run_harvester()
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 287, in run_harvester
    jobs = get_action('harvest_jobs_run')(context,{})
  File "/usr/lib/ckan/src/ckan/ckan/logic/__init__.py", line 419, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/src/ckanext-harvest/ckanext/harvest/logic/action/update.py", line 338, in harvest_jobs_run
    raise Exception('There are no new harvesting jobs')
Exception: There are no new harvesting jobs

ccaudill commented 9 years ago

@FuhuXia I can modify the file /etc/cron.d/ckan-pycsw but I don't understand why that needs to be done. Could you explain? Thank you

ccaudill commented 9 years ago

@FuhuXia Here are the last 50 lines from the gather-consumer log:

[root@localhost log]# tail -n 50 gather-consumer.log
2015-02-05 15:34:45,079 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded
2015-02-05 15:34:45,092 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists
2015-02-05 15:34:48,839 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2015-02-05 15:34:48,881 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2015-02-05 15:34:48,899 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2015-02-05 15:34:48,906 DEBUG [ckanext.harvest.model] Harvest tables already exist
2015-02-05 15:34:49,029 DEBUG [ckanext.harvest.queue] Gather queue consumer registered
2015-02-05 20:30:22,947 DEBUG [ckanext.harvest.queue] Received harvest job id: 17fd913a-7b07-46e1-9654-8ee752cd6515
2015-02-05 20:30:22,966 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=17fd913a-7b07-46e1-9654-8ee752cd6515 created=2015-02-06 03:22:55.371402 gather_started=2015-02-06 03:30:22.965920 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-05 20:30:26,080 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-05 20:30:26,088 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-05 20:45:21,650 DEBUG [ckanext.harvest.queue] Received harvest job id: ec1661ea-a6da-4b28-9400-f30171877c1a
2015-02-05 20:45:21,663 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=ec1661ea-a6da-4b28-9400-f30171877c1a created=2015-02-06 03:41:15.012250 gather_started=2015-02-06 03:45:21.663245 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-05 20:45:24,684 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-05 20:45:24,690 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-05 21:15:22,766 DEBUG [ckanext.harvest.queue] Received harvest job id: 1bc05784-4c21-4a09-8779-e3e919eebf13
2015-02-05 21:15:22,780 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=1bc05784-4c21-4a09-8779-e3e919eebf13 created=2015-02-06 04:07:33.343631 gather_started=2015-02-06 04:15:22.780019 gather_finished=None finished=None source_id=33a553f8-354d-4b17-b1d6-3d95c006aabf status=Running>
2015-02-05 21:15:33,082 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server:
2015-02-05 21:15:33,088 ERROR [ckanext.harvest.queue] Gather stage failed
2015-02-05 21:15:33,089 DEBUG [ckanext.harvest.queue] Received harvest job id: 8cf57ace-2571-4385-aeb5-c9157e6ba4e4
2015-02-05 21:15:33,094 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=8cf57ace-2571-4385-aeb5-c9157e6ba4e4 created=2015-02-06 04:07:01.153399 gather_started=2015-02-06 04:15:33.094168 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running>
2015-02-05 21:15:36,113 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host>
2015-02-05 21:15:36,116 ERROR [ckanext.harvest.queue] Gather stage failed
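With this much DEBUG output, the failures are easy to miss. A rough sketch of pulling only the ERROR entries out of a consumer log follows. It assumes the file on disk has one entry per line in the `timestamp LEVEL [logger] message` layout seen above; `LOG_LINE` and `errors_in` are names invented for this example.

```python
import re

# Layout assumed from the entries pasted in this thread:
#   2015-02-05 20:30:26,080 ERROR [ckanext.harvest.queue] Gather stage failed
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s?(?P<msg>.*)"
)


def errors_in(log_text):
    """Return (timestamp, logger, message) for every ERROR entry."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            found.append(
                (match.group("ts"), match.group("logger"), match.group("msg"))
            )
    return found
```

Running it over the text of /var/log/gather-consumer.log would list each "Error contacting the CSW server" and "Gather stage failed" entry with its timestamp, which can then be matched against the harvest job ids logged just before them.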

ccaudill commented 9 years ago

@FuhuXia and the fetch-consumer:

[root@localhost log]# tail -n 50 fetch-consumer.log 2015-02-05 15:34:30,580 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:30,619 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:30,639 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:30,646 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:30,899 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:31,088 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:31,134 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:31,157 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:31,163 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:31,285 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:32,123 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-05 15:34:32,139 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-05 15:34:36,909 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:36,939 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:36,961 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:36,970 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:37,088 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:39,886 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-05 15:34:39,898 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-05 15:34:39,906 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:39,956 DEBUG [ckanext.spatial.model.package_extent] Spatial 
tables already exist 2015-02-05 15:34:39,983 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:39,997 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:40,144 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:41,628 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-05 15:34:41,640 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-05 15:34:42,298 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-05 15:34:42,310 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-05 15:34:44,296 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-05 15:34:44,310 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-05 15:34:45,242 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:45,283 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:45,307 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:45,313 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:45,444 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:46,444 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:46,499 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:46,525 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:46,547 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:46,710 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:46,963 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:47,007 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 
15:34:47,045 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:47,055 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:47,213 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered 2015-02-05 15:34:48,316 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-05 15:34:48,415 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-05 15:34:48,481 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-05 15:34:48,501 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-05 15:34:48,758 DEBUG [ckanext.harvest.queue] Fetch queue consumer registered

ccaudill commented 9 years ago

@FuhuXia This doesn't seem to be a problem with the harvester, I suppose, because I am able to get a harvest from a source that's not a CKAN GINstack (rpm install). http://159.87.39.5/harvest/test-usgin-catalog-harvest/job I hope that helps...

FuhuXia commented 9 years ago

the reason for the /etc/cron.d/ckan-pycsw change is explained in my previous comment:

I got it set up on my local with the sample database from Adrian,
seems all is working with no errors. Then I run the command "ckan-pycsw load",
which loads datasets from ckan api into csw database. Then I run "datastore-pycsw load",
which immediately deletes all records from csw database. Can you let me know
whether this is the correct behavior, or, let me know the best way to verify that
pycsw is set up correctly?

So let us stop the 2nd command "datastore-pycsw load" from running on 159.87.39.4 for now. The 1st command ckan-pycsw load will add all harvested records into csw, then your 159.87.39.5 will be able to harvest from 159.87.39.4.

The reason 159.87.39.5 is not harvesting from our UAT, according to the log message "Error contacting the CSW server", is that it can't connect to our UAT server. Please run a ping against it to see what kind of network issue it is, and show me the result of this command: ping uat-ngds.reisys.com
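Since ping only tests ICMP, it can also help to test the TCP connection the harvester actually makes. Below is a minimal sketch using only the Python standard library. On Linux, errno 113 is EHOSTUNREACH, which is the "No route to host" in the log. The function names and the wording of the diagnoses are illustrative, and port 80 is an assumption based on the http:// source URLs in this thread.

```python
import errno
import socket


def describe_connect_error(exc):
    """Map a socket-level exception to a short diagnosis (illustrative wording)."""
    if isinstance(exc, socket.gaierror):
        return "DNS lookup failed"
    if isinstance(exc, socket.timeout):
        return "timed out (packets may be dropped by a firewall)"
    if exc.errno == errno.EHOSTUNREACH:
        return "no route to host (routing or firewall problem)"
    if exc.errno == errno.ECONNREFUSED:
        return "connection refused (host reachable, nothing listening on the port)"
    return "other error: %s" % exc


def check_tcp(host, port=80, timeout=5.0):
    """Try to open a TCP connection and report 'ok' or a diagnosis."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return describe_connect_error(exc)
```

Running `check_tcp("uat-ngds.reisys.com")` on the harvesting server should report the same kind of failure the harvester sees, without having to wait for a harvest job.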

ccaudill commented 9 years ago

[root@localhost ~]# ping uat-ngds.reisys.com
PING uat-ngds.reisys.com (65.242.96.26) 56(84) bytes of data.

ccaudill commented 9 years ago

@FuhuXia Is there a way to get more granularity in the logs? For instance, looking at each http request to the server you're attempting to harvest from?

FuhuXia commented 9 years ago

The log is combined for all harvest jobs. But since you only have three sources, it is easy to put a watch on one harvest source. Here is what you can do:

  1. Comment out the cron job in file /etc/cron.d/ckan-harvest. We will run it manually.
  2. From the web GUI, make sure no other harvest jobs are running. Then do a re-harvest (or clear and re-harvest) on the source you want to look at.
  3. Watch the logs realtime by these two commands: tail -f /var/log/fetch-consumer.log & tail -f /var/log/gather-consumer.log &
  4. Manually run harvest job by this command: ckan --plugin=ckanext-harvest harvester run -c /etc/ckan/production.ini

You can watch the progress, showing when the harvest job is sent to the gather stage, when it is sent to the fetch stage, and when datasets are being created. If there is an error, it will show up in the log, from which you can tell at which stage it fails.
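Step 1 can be done by hand in an editor. As a sketch of the same edit in code, the function below prefixes every active line with `#`. `comment_out` is a made-up name, and the contents of /etc/cron.d/ckan-harvest are not shown in this thread, so the schedule in the usage example is a placeholder.

```python
def comment_out(crontab_text):
    """Comment out every non-blank line that is not already a comment.

    Note that variable assignments such as PATH=... are commented out too,
    which is fine when the goal is to disable the whole file temporarily.
    """
    result = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            line = "# " + line
        result.append(line)
    return "\n".join(result) + "\n"
```

For example, a hypothetical entry `*/15 * * * * root ckan --plugin=ckanext-harvest harvester run -c /etc/ckan/production.ini` becomes the same line with `# ` in front. Remove the `# ` again once the manual test is finished so scheduled harvesting resumes.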

ccaudill commented 9 years ago

If we are using a fresh install, why are we having these issues and you not?

FuhuXia commented 9 years ago

I made the same change on my UAT server. Otherwise, my csw would also be nearly empty: records harvested from external sources are removed from the csw db, and only manually created datasets stay in it.

smrgeoinfo commented 9 years ago

So is the problem that Christy is harvesting records into CKAN, but they're not sticking in the CSW database, only going into the CKAN dataStore? I thought Adrian had fixed that problem so that harvested records would be visible for harvesting out. Did that get lost?

ccaudill commented 9 years ago

@FuhuXia I edited /etc/cron.d/ckan-pycsw on 159.87.39.4 and then tried harvesting back into 159.87.39.5, with the same results as before: http://159.87.39.5/harvest/admin/azgs-ginstack-test-harvest

FuhuXia commented 9 years ago

@smrazgs That is right. The current ckanext-datastorecsw code will delete most csw records, if not all.

@ccaudill The csw cron job will start at 1am. You can run it manually with this command: /usr/lib/ckan/bin/paster --plugin=ckanext-spatial ckan-pycsw load -p /etc/ckan/pycsw.cfg. After it is done, try to re-harvest from 159.87.39.5.

ccaudill commented 9 years ago

@FuhuXia Okay, will do - thanks!

It sounds like the correct thing is to ensure that the records are available for harvest, correct?

smrgeoinfo commented 9 years ago

@asonnenschein -- didn't you fix the harvester so the harvested-in records are accessible to pyCSW to be harvested out?

ccaudill commented 9 years ago

I was able to successfully harvest from the REI UAT: http://159.87.39.5/harvest/admin/test-rei-pycsw-test but not ours at 159.87.39.4 - I think we've narrowed down what's going on here though.

ccaudill commented 9 years ago

As a side note, we are very anxious to get this and https://github.com/ngds/ckanext-geoserver/issues/7 resolved; here is a note from one of our partners who installed the GINstack: "How is the node-in-a-box bug removal going? We have some upcoming deliveries to make for our play-fairway grant coming up and it would be good to know the status so that we can plan work-arounds"

asonnenschein commented 9 years ago

@smrazgs What you're describing should be the default behavior of the CKAN harvester; there shouldn't have been anything to fix. Harvested in data gets inserted into the PyCSW DB via execution of a command line tool that should be running regularly in a CRON job.

@FuhuXia What exactly got lost? Ckanext-datastorecsw hasn't been altered since June. The correct way to set it up would be just like the stock harvester - run paster datastore-pycsw load -p src/ckan/pycsw.cfg -u http://ckan.instance.org in its own CRON job.

Here is the only place in the entire codebase that sqlalchemy's delete command is being called.

FuhuXia commented 9 years ago

I am running the two load commands (ckan-pycsw load and datastore-pycsw load) side by side in https://github.com/ngds/install-and-run/blob/master/rpm_install/etc/cron.d/ckan-pycsw#L2

The default behavior of the csw load command is to add/update/delete csw datasets so that the csw db is consistent with the ckan db, per the command's own query. If the queries are different in the two load commands, we can expect that they are going to delete each other's datasets. And that is what we observed: ckan-pycsw load added harvested datasets into csw, but the following datastore-pycsw load deleted them right away.
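To make the interaction concrete, here is a toy model of two "make the target match my query" loaders run one after the other. It is not the code of either command. Record ids are plain strings, and the assumption that the first query covers harvested plus manually created datasets while the second covers only manually created ones is taken from the behavior described in this thread.

```python
def sync(csw, wanted):
    """Make the set `csw` equal to `wanted`: add what is missing, delete the rest."""
    added = wanted - csw
    deleted = csw - wanted
    csw |= added
    csw -= deleted
    return added, deleted


def run_both(harvested, manual):
    """Run the two stand-in loaders in cron order and return what is left."""
    csw = set()
    sync(csw, harvested | manual)  # stands in for: ckan-pycsw load
    sync(csw, manual)              # stands in for: datastore-pycsw load
    return csw
```

With `harvested = {"h1", "h2"}` and `manual = {"m1"}`, `run_both` leaves only `{"m1"}`: the second sync deletes everything its own query does not return, which matches the nearly empty csw db described above.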

smrgeoinfo commented 9 years ago

Why are both load commands getting run? Doesn't datastore-pycsw synchronize the pycsw db with the datastore db?

FuhuXia commented 9 years ago

Both commands are in the original upstart script, therefore they both go to cron jobs. If only one is supposed to run, I guess it is datastore-pycsw load. I can do this in UAT server, then out of the 9000+ datasets, only a handful of them will go to csw database. Please confirm this is expected behavior.

ccaudill commented 9 years ago

Notes from 20150210 meeting: everything needs to go into CSW (harvested and published), so he'll modify the second cron job. We're on the same page now. - We'll make those modifications now.

FuhuXia commented 9 years ago

this is fixed in https://github.com/REI-Systems/ckanext-datastorecsw/commit/016ec77163dfc84a024302db09fa545be81746bf. a new rpm is in process to include this change.

ccaudill commented 9 years ago

Excellent! Please let us know when we can update our install. Thanks

FuhuXia commented 9 years ago

rpm version 306 is ready. you can update existing instance with command yum clean metadata && yum update ngds.ckan

ccaudill commented 9 years ago

I've updated and restarted apache on both of our test machines. I get the exact same result when attempting to harvest from one into another - no records. Still getting the error "Error contacting the CSW server: <urlopen error [Errno 113] No route to host>" @FuhuXia

FuhuXia commented 9 years ago

the issue you are having is a network issue. one of your servers can't reach the other. run a ping from the command line to confirm the issue.

ccaudill commented 9 years ago

Thanks @FuhuXia I'm looking into that problem now and then I'll retry the harvests and let you know.

ccaudill commented 9 years ago

@FuhuXia We figured out that due to state ports, I needed to use the internal IP address http://10.208.3.122/csw. Now the error message changed to: No records received from the CSW server

Looks like it failed at the gather consumer:

[root@localhost ~]# tail -n 50 /var/log/gather-consumer.log 2015-02-08 19:24:02,549 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97fc9ce2 from the CSW 2015-02-08 19:24:02,549 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier df0dbda5ca192cb6d9df00e41729c8b0 from the CSW 2015-02-08 19:24:02,550 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 9e15e1a59b768b330d029e86dc1bd988 from the CSW 2015-02-08 19:24:02,550 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 9140, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []} 2015-02-08 19:24:03,004 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 90e5aa8a743c04160c055efb02649806 from the CSW 2015-02-08 19:24:03,005 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier F0455F1651B541A8824B3838472B674D from the CSW 2015-02-08 19:24:03,005 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier b11baabdb451ae7c5c51c9bce906c39d from the CSW 2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 199351133AE34D7C82C3FC1D75574128 from the CSW 2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 632BB66AA10D4D13961F9C151FEF1FBF from the CSW 2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier a748ce233a25e3e0dd00c9865d028d7e from the CSW 2015-02-08 19:24:03,006 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97ff2624 from the CSW 2015-02-08 19:24:03,007 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier cf53f876662f70c8350fac5d97ca84f6 from the CSW 2015-02-08 19:24:03,007 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 0B4E16F937124B48A26917612C52B4E3 from the CSW 2015-02-08 19:24:03,007 INFO 
[ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 3592f7bc37ea27adea06455fbf17f15c from the CSW 2015-02-08 19:24:03,007 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 9150, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []} 2015-02-08 19:24:03,460 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50ec3aefb656b70647f32e38bcce696c from the CSW 2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier b6ccceeadfc9e39594724d14322c1150 from the CSW 2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 98ddf901b9782a25982e01af3b0d0a47 from the CSW 2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50ec3aefb656b70647f32e38bcef47e1 from the CSW 2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 168566464e3d5f8f3cde3b9fc002eb81 from the CSW 2015-02-08 19:24:03,461 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 96C41670E91D4E8DAB444220A03FAAE3 from the CSW 2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 50b3a9b3bec98d3d491e2187c5116fa5 from the CSW 2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 5b892059f36080b5b0b5196414bcdf8a from the CSW 2015-02-08 19:24:03,462 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 3592f7bc37ea27adea06455fbf48dd08 from the CSW 2015-02-08 19:24:03,463 INFO [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier 5b892059f36080b5b0b5196414a28642 from the CSW 2015-02-08 19:24:36,081 DEBUG [ckanext.harvest.queue] Received from plugin gather_stage: 5740 objects (first: [u'c837b886-b2f4-449e-8b77-b041ffeaa3a7'] last: [u'444f38c5-5f9e-49bb-99cd-e09c687c5ed8']) 2015-02-08 19:24:41,865 DEBUG [ckanext.harvest.queue] Sent 5740 objects to the fetch queue 
2015-02-13 09:00:22,380 DEBUG [ckanext.harvest.queue] Received harvest job id: 4a16d192-d3fb-4a3f-bdd4-d801e0e5ebfc 2015-02-13 09:00:22,397 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=4a16d192-d3fb-4a3f-bdd4-d801e0e5ebfc created=2015-02-13 15:54:39.822609 gather_started=2015-02-13 16:00:22.397240 gather_finished=None finished=None source_id=3572f003-9c6b-4794-a88b-8631ea80d93c status=Running> 2015-02-13 09:00:23,106 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://undgeoportal.und.edu:8080/geoportal/csw 2015-02-13 09:00:23,107 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []} 2015-02-13 09:00:23,792 ERROR [ckanext.harvest.harvesters.base] No records received from the CSW server 2015-02-13 09:00:23,798 ERROR [ckanext.harvest.queue] Gather stage failed 2015-02-18 15:21:19,522 INFO [ckanext.facets.helpers] Metadata plugin custom facets loaded 2015-02-18 15:21:19,531 DEBUG [ckanext.ngds.sysadmin.model.db] Sysadmin configuration table already exists 2015-02-18 15:21:29,272 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory 2015-02-18 15:21:29,299 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist 2015-02-18 15:21:29,317 DEBUG [ckanext.harvest.model] Harvest tables defined in memory 2015-02-18 15:21:29,323 DEBUG [ckanext.harvest.model] Harvest tables already exist 2015-02-18 15:21:29,437 DEBUG [ckanext.harvest.queue] Gather queue consumer registered 2015-02-18 15:45:25,642 DEBUG [ckanext.harvest.queue] Received harvest job id: 60d15ea6-ab23-4153-9a44-0fb0150aa2f5 2015-02-18 15:45:25,669 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=60d15ea6-ab23-4153-9a44-0fb0150aa2f5 created=2015-02-18 
22:38:33.196010 gather_started=2015-02-18 22:45:25.669373 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running> 2015-02-18 15:45:28,711 ERROR [ckanext.harvest.harvesters.base] Error contacting the CSW server: <urlopen error [Errno 113] No route to host> 2015-02-18 15:45:28,719 ERROR [ckanext.harvest.queue] Gather stage failed 2015-02-23 12:30:24,837 DEBUG [ckanext.harvest.queue] Received harvest job id: 4b2da748-38cf-4f18-bd72-d528428e9ac3 2015-02-23 12:30:24,858 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=4b2da748-38cf-4f18-bd72-d528428e9ac3 created=2015-02-23 19:26:05.946538 gather_started=2015-02-23 19:30:24.858392 gather_finished=None finished=None source_id=f6c8e2ba-1f78-4cf2-aad5-8c51668cb068 status=Running> 2015-02-23 12:30:26,149 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://10.208.3.122/csw 2015-02-23 12:30:26,151 INFO [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'typenames': 'csw:Record', 'maxrecords': 10, 'esn': 'brief', 'constraints': []} 2015-02-23 12:30:27,370 ERROR [ckanext.harvest.harvesters.base] No records received from the CSW server 2015-02-23 12:30:27,376 ERROR [ckanext.harvest.queue] Gather stage failed [root@localhost ~]#
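The log shows the gather stage requesting records in pages (`maxrecords` of 10, with `startposition` moving from 9140 to 9150), and the failing job getting nothing back for the very first page. A simplified sketch of that paging pattern is below. It is not the ckanext-spatial implementation: `fetch_page` is a hypothetical callable standing in for the CSW GetRecords request, and the stop condition (an empty page) is an assumption.

```python
def gather_identifiers(fetch_page, page_size=10):
    """Collect identifiers page by page until an empty page comes back.

    `fetch_page(start, count)` is a hypothetical callable that returns a
    list of identifiers, standing in for one GetRecords request.
    """
    identifiers = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        identifiers.extend(page)
        start += page_size
    if not identifiers:
        # Same situation as the ERROR entry in the log above.
        raise RuntimeError("No records received from the CSW server")
    return identifiers
```

Seen this way, "No records received from the CSW server" right after a request with `startposition` 0 suggests the source answered but returned no records, rather than that the connection failed. That points at the contents of the source's csw db more than at the network.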

ccaudill commented 9 years ago

We had a new node test the update rpm so that we could test harvesting from them using: yum clean metadata && yum update ngds.ckan Now, when I hit http://mbmggin.mtech.edu/csw or http://mbmggin.mtech.edu/csw?request=GetCapabilities&Service=CSW&Version=2.0.2 I get the error message "Could not load repository (local): (OperationalError) FATAL: database "pycsw" does not exist None None" Thoughts??

FuhuXia commented 9 years ago

the rpm update path won't make any database changes. the pycsw db was introduced in the last two or three rpm versions. So, if you are updating from an older version, you will need to manually add the pycsw db.

sudo -u postgres createdb -O ckan_default pycsw -E utf-8

sudo -u postgres psql -d pycsw -f /usr/pgsql-9.1/share/contrib/postgis-1.5/postgis.sql > /dev/null

sudo -u postgres psql -d pycsw -f /usr/pgsql-9.1/share/contrib/postgis-1.5/spatial_ref_sys.sql > /dev/null

sudo -u postgres psql -d pycsw -c 'GRANT SELECT, UPDATE, INSERT, DELETE ON spatial_ref_sys TO ckan_default' > /dev/null

sudo -u postgres psql -d pycsw -c 'GRANT SELECT, UPDATE, INSERT, DELETE ON geometry_columns TO ckan_default' > /dev/null

cd /usr/lib/ckan/src/ckanext-spatial

../../bin/paster --plugin=ckanext-spatial ckan-pycsw setup -p /etc/ckan/pycsw.cfg
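Before running the createdb step, it may be worth checking whether a pycsw database is already there, since createdb stops with an error if it is. The sketch below parses the output of `psql -lqt`, which lists databases one per row with `|` between columns. The helper names are made up for this example, and the `sudo -u postgres` invocation simply mirrors the commands above.

```python
import subprocess


def database_listed(psql_list_output, name):
    """Return True if `name` appears in the first column of `psql -lqt` output."""
    for line in psql_list_output.splitlines():
        first_column = line.split("|", 1)[0].strip()
        if first_column == name:
            return True
    return False


def pycsw_db_exists():
    """Ask the local PostgreSQL server for its database list (needs sudo rights)."""
    output = subprocess.check_output(
        ["sudo", "-u", "postgres", "psql", "-lqt"], universal_newlines=True
    )
    return database_listed(output, "pycsw")
```

If `pycsw_db_exists()` returns True, the createdb command can be skipped. Whether the remaining steps are safe to repeat on an existing database is not covered in this thread.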

ccaudill commented 9 years ago

thanks - this is really important as all of our new nodes installed late last year and are waiting for these updates.

ccaudill commented 9 years ago

From the user: "when I try to run cmd#1 (in the postgis-1.5 directory) now get Created: database creation failed: ERROR: database "pycsw" already exists"

And from http://mbmggin.mtech.edu/csw I now see: "<ows:ExceptionText>Could not load repository (local): records</ows:ExceptionText>"

ccaudill commented 9 years ago

@FuhuXia Could you give us some direction on what needs to be done with this install? I believe someone was going to update the ReadMe at https://github.com/ngds/install-and-run to include information on what users who installed in the November timeframe of last year need to do. Here's what this node provider said: "We actually installed mid-late November when it first came out I thought. It said updating for .203 to .306"

FuhuXia commented 9 years ago

Readme file updated. Instruction for updating from rpm prior to 300 has been added. https://github.com/ngds/install-and-run#updating-ngds

ccaudill commented 9 years ago

@lukejbuckley https://github.com/ngds/install-and-run#updating-ngds Looks like pretty great instructions (if a lot of manual work). If you have the time, it would be great if you could do this update so we can continue testing. Thanks for your support!

ccaudill commented 9 years ago

@FuhuXia Thank you. Might there be a way around the user having to do anything with the production.ini file? Or is this just going to affect earlier versions?