ucla-data-science-center / ucla-dataverse-guide

A repository that holds documentation and policy information for UCLA Dataverse.
https://ucla-data-science-center.github.io/ucla-dataverse-guide/

Letting Search Engines Crawl Your Installation #13

Open jt14den opened 5 years ago

jt14den commented 5 years ago

See the configuration section of the Dataverse Installation Guide on letting search engines crawl your installation: http://guides.dataverse.org/en/latest/installation/config.html

jmjamison commented 5 years ago

Using the Dataverse robots.txt file (http://guides.dataverse.org/en/latest/_downloads/robots.txt)

Checking via the Dataverse Google group that I'm putting it in the correct place.
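
For reference, one way to pull that file down onto the server (a sketch; where you download it to is up to you):

```
# fetch the robots.txt shipped with the Dataverse guides
curl -fsSL -o robots.txt http://guides.dataverse.org/en/latest/_downloads/robots.txt
```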

jmjamison commented 5 years ago

I am not clear which is the Dataverse 'root directory'. According to Don Sizemore, the robots.txt file can live in either (or both) of two directories, but docroot will be wiped out each time a new WAR file is removed and replaced, so for now I have robots.txt in both.
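
A sketch of keeping a copy in both places; both paths below are assumptions for a typical Glassfish 4 install, not confirmed in this thread:

```
# assumed Glassfish docroot -- wiped whenever the WAR is redeployed
DOCROOT=/usr/local/glassfish4/glassfish/domains/domain1/docroot
cp robots.txt "$DOCROOT/robots.txt"
# assumed Apache-served location, which survives redeploys
cp robots.txt /var/www/html/robots.txt
```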

Also, per the Dataverse documentation, added the following to the ssl.conf file (/etc/httpd/conf.d):

    # don't let Glassfish serve its version of robots.txt
    ProxyPassMatch ^/robots.txt$ !
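
For context, a minimal sketch of how that exclusion sits relative to the main proxy rule in ssl.conf; the AJP connector line is an assumption based on a common Dataverse Apache front end, not something stated in this thread:

```
# Apache serves robots.txt from the filesystem instead of proxying it
ProxyPassMatch ^/robots.txt$ !
# everything else is still forwarded to Glassfish (assumed AJP connector on 8009)
ProxyPass / ajp://localhost:8009/ timeout=600
```

The exclusion has to appear before the catch-all ProxyPass: Apache evaluates proxy directives in configuration order, so if the catch-all came first, the request would be forwarded before the `!` rule was ever tested.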

After that, restarted httpd and Glassfish, and enabled the harvesting server.
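
A sketch of that step; the systemd service names are assumptions for a RHEL/CentOS-style host and may differ per install:

```
sudo systemctl restart httpd
sudo systemctl restart glassfish   # service name is an assumption; some installs use an init script
# sanity check: Apache, not Glassfish, should now answer for robots.txt
curl -sk https://localhost/robots.txt | head
```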

jmjamison commented 5 years ago

Next: the Dataverse documentation suggests generating a sitemap and running a cron job to update it nightly (http://guides.dataverse.org/en/latest/installation/config.html).

Added to the nightly cron:

    curl -X POST http://localhost:8080/api/admin/sitemap
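
As a concrete sketch, the cron entry could look like this; the 3:00 AM schedule and the /etc/cron.d placement are assumptions, and only the curl command comes from the thread:

```
# /etc/cron.d/dataverse-sitemap
# rebuild the Dataverse sitemap nightly at 3:00 AM
0 3 * * * root curl -s -X POST http://localhost:8080/api/admin/sitemap
```

Note that files in /etc/cron.d take a user field (root here), unlike entries in a personal crontab.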