mattporritt / moodle-search_elastic

An Elasticsearch engine plugin for Moodle's Global Search
https://moodle.org/plugins/search_elastic
GNU General Public License v3.0

Cannot connect to standalone tika instance #28

Closed (abias closed this issue 7 years ago)

abias commented 7 years ago

Hi Matt,

I am currently evaluating search_elastic. I have added the latest version of the plugin to a Moodle 3.2.3+ (Build: 20170622) instance and hooked it up to a fresh Elasticsearch 5.5 instance.

I also started a standalone Tika instance on a separate machine and configured it in the plugin's settings.

I ran the first indexing with fileindexing enabled, using this command:

sudo -u apache /opt/rh/rh-php70/root/usr/bin/php /var/www/html/moodle_dev3/search/cli/indexer.php --force

The indexer script encountered a fatal error and stopped with this message:

PHP Notice:  Undefined variable: client in /var/www/html/moodle_dev3/search/engine/elastic/classes/document.php on line 157

Notice: Undefined variable: client in /var/www/html/moodle_dev3/search/engine/elastic/classes/document.php on line 157
Default exception handler: Fehler: Call to a member function post() on null Debug: 
Error code: generalexceptionmessage
* line 157 of /search/engine/elastic/classes/document.php: Error thrown
* line 286 of /search/engine/elastic/classes/document.php: call to search_elastic\document->extract_text()
* line 352 of /search/engine/elastic/classes/engine.php: call to search_elastic\document->export_file_for_engine()
* line 510 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine->process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine->add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()

!!! Fehler: Call to a member function post() on null !!!
!! 
Error code: generalexceptionmessage !!
!! Stack trace: * line 157 of /search/engine/elastic/classes/document.php: Error thrown
* line 286 of /search/engine/elastic/classes/document.php: call to search_elastic\document->extract_text()
* line 352 of /search/engine/elastic/classes/engine.php: call to search_elastic\document->export_file_for_engine()
* line 510 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine->process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine->add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()
 !!

I traced the problem back to commit 4c32c7103f9b1579807e02a743cfedf292247df2, which breaks the connection to Tika: extract_text() calls $client->post() without ever creating $client. Based on the latest code, this patch should solve the problem and clean up the function at the same time:

diff --git a/classes/document.php b/classes/document.php
index 6a574df..50ab93b 100644
--- a/classes/document.php
+++ b/classes/document.php
@@ -148,19 +148,18 @@ class document extends \core_search\document {
      */
     private function extract_text($file) {
         // TODO: add timeout and retries for tika.
-        $config = get_config('search_elastic');
         $extractedtext = '';
         $port = $this->tikaport;
-        $hostname = rtrim($this->tikahostname, "/");
+        $hostname = $this->tikahostname;
         $url = $hostname . ':'. $port . '/tika/form';

+        $client = new \curl();
         $response = $client->post($url, array('file' => $file));
         if ($client->info['http_code'] === 200) {
             $extractedtext = $response;
         }

         return $extractedtext;
-
     }

     /**

However, I am wondering how this problem could have gone undetected, given that you are running this plugin in production...

Thanks, Alex
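
One detail of the patch above may be worth a second look: it drops the rtrim() call, so a hostname configured with a trailing slash would produce a URL with the port after the slash. The following is only a minimal standalone sketch of the URL construction, assuming the hostname setting can contain a trailing slash. The function name build_tika_url and the example hostname are hypothetical and not part of search_elastic.

```php
<?php
// Hypothetical helper for illustration only; not part of search_elastic.
// It mirrors the string concatenation used in extract_text().
function build_tika_url(string $hostname, int $port): string {
    // Drop any trailing slash so the port is not appended after a "/".
    return rtrim($hostname, '/') . ':' . $port . '/tika/form';
}

// Example hostname is an assumption; prints http://tika.example.com:9998/tika/form
echo build_tika_url('http://tika.example.com/', 9998), PHP_EOL;
```

Without the rtrim() call, the same input would give http://tika.example.com/:9998/tika/form, which is why keeping the trim (or validating the setting when it is saved) may be the safer choice.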

mattporritt commented 7 years ago

Hi Alex, This is indeed an issue. I've created branch issue28_tika_connect to solve this. It is still a WIP, but I hope to have it complete in the next 24 hours.

The fix was straightforward, but I'm still working on better test coverage to avoid issues like this in the future.

I'll double check, but we're a little behind on the release cycle for this plugin, which is likely why this was missed. Also, in my own testing and development I've been using plain txt files, which don't go through Tika.
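
As a rough illustration of the kind of test coverage mentioned above, the sketch below exercises the extraction logic against a fake HTTP client instead of a live Tika server. It is not the plugin's actual test code: fake_tika_client and extract_text_with are hypothetical names, and the fake only mimics the post() method and info array that the patched extract_text() relies on.

```php
<?php
// Hypothetical stand-in for Moodle's \curl class; only post() and $info are mimicked.
class fake_tika_client {
    public $info = array();
    public $requests = array();
    private $status;
    private $body;

    public function __construct(int $status, string $body) {
        $this->status = $status;
        $this->body = $body;
    }

    public function post($url, $params) {
        // Record the URL so a test can check where the file was sent.
        $this->requests[] = $url;
        $this->info = array('http_code' => $this->status);
        return $this->body;
    }
}

// Standalone mirror of the patched extract_text() logic, with the client injected
// (an assumption made here for testability; the plugin creates the client itself).
function extract_text_with($client, string $url, $file): string {
    $response = $client->post($url, array('file' => $file));
    return ($client->info['http_code'] === 200) ? $response : '';
}

$url = 'http://tika.example.com:9998/tika/form'; // Example URL, an assumption.
$ok = new fake_tika_client(200, 'extracted words');
$down = new fake_tika_client(503, 'Service Unavailable');

var_dump(extract_text_with($ok, $url, 'dummy-file') === 'extracted words');
var_dump(extract_text_with($down, $url, 'dummy-file') === '');
```

A test along these lines runs the Tika code path even when the fixture files are plain text, so a missing client object would fail immediately rather than only on a real indexing run.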

abias commented 7 years ago

Thanks, Matt, for your feedback. I am looking forward to your final fix :)

mattporritt commented 7 years ago

The master branch now contains the fix, and the Moodle plugins directory has been updated.

abias commented 7 years ago

Thanks for the quick fix, Matt.