subugoe / goefis

This repository contains the layers and changes specific to the SUB Goettingen

Check why deletion of data doesn't work in the Docker image #76

Closed christianmahnke closed 7 years ago

christianmahnke commented 7 years ago

Peter found an interesting issue: data can't be deleted from inside the Docker image. This seems to apply to Elasticsearch.

christianmahnke commented 7 years ago

I’ve tried to get the department thing running and got stuck. Importing via bin/librecat department add devel/department.yml worked. But after running bin/librecat delete search --bag department, the following was appended to the department listing (bin/librecat department list):

Use of uninitialized value $layer in printf at /srv/LibreCat/lib/LibreCat/Cmd/department.pm line 104.
Use of uninitialized value $id in printf at /srv/LibreCat/lib/LibreCat/Cmd/department.pm line 104.
Use of uninitialized value $name in printf at /srv/LibreCat/lib/LibreCat/Cmd/department.pm line 104.
Use of uninitialized value $display in printf at /srv/LibreCat/lib/LibreCat/Cmd/department.pm line 104.

It turned out that a search for all departments on the Elasticsearch side (GET http://localhost:9200/librecat/_search?q=*&type=department&size=1000) now includes this:

 

{
  "_index": "librecat",
  "_type": "department",
  "_id": "_delete_by_query",
  "_score": 1.0,
  "_source": {
    "query": {
      "match_all": {}
    }
  }
},

This breaks the generation of the department tree, since that code is not guarded against malformed data. Something like curl -XDELETE http://localhost:9200/librecat/department can clean out the departments.
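
A guard of the kind missing here could be a small validation step applied to the search hits before the tree is built. The following is only a sketch in Python to illustrate the idea (LibreCat itself is written in Perl). The function name valid_departments and the required field names name and layer are assumptions, loosely inspired by the variables named in the warnings above, and the sample records are made up:

```python
# Assumed field names for illustration only; not LibreCat's actual schema.
REQUIRED_FIELDS = ("name", "layer")


def valid_departments(hits):
    """Return only the search hits that look like real department records."""
    valid = []
    for hit in hits:
        source = hit.get("_source") or {}
        # Skip stray documents such as the accidental "_delete_by_query" one,
        # whose id is really an endpoint name.
        if str(hit.get("_id", "")).startswith("_"):
            continue
        # Skip records that lack any of the fields the listing needs.
        if any(source.get(field) is None for field in REQUIRED_FIELDS):
            continue
        valid.append(hit)
    return valid


# Made-up sample data: one plausible record plus the stray document from above.
hits = [
    {"_id": "1000", "_source": {"name": "Library", "layer": 1}},
    {"_id": "_delete_by_query", "_source": {"query": {"match_all": {}}}},
]
print([hit["_id"] for hit in valid_departments(hits)])
```

Filtering on both the id prefix and the required fields is a deliberate redundancy: either check alone would drop the stray document, but together they also catch other malformed records.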

So basically the problem is that the delete command isn't recognized by the dockerized instance of Elasticsearch.

After activating logging and tracing for Search::Elasticsearch (the settings can be passed via catmandu-local.yml), it claims to do the following while deleting a bag:

[Fri Mar  3 16:56:22 2017] # Request to: http://searchnode:9200
curl -XHEAD 'http://localhost:9200/librecat?pretty=1'

[Fri Mar  3 16:56:22 2017] # Response: 200, Took: 8 ms
# 1
[Fri Mar  3 16:56:22 2017] # Request to: http://searchnode:9200
curl -XPOST 'http://localhost:9200/librecat/department/_delete_by_query?pretty=1' -d '
{
   "query" : {
      "match_all" : {}
   }
}
'

Notice that the logged curl command shows the wrong host (localhost instead of searchnode) and that no DELETE is issued. Since the log is lying to us about the host, it might even be doing nasty host-related things. If one runs the command exactly as logged, it actually creates a document:

curl -XPOST 'http://localhost:9200/librecat/department/_delete_by_query?pretty=1' -d '
> {
>    "query" : {
>       "match_all" : {}
>    }
> }
> '
{
  "_index" : "librecat",
  "_type" : "department",
  "_id" : "_delete_by_query",
  "_version" : 1,
  "created" : true
}
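
The response above is the tell-tale sign: instead of reporting deletions, the server acknowledges a newly created document whose id is the endpoint name. A check for this could look like the following sketch. It is written in Python purely for illustration, the function name was_indexed_instead_of_deleted is hypothetical, and it only inspects the two fields visible in the transcript above:

```python
import json


def was_indexed_instead_of_deleted(response_text):
    """Return True if the response looks like an index acknowledgement for a
    document whose id is the endpoint name "_delete_by_query"."""
    try:
        body = json.loads(response_text)
    except ValueError:
        # Not JSON at all, so it cannot be the acknowledgement we look for.
        return False
    if not isinstance(body, dict):
        return False
    return body.get("_id") == "_delete_by_query" and body.get("created") is True


# The response copied from the transcript above.
response = """
{
  "_index" : "librecat",
  "_type" : "department",
  "_id" : "_delete_by_query",
  "_version" : 1,
  "created" : true
}
"""
print(was_indexed_instead_of_deleted(response))
```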

So it actually creates a new document; the problem is apparently the Perl module in use. After searching through the Perl modules that are part of the Docker image, I found that the module doesn't use the DELETE method. This led me to check which version carton actually selects during dependency resolution. The following was the result (parts of carton tree):

Catmandu::Store::ElasticSearch (Catmandu-Store-Elasticsearch-0.0506)
 CQL::Parser (CQL-Parser-1.13)
  Class::Accessor (Class-Accessor-0.34)
  String::Tokenizer (String-Tokenizer-0.06)
 Search::Elasticsearch (Search-Elasticsearch-5.01)
  JSON::MaybeXS (JSON-MaybeXS-1.003009)
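
Checking the resolved version by eye is easy to get wrong, so the comparison could be scripted. The sketch below is in Python for illustration only and assumes the tree output has exactly the "Module (Dist-Name-version)" line format shown in the excerpt above. The names TREE_LINE and resolved_versions are hypothetical:

```python
import re

# Matches lines such as " Search::Elasticsearch (Search-Elasticsearch-5.01)".
# Assumption: the version is everything after the last hyphen in the parentheses.
TREE_LINE = re.compile(r"^\s*(?P<module>[\w:]+) \((?P<dist>.+)-(?P<version>[^-]+)\)\s*$")


def resolved_versions(tree_output):
    """Map each module name in the tree output to its resolved version string."""
    versions = {}
    for line in tree_output.splitlines():
        match = TREE_LINE.match(line)
        if match:
            versions[match.group("module")] = match.group("version")
    return versions


# The excerpt from above.
tree = """\
Catmandu::Store::ElasticSearch (Catmandu-Store-Elasticsearch-0.0506)
 CQL::Parser (CQL-Parser-1.13)
  Class::Accessor (Class-Accessor-0.34)
  String::Tokenizer (String-Tokenizer-0.06)
 Search::Elasticsearch (Search-Elasticsearch-5.01)
  JSON::MaybeXS (JSON-MaybeXS-1.003009)
"""
versions = resolved_versions(tree)
print(versions["Search::Elasticsearch"])
```

A build step could compare the value for Search::Elasticsearch against the version it expects and fail early when they differ.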

christianmahnke commented 7 years ago

We don't rely on the Perl-based tool chain for the solution, since it doesn't seem to handle corner cases like indirect dependencies. Instead, we just use brute force within Dockerfile_Base:

rm -rf local/lib/perl5/Search/ && \
    cpanm -l ./local Search::Elasticsearch@1.19