huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License
240 stars 80 forks

upgrade from 1.5 to master: Documents, entities, translations, pages and privacy settings are gone #2645

Closed vasyugan closed 4 years ago

vasyugan commented 4 years ago

I just did an upgrade from 1.5 to the master branch, but the result is not yet satisfactory.

I took the following approach: I updated the Docker build by @fititnt (see https://github.com/fititnt/uwazi-docker). I generated a new Docker image based on node:8.11-stretch and updated the install stanzas so that they pull in the client for mongo 4.0. I also updated docker-compose.yml to pull in mongo 4.0 and elasticsearch 5.6, to make sure the current requirements are met.

The first thing I noticed was that mongodb would complain about a duplicate key and therefore fail: MongoError: E11000 duplicate key error collection: uwazi_development.users index: email_1 dup key: { : null }

Dropping the db and restoring it from a dump via mongorestore fixed that for me.
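For context on the E11000 message above: a unique index stores a missing field as null, so two users without an email collide on email_1. The following is a minimal sketch in plain JavaScript, not a Mongo command; the helper name and the sample users are made up for illustration.

```javascript
// Hypothetical helper: group documents by the value a unique index would see.
// A missing field is treated as null, so several users without an email collide.
function findDuplicateKeys(docs, field) {
  const seen = new Map();
  for (const doc of docs) {
    const key = doc[field] === undefined ? null : doc[field];
    if (!seen.has(key)) seen.set(key, []);
    seen.get(key).push(doc._id);
  }
  // Keep only the key values shared by more than one document.
  return [...seen.entries()].filter(([, ids]) => ids.length > 1);
}

// Made-up sample data, not taken from the reporter's database.
const users = [
  { _id: 1, username: 'admin', email: 'admin@example.org' },
  { _id: 2, username: 'editor' },
  { _id: 3, username: 'reader' },
];

console.log(findDuplicateKeys(users, 'email'));
```

Running the same check against an export of the users collection would show which accounts share a missing email before the index is rebuilt.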

However, next I found that yarn reindex reproducibly throws an error:

Creating index... uwazi_development
2019-12-01T22:06:32.586Z [uwazi_development] ERROR Failed to index document 5bbdb69a5d49fb064c80caf6: {
 "type": "illegal_argument_exception",
 "reason": "mapper [metadata.population__estimate__min_] of different type, current_type [text], merged_type [double]"
}
Indexing documents and entities... - 10 indexedd
Done, took 3.374 seconds
Done in 6.41s.
root@cab8efc7cf40:/home/node/uwazi# 

The result is that all the documents and entities have vanished from the web interface: I am greeted by a blank page.

The only content still available is documents that have not been published.

Also, the custom start page, our translations and the privacy settings have vanished (the whole instance was set to mandatory login; now it is publicly accessible).

Filters, thesauri etc are still available.

(Fortunately, docker makes it real easy to revert to my previous version, so there is no real data loss for me)

RafaPolit commented 4 years ago

Did you migrate the data prior to attempting the reindex? Your data must be compliant with the new structures.

For that “yarn migrate” should do the trick.

After that, what is the result of the reindex?

Just to explain a bit: if there is an error indexing the database, yes, you will most likely see an empty collection. The data is there in the database, but the library renders the entities found in the Elastic index, so that is what you should expect if the indexing failed.
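To illustrate the point about the library showing only what is in the index, here is a small sketch that compares the ids stored in the database with the ids that were indexed. The helper name and the ids are hypothetical; how the two id lists are obtained is left open.

```javascript
// Hypothetical helper: which database ids never made it into the search index?
function missingFromIndex(databaseIds, indexedIds) {
  const indexed = new Set(indexedIds);
  return databaseIds.filter((id) => !indexed.has(id));
}

// Made-up ids for illustration only.
const databaseIds = ['a1', 'b2', 'c3', 'd4'];
const indexedIds = ['a1', 'c3'];

// Only indexed entities would be listed; the rest still exist in the database.
const hidden = missingFromIndex(databaseIds, indexedIds);
console.log(`${indexedIds.length} visible, ${hidden.length} hidden:`, hidden);
```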

vasyugan commented 4 years ago

> Did you migrate the data prior to attempting the reindex? Your data must be compliant with the new structures.
>
> For that “yarn migrate” should do the trick.
>
> After that, what is the result of the reindex?
>
> Just to explain a bit: if there is an error indexing the database, yes, you will most likely see an empty collection. The data is there in the database, but the library renders the entities found in the Elastic index, so that is what you should expect if the indexing failed.

Thanks, I wasn't aware of the migrate command, yet this didn't change things for me. After that, the yarn reindex command still throws the same error, and the web interface remains as described above.

vasyugan commented 4 years ago

So just for the full record: Mongo works after dropping the db and restoring it from a dump. On the next docker-compose start I see the following:

root@73cb7a8901dc:/home/node/uwazi# yarn migrate
yarn run v1.6.0
$ node run.js ./app/api/migrations/migrate.js
Done in 2.91s.
root@73cb7a8901dc:/home/node/uwazi# yarn reindex
yarn run v1.6.0
$ node run.js ./database/reindex_elastic.js
Deleting index... uwazi_development
{ json:
   { error:
      { root_cause: [Array],
        type: 'index_not_found_exception',
        reason: 'no such index',
        'resource.type': 'index_or_alias',
        'resource.id': 'uwazi_development',
        index_uuid: '_na_',
        index: 'uwazi_development' },
     status: 404 },
  status: 404,
  cookie: undefined }
Creating index... uwazi_development
2019-12-02T10:53:12.463Z [uwazi_development] ERROR Failed to index document 5bbe0b5e5d49fb064c80cb39: {
 "type": "illegal_argument_exception",
 "reason": "mapper [metadata.population__estimate__min_] of different type, current_type [double], merged_type [text]"
}
Indexing documents and entities... - 10 indexedd
Done, took 2.948 seconds
Done in 5.50s.

Again, even after these steps, all the content and the translations remain gone.

fititnt commented 4 years ago

I just saw #2637. This is not even 1.5 stable, but the master branch.

Note that the latest version of https://github.com/fititnt/uwazi-docker was hardcoded to 1.5 stable, but this can be changed here: https://github.com/fititnt/uwazi-docker/blob/3002337c2600851791ad5974aa1c359ef68a5ee0/Dockerfile#L22

At https://github.com/huridocs/uwazi/issues/2637#issuecomment-558570915 he says master is somewhat stable.

## Download Uwazi v1.4
RUN git clone -b v1.4 --single-branch --depth=1 https://github.com/huridocs/uwazi.git /home/node/uwazi/ \

# To

## Download Uwazi v1.5
RUN git clone -b v1.5 --single-branch --depth=1 https://github.com/huridocs/uwazi.git /home/node/uwazi/ \

But in both cases, fititnt/uwazi-docker does not abstract the upgrade commands, only the infrastructure part. So it is very likely that some of the standard Uwazi commands used for the non-dockerized version are also required.

@vasyugan after you get uwazi 1.5 working on Docker, can you ping me, or even make a pull request to change that line? I did not do it for 1.5 because I was not able to test, so to be safe it is still locked on 1.4.

RafaPolit commented 4 years ago

Well, you have an issue where the same property is holding different value types.

You have entities that have, within the metadata property, a key called “population estimate min “, where some values are numbers and some values are strings.

I don’t see an option but to find that document by ID and fix the value, hoping it is the only one with the problem.

How many entities does your DB have?
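Before fixing a single document by ID, it may help to know how widespread the problem is. Below is a minimal sketch in plain JavaScript that lists metadata keys holding more than one value type. It assumes values sit directly under entity.metadata[key], as the field path in the error message suggests; the helper names and sample entities are made up.

```javascript
// Collect the JavaScript types seen for every metadata key (null/undefined are ignored).
function typesByMetadataKey(entities) {
  const types = {};
  for (const entity of entities) {
    for (const [key, value] of Object.entries(entity.metadata || {})) {
      if (value === null || value === undefined) continue;
      if (!types[key]) types[key] = new Set();
      types[key].add(typeof value);
    }
  }
  return types;
}

// Report only the keys whose values do not share a single type.
function mixedTypeKeys(entities) {
  return Object.entries(typesByMetadataKey(entities))
    .filter(([, seen]) => seen.size > 1)
    .map(([key, seen]) => ({ key, types: [...seen].sort() }));
}

// Made-up sample entities; the key name is the one from the error message.
const sampleEntities = [
  { _id: 'e1', metadata: { population__estimate__min_: 1200, country: 'A' } },
  { _id: 'e2', metadata: { population__estimate__min_: 'about 1500', country: 'B' } },
  { _id: 'e3', metadata: { country: 'C' } },
];

console.log(mixedTypeKeys(sampleEntities));
```

If the report lists only one key, fixing the offending values for that key should be enough for the reindex to get past this error.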

RafaPolit commented 4 years ago

We could help you co-write a Mongo script that would attempt to fix this in a single bulk operation.
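As a rough idea of what such a bulk fix could do, here is a sketch of the decision logic in plain JavaScript over in-memory documents. It is not the script offered above and does not talk to MongoDB; the helper name, the sample data and the assumption that values live directly under entity.metadata[key] are all illustrative. It only plans the changes: numeric strings are converted, anything else is reported for manual review.

```javascript
// Hypothetical planner: decide, per entity, whether a string value can become a number.
function planNumericCoercion(entities, key) {
  const fixed = [];
  const unresolved = [];
  for (const entity of entities) {
    const value = entity.metadata ? entity.metadata[key] : undefined;
    if (typeof value !== 'string') continue; // numbers and missing values are left alone
    const trimmed = value.trim();
    const parsed = Number(trimmed);
    if (trimmed !== '' && Number.isFinite(parsed)) {
      fixed.push({ _id: entity._id, from: value, to: parsed });
    } else {
      unresolved.push({ _id: entity._id, value });
    }
  }
  return { fixed, unresolved };
}

// Made-up sample data for illustration only.
const entitiesToFix = [
  { _id: 'e1', metadata: { population__estimate__min_: 900 } },
  { _id: 'e2', metadata: { population__estimate__min_: '1200' } },
  { _id: 'e3', metadata: { population__estimate__min_: ' 3.5 ' } },
  { _id: 'e4', metadata: { population__estimate__min_: 'n/a' } },
];

console.log(planNumericCoercion(entitiesToFix, 'population__estimate__min_'));
```

Planning first and writing second keeps the step reviewable: the unresolved list shows which values need a human decision before anything in the database is changed.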

fititnt commented 4 years ago

Sorry for hijacking this issue, but in addition to changing RUN git clone -b v1.4 --single-branch to RUN git clone -b v1.5 --single-branch in the Dockerfile, I just saw that MongoDB (was 3.4) and Elasticsearch (was 5.5.3) in docker-compose.yml ideally should be updated too.

I can make both changes in fititnt/uwazi-docker (and test at the infrastructure level), but beyond pointing to 'Upgrading Uwazi and data migrations' (https://github.com/huridocs/uwazi/#upgrading-uwazi-and-data-migrations), maybe the README.md of the 1.5 branch could also at least reference some place that explains how to upgrade MongoDB and Elasticsearch. I am not saying it should have all the details, but maybe there is some link that explains this.

Edit:

Maybe, because of how Elasticsearch is used (Uwazi can recreate the index from scratch) and how MongoDB handles upgrades, this level of additional detail is not necessary. I just wrote this in case something is very likely to break. If not, the README can stay as it is.

vasyugan commented 4 years ago

On 02.12.19 at 16:16, Emerson Rocha wrote:

> Sorry for hijacking this issue, but in addition to changing RUN git clone -b v1.4 --single-branch to RUN git clone -b v1.5 --single-branch in the Dockerfile, I just saw that MongoDB (was 3.4) and Elasticsearch (was 5.5.3) in docker-compose.yml ideally should be updated too.

I made all those changes in my local copy before trying and getting the bad results I reported.

uwazi is at Master/1.6,

Mongodb is at 4.0

Elasticsearch is at 5.6

yarn migrate and yarn reindex have been run.

And still no luck.

> I can make both changes in fititnt/uwazi-docker (and test at the infrastructure level), but beyond pointing to 'Upgrading Uwazi and data migrations' (https://github.com/huridocs/uwazi/#upgrading-uwazi-and-data-migrations), maybe the README.md of the 1.5 branch could also at least reference some place that explains how to upgrade MongoDB and Elasticsearch. I am not saying it should have all the details, but maybe there is some link that explains this.

I love docker because it minimizes the need to mess around with the host system but it adds another layer of complexity and there is no upgrade mechanism in fititnt's version. I am not sure how one can fix this. Of course, the image would have to be rebuilt at each Uwazi upgrade. Upgrading Uwazi within the present docker container doesn't make sense. And ideally, the builds should be pushed to docker hub so that users could just run them without having to build them first.

vasyugan commented 4 years ago

> @vasyugan after you get uwazi 1.5 working on Docker, can you ping me, or even make a pull request to change that line? I did not do it for 1.5 because I was not able to test, so to be safe it is still locked on 1.4.

uwazi 1.5 works great on Docker. Just Uwazi 1.6/Master does not. BTW, in my version, I have removed the elasticsearch and mongodb GUIs, because I did not get them to work over SSH. What is the setting in which you are using them?

fititnt commented 4 years ago

@vasyugan as a hotfix, since you are having trouble with master, you could just stay on 1.5, but get a shell in the container with docker exec -it <container name> /bin/bash and then edit the files as in this patch: https://github.com/huridocs/uwazi/pull/2574/files#diff-f2ac9a51c19e7c0f46b1b3937af9c3a0

uwazi-docker does not use Alpine Linux (which would be much more space efficient), so this is not as hard to do in a container as it could be.

With this hotfix, you could stay on 1.5 for a few months or until 1.6 comes out. Consider this strategy if using master is harder.

fititnt commented 4 years ago

@vasyugan about uploading the compiled fititnt/uwazi-docker to Docker Hub: I guess we could do that this time to make things easier. In the next two weeks I could set aside one or two full days to upload not only the latest Uwazi, but also some recent older versions.

But since I'm not using Uwazi in production (and even if I were, it would still be a good idea to have a second person test), it would be great to have at least a few more people testing. One of the reasons to have more than one Uwazi image uploaded to Docker Hub is to test upgrades. That's why we should find and document at least the happy paths.

About the happy paths

Let's keep in mind that the Docker container of Uwazi (the one I could make prebuilds of for Docker Hub) is one thing, and the containers of MongoDB and Elasticsearch are another. Even if Uwazi tries its best to help with upgrades, both MongoDB and Elasticsearch can have their own upgrade paths.

I know that Docker makes this much easier, and you may have no idea how hellish it can be without Docker, but my idea is to at least provide prebuilds of fititnt/uwazi-docker and some experimental docker-compose.yml files that I suppose would work for every version. This could leave room for at least two people other than me to test upgrade paths.

My fear with these upgrade paths is mostly about whether Elasticsearch and especially MongoDB, when started with data from older versions, would "do their thing" and upgrade it to the newer version so as not to upset the newer versions of Uwazi Docker. And by upset, I mean break and produce messages that could look like Uwazi errors, but in fact could just be errors from a half-upgraded MongoDB/Elasticsearch.

I will ping @txau @daneryl @vorburger (just to make more people aware, not that an immediate response is needed, especially because I would first have to get things ready to test).

vasyugan commented 4 years ago

> @vasyugan as a hotfix, since you are having trouble with master, you could just stay on 1.5, but get a shell in the container with docker exec -it <container name> /bin/bash and then edit the files as in this patch: https://github.com/huridocs/uwazi/pull/2574/files#diff-f2ac9a51c19e7c0f46b1b3937af9c3a0
>
> uwazi-docker does not use Alpine Linux (which would be much more space efficient), so this is not as hard to do in a container as it could be.
>
> With this hotfix, you could stay on 1.5 for a few months or until 1.6 comes out. Consider this strategy if using master is harder.

Thanks, but I guess it is better to copy the changed file during the build process, because the container is going to be destroyed as soon as you run docker-compose down. To make the change persistent, it has to be applied to the image, not the container.

fititnt commented 4 years ago

@vasyugan oops, you are right. I have even done that in the past. Here is an example of how to do it: https://github.com/fititnt/uwazi-docker/commit/0c4d3fa216ae5f9beca2c9a435cd71829d3b4fca

ADD --chown=node:node ./scripts/patch/uwazi/database/reindex_elastic.js /home/node/uwazi/uwazi/database/reindex_elastic.js

Just change the paths and add the new file to the local folder where you are building the Uwazi Docker image.

vasyugan commented 4 years ago

> @vasyugan as a hotfix, since you are having trouble with master, you could just stay on 1.5, but get a shell in the container with docker exec -it <container name> /bin/bash and then edit the files as in this patch: https://github.com/huridocs/uwazi/pull/2574/files#diff-f2ac9a51c19e7c0f46b1b3937af9c3a0
>
> uwazi-docker does not use Alpine Linux (which would be much more space efficient), so this is not as hard to do in a container as it could be.
>
> With this hotfix, you could stay on 1.5 for a few months or until 1.6 comes out. Consider this strategy if using master is harder.

Now using the new file. Still there is no map. Crap! I guess there must be another incompatibility of the old Uwazi with the new map provider.

vasyugan commented 4 years ago

So I set up a uwazi install from master in a VM with a fresh Debian stretch. After installing it, I imported the mongo db, copied the documents folder and ran "yarn migrate && yarn reindex && yarn run-production". Unfortunately, the result is exactly the same as in the docker install: entities, documents and translations are gone without a trace. Some unpublished documents seem to be still there, but when I click on "view" I get a black tab.

When I run yarn blank-state, I get a normally functioning Uwazi install, but of course, all my contents are missing.

Unfortunately, the debug.log and errors.log files are completely empty.

RafaPolit commented 4 years ago

I have written everywhere that you have an inconsistency in your data. Some of your MongoDB documents hold text and others hold numbers for this population__estimate__min_.

You are trying to fix this in the wrong place. This has nothing to do with Uwazi Docker, this has nothing to do with the version of Mongo. If you have inconsistent type of data, Elastic Search will not index your documents. If Elastic will not index your documents, you will not see them in the library.

I offer, once more, to help create a script that you can run in the Mongo shell to help with this issue.
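To make the mechanism concrete, here is a deliberately simplified model in plain JavaScript, not a description of Elasticsearch internals: the first value seen for a field fixes its type, and later documents with another type are rejected. The type names mirror the ones in the error message; the function names and sample documents are made up. Under this model, the order of the documents decides which type ends up as current and which as merged, which may be why the two error messages earlier in the thread have them swapped.

```javascript
// Type names mirror the error message; mapping JS types to them is an assumption.
function fieldType(value) {
  return typeof value === 'number' ? 'double' : 'text';
}

// Simplified model: the first value seen for the field fixes its type for the index.
function simulateIndexing(docs, field) {
  let currentType = null;
  const indexed = [];
  const rejected = [];
  for (const doc of docs) {
    const value = doc.metadata[field];
    if (value === undefined) {
      indexed.push(doc._id); // documents without the field cannot conflict
      continue;
    }
    const mergedType = fieldType(value);
    if (currentType === null) currentType = mergedType;
    if (mergedType === currentType) indexed.push(doc._id);
    else rejected.push({ _id: doc._id, currentType, mergedType });
  }
  return { indexed, rejected };
}

// Made-up documents using the field name from the error message.
const field = 'population__estimate__min_';
const numeric = { _id: 'n1', metadata: { [field]: 1200 } };
const textual = { _id: 't1', metadata: { [field]: 'unknown' } };

console.log(simulateIndexing([numeric, textual], field).rejected);
console.log(simulateIndexing([textual, numeric], field).rejected);
```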

vasyugan commented 4 years ago

> I have written everywhere that you have an inconsistency in your data. Some of your MongoDB documents hold text and others hold numbers for this population__estimate__min_.
>
> You are trying to fix this in the wrong place. This has nothing to do with Uwazi Docker, this has nothing to do with the version of Mongo. If you have inconsistent type of data, Elastic Search will not index your documents. If Elastic will not index your documents, you will not see them in the library.
>
> I offer, once more, to help create a script that you can run in the Mongo shell to help with this issue.

Yes, thanks for that. I figured out how to delete the faulty entry, which got rid of the error message. However, the end result is the same:

node run.js ./app/api/migrations/migrate.js
sanitize_empty_geolocations...
processed -> 10
fullText_to_per_page...
pdf_thumbnails...
processed -> 7
geolocation_fields...
processed -> 10
Sanitizing connections...
Deleting incomplete connections...

Deleting orphaned hubs...

relationships_remove_languages...

page-languages...

default-template...
Added default template
Sync - Creating update logs: 
Emptying current logs... 
Processing 1 settings 
Processing 9 templates 
Processing 10 entities 
Processing 0 connections 
Processing 4 dictionaries 
Processing 1 translations 
Processing 4 relationtypes 
Inserting 29 update logs...
Done! 
missing_full_text...

add-RTL-to-settings-languages...
geolocation-arrays...
processed -> 9
separate-custom-uploads-from-documents...

remove_orphan_relations...
deleted orphan entities -> 0
Done in 11.34s.
root@ab536af1a546:/home/node/uwazi# 

root@ab536af1a546:/home/node/uwazi# yarn reindex
yarn run v1.6.0
$ node run.js ./database/reindex_elastic.js
Deleting index... uwazi_development
Creating index... uwazi_development
Indexing documents and entities... - 10 indexedd
Done, took 19.521 seconds
Done in 21.13s.
root@ab536af1a546:/home/node/uwazi# 

Again, yarn reindex only indexes 10 documents after the migration, where there should be over 200. All content is lost except for a few unpublished entities; everything else is gone.

I should say that I use a custom documents directory, so I thought maybe this is related. Therefore I copied the documents to the standard location and tried yarn reindex again, but again to no avail. And this wouldn't have explained the loss of the entities and translations anyway.

vasyugan commented 4 years ago

Several months later, I tried the same again, now with all dependencies bumped up, and the result is unchanged. Except for ten entries, the database is completely empty after migration, and all uploaded documents are being ignored. Just about everything is gone, and yet there is no actual error message anywhere.

Here is the output of a yarn reindex run:

yarn run v1.19.1
$ node run.js ./database/reindex_elastic.js
Deleting index... uwazi_development
Creating index... uwazi_development
Indexing documents and entities... - 10 indexedd
Done, took 5.898 seconds
Done in 8.57s.

vasyugan commented 4 years ago

@RafaPolit It seems all errors are eradicated (I found the culprit and deleted it), and yet the result is the same. Today I made another attempt at migrating to master: I installed Node.js, elasticsearch and mongodb in the required versions via docker and, once I had it running, imported the documents and mongo db from my production environment. Yet after running "yarn migrate" and "yarn reindex", the result is unchanged. No errors are thrown, but still a mere 10 items are indexed (some of them entities, but there were also 2 documents among them). Do you have any advice as to how I can go about finding out what is going on? Is there any command line parameter for yarn reindex that makes it more verbose? I am pretty lost!

vasyugan commented 4 years ago

Turns out I hadn't observed the requirements: I didn't upgrade mongodb, elasticsearch and node.js from the versions needed for 1.5 to those needed for master. I have done this now. Migration still fails, but differently (see #2780), so I am closing this now.