knowledge-computing / mapkurator-system

https://knowledge-computing.github.io/mapkurator-doc/#/

Docker Install - mapKurator with Recogito - Image processing fails #4

Open ewlarson opened 1 year ago

ewlarson commented 1 year ago

I was able to follow the "Use mapKurator with Recogito" > Installation steps all the way to running the mapKurator option on an uploaded image file in Recogito, but the mapKurator action fails with this message and Recogito stack trace:

Message

"Processing failed: /home/mapkurator-system/data/test_imgs/sample_output/039106c2-41fc-4340-9247-4678d47ea886_annotations.json (No such file or directory)"

Recogito application

[info] - application - Copying from file: /home/recogito2/uploads/user_data/ew/ewlarson/qf0orxqkmknzty/039106c2-41fc-4340-9247-4678d47ea886.jpeg to /home/mapkurator-system/data/test_imgs/sample_input/039106c2-41fc-4340-9247-4678d47ea886.jpeg
[info] - application - Starting mapKurator - image upload
2023-11-14 10:58:46.246303: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:root:No files found for /home/mapkurator-system/data/test_imgs/sample_output/intermediate_results/stitch/intermediate_results/
[info] - application - mapKurator completed
java.io.FileNotFoundException: /home/mapkurator-system/data/test_imgs/sample_output/039106c2-41fc-4340-9247-4678d47ea886_annotations.json (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at transform.mapkurator.MapKuratorActor.doWork(MapKuratorActor.scala:32)
    at transform.WorkerActor$$anonfun$receive$1.applyOrElse(WorkerActor.scala:33)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
    at transform.WorkerActor.aroundReceive(WorkerActor.scala:11)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
    at akka.actor.ActorCell.invoke(ActorCell.scala:557)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

It appears the "stitch" process fails and ultimately no annotations are output.

Any advice for troubleshooting this error?


I'm running the application on a Paperspace ML-in-a-box machine:

OS template => ML-in-a-Box Ubuntu 20.04
Machine => GPU+ $0.45/hr | 8 CPU | 30 GB RAM | Quadro M4000
Region => East Coast (NY2)
Disk size => 100GB

These are the libraries installed by default: https://github.com/Paperspace/ml-in-a-box


zekun-li commented 1 year ago

It seems that the input image was not processed successfully by mapKurator.

The Recogito interface does not give much information, so you can follow the "Using mapKurator-Recogito docker image for standalone mapKurator" part of the documentation, then run run_img.py on your input image, following the linked instructions, and see what errors it reports.

ewlarson commented 1 year ago

Hi @zekun-li // Thanks so much for your reply!

After moving my test image file into the docker container, I was able to run the sample run_img.py command with some success. As the example script is written, I see directories for crop, spotter, and stitch and the stitch dir has an output geojson file in it.

However, if I attempt to add additional optional modules to the command, I start to see errors.

Works - module_img_geojson

(mapkurator) root@46b26528dc2d:/home/mapkurator-system# python run_img.py --map_kurator_system_dir /home/mapkurator-system/ --input_dir_path /home/mapkurator-test-images/input/ --expt_name mapKurator_test --module_cropping --module_get_dimension --module_text_spotting --text_spotting_model_dir /home/spotter_v2/PALEJUN/ --spotter_model spotter_v2 --spotter_config /home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml --spotter_expt_name test --module_img_geojson --output_folder /home/mapkurator-test-images/output/ --gpu_id 0

Namespace(expt_name='mapKurator_test', gpu_id=0, input_dir_path='/home/mapkurator-test-images/input/', map_kurator_system_dir='/home/mapkurator-system/', module_cropping=True, module_entity_linking=False, module_gen_geotiff=False, module_geocoord_geojson=False, module_get_dimension=True, module_img_geojson=True, module_post_ocr=False, module_text_spotting=True, output_folder='/home/mapkurator-test-images/output/', print_command=False, spotter_config='/home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml', spotter_expt_name='test', spotter_model='spotter_v2', text_spotting_model_dir='/home/spotter_v2/PALEJUN/')

run_img.py:73: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  sample_map_df=sample_map_df.append(tmp_path,ignore_index=True)
/home/mapkurator-test-images/output/mapKurator_test/crop/lake_superior
INFO:root:Done text spotting for lake_superior

INFO:root:Time for generating geotiff: 0
INFO:root:Time for Cropping : 1
INFO:root:Time for text spotting : 35
INFO:root:Time for generating geojson in img coordinate : 1
INFO:root:Time for generating geojson in geo coordinate : 0
INFO:root:Time for entity linking : 0
INFO:root:Time for post OCR : 0

Errors - module_post_ocr and others

(mapkurator) root@46b26528dc2d:/home/mapkurator-system# python run_img.py --map_kurator_system_dir /home/mapkurator-system/ --input_dir_path /home/mapkurator-test-images/input/ --expt_name mapKurator_test --module_cropping --module_get_dimension --module_text_spotting --text_spotting_model_dir /home/spotter_v2/PALEJUN/ --spotter_model spotter_v2 --spotter_config /home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml --spotter_expt_name test --module_img_geojson --module_post_ocr --module_geocoord_geojson --module_entity_linking --module_gen_geotiff --output_folder /home/mapkurator-test-images/output/ --gpu_id 0

Namespace(expt_name='mapKurator_test', gpu_id=0, input_dir_path='/home/mapkurator-test-images/input/', map_kurator_system_dir='/home/mapkurator-system/', module_cropping=True, module_entity_linking=True, module_gen_geotiff=True, module_geocoord_geojson=True, module_get_dimension=True, module_img_geojson=True, module_post_ocr=True, module_text_spotting=True, output_folder='/home/mapkurator-test-images/output/', print_command=False, spotter_config='/home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml', spotter_expt_name='test', spotter_model='spotter_v2', text_spotting_model_dir='/home/spotter_v2/PALEJUN/')

run_img.py:73: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  sample_map_df=sample_map_df.append(tmp_path,ignore_index=True)
/home/mapkurator-test-images/output/mapKurator_test/crop/lake_superior
INFO:root:Done text spotting for lake_superior
input_geojson_file /home/mapkurator-test-images/output/mapKurator_test/stitch/test/lake_superior.geojson
geojson_postocr_output_file /home/mapkurator-test-images/output/mapKurator_test/postocr/test/lake_superior.geojson
exe_ret {'error': 'Traceback (most recent call last):\t  File "lexical_search.py"; line 1; in <module>\t    from elasticsearch_dsl import Search; Q\tModuleNotFoundError: No module named \'elasticsearch_dsl\'\t'}

INFO:root:Time for generating geotiff: 0
INFO:root:Time for Cropping : 1
INFO:root:Time for text spotting : 35
INFO:root:Time for generating geojson in img coordinate : 1
INFO:root:Time for generating geojson in geo coordinate : 0
INFO:root:Time for entity linking : 0
INFO:root:Time for post OCR : 0

crop: works
geojson_test: dir empty
geotiff: dir empty
postocr: dir empty
spotter: works
stitch: works
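
The per-folder check above can be scripted. The following is a minimal sketch, not part of mapKurator: the function name summarize_stages is hypothetical, and the stage folder names are simply the ones listed in this thread, so other setups may use different names.

```python
from pathlib import Path

# Stage folder names as they appear in this thread (an assumption for other setups).
STAGES = ["crop", "spotter", "stitch", "postocr", "geotiff", "geojson_test"]


def summarize_stages(expt_dir, stages=STAGES):
    """Return {stage: file count}, or None for a stage whose folder is missing.

    Files are counted recursively under <expt_dir>/<stage>.
    """
    root = Path(expt_dir)
    counts = {}
    for stage in stages:
        stage_dir = root / stage
        if stage_dir.is_dir():
            counts[stage] = sum(1 for p in stage_dir.rglob("*") if p.is_file())
        else:
            counts[stage] = None  # folder was never created
    return counts


if __name__ == "__main__":
    # <output_folder>/<expt_name>, taken from the commands in this thread.
    expt_dir = "/home/mapkurator-test-images/output/mapKurator_test"
    for stage, n in summarize_stages(expt_dir).items():
        status = "missing" if n is None else ("empty" if n == 0 else f"{n} file(s)")
        print(f"{stage}: {status}")
```

An empty or missing folder for a stage that was requested on the command line is a quick hint about where the pipeline stopped.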

So! I'd say this is a partial success, but the script does not appear to communicate with Elasticsearch to query for the postocr step.

Any additional advice or guidance you can share here would be extremely appreciated. Thanks again!

zekun-li commented 1 year ago

Hi @ewlarson , it's great that text spotting now works for you!

That's right, the docker images only support modules up to stitching (i.e. module_img_geojson). This link contains instructions for setting up environments for post-OCR and entity linking if you need them.

Recogito uses mapKurator up to the stitching module as well.
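
Before adding the optional flags, one can check whether the Python module named in the traceback is importable at all. This is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and elasticsearch_dsl is the only dependency this thread names, so the full list of requirements for post-OCR and entity linking is not covered here.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # elasticsearch_dsl is the module reported missing in the traceback above.
    missing = missing_modules(["elasticsearch_dsl"])
    if missing:
        print("Not importable, skip --module_post_ocr for now:", ", ".join(missing))
    else:
        print("elasticsearch_dsl is importable")
```

Being importable only means the client library is installed. It says nothing about whether an Elasticsearch server is running or reachable.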

By the way, you can visualize the output GeoJSON from the stitching module in GIS software (e.g. QGIS).

Hope this helps!

ewlarson commented 1 year ago

Okok! I'll try building a full mapKurator environment out sometime soon. Sincerely appreciate your swift replies on this issue today.

krdyke commented 8 months ago

Chiming in that I've more or less replicated the steps taken by Eric, in that the cropping, text spotting, and stitching are happening. However, the output geojson has the same text repeating in every polygon created. (The text in question is "Cr"). Also the polygons don't seem to line up. I've pasted the results from a map I pulled from the Rumsey collection.

image

Any tips for what I may be missing (which I'm sure is plenty)? Thank you!!
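
A repeated label like this can also be confirmed without opening QGIS by counting the text values in the output file. The sketch below is an illustration under assumptions: the function name label_counts is hypothetical, and it assumes the recognized text is stored under a "text" key in each feature's properties, which this thread does not confirm. Pass a different text_key if the file uses another name.

```python
import json
from collections import Counter


def label_counts(geojson_path, text_key="text"):
    """Count how often each text label occurs in a GeoJSON FeatureCollection.

    text_key is an assumed property name; features without it are counted under None.
    """
    with open(geojson_path, encoding="utf-8") as f:
        data = json.load(f)
    counts = Counter()
    for feature in data.get("features", []):
        props = feature.get("properties") or {}
        counts[props.get(text_key)] += 1
    return counts


# Example, using the stitch output path printed earlier in this thread:
# counts = label_counts(
#     "/home/mapkurator-test-images/output/mapKurator_test/stitch/test/lake_superior.geojson"
# )
# print(counts.most_common(5))
```

If a single label accounts for nearly every feature, the problem is in the spotter output itself and not in how the file is displayed.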

zekun-li commented 8 months ago

@krdyke Thanks for reporting this issue!

TL;DR: The latest docker image has fixed this problem. Please do a docker pull and use the :latest version.

More info: We recently updated the spotting model, which performs better than before, but the change broke the stitching part of the system. The issue has been noticed and resolved.

krdyke commented 8 months ago

Thanks so much for the quick response. I've pulled the most recent image and am unfortunately still seeing the same results. I tried wiping the image entirely and re-pulling it, but again got the same results. If it helps, the pre-stitched images and JSON have the same resulting issues. I've shared a couple examples below, along with a picture of what that area looks like in QGIS on the full image with the full geojson.

h0_w0.json h0_w0 image

Thanks again for the responsiveness!

zekun-li commented 8 months ago

Could you please run ls /home/ inside the docker container and paste the output? Also, can you provide the command you executed after updating the docker image?

krdyke commented 8 months ago

Output of ls inside Docker image

AdelaiDet       elasticuser        mapkurator-test-images  root        spotter_scripts
detectron_test  mapkurator-system  recogito2               spotter-v2  spotter_testr

Docker run command used to create and start container. The bind mount is used to bring images from the "outside" into the Docker environment.

sudo docker run -it \
     --name map-kurator-container \
     --gpus all \
     -p 9820:9830 \
     --mount type=bind,source=/home/paperspace/bindmount,target=/home/root/bindmount \
     knowledgecomputing/mapkurator_recogito_2023

Thanks again!

krdyke commented 8 months ago

here's the run_img.py command I've been using

python run_img.py \
     --map_kurator_system_dir /home/mapkurator-system/ \
     --input_dir_path /home/root/bindmount/input/ \
     --expt_name mapKurator_test \
     --module_cropping \
     --module_get_dimension \
     --module_text_spotting \
     --text_spotting_model_dir /home/spotter-v2/PALEJUN/ \
     --spotter_model spotter-v2 \
     --spotter_config /home/spotter-v2/PALEJUN/configs/PALEJUN/Finetune/Rumsey_Polygon_Finetune.yaml \
     --spotter_expt_name test \
     --module_img_geojson \
     --output_folder /home/root/bindmount/output/ \
     --gpu_id 0

zekun-li commented 8 months ago

Things look correct to me. Could you delete all the contents of the /home/spotter-v2/PALEJUN/build folder and run the command again?
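
For anyone scripting this step, here is a minimal sketch that empties a folder while keeping the folder itself. The helper name clear_directory is hypothetical and not part of mapKurator. The thread does not say why clearing the build folder helps; stale build artifacts from the older spotter are a plausible cause, but that is an assumption.

```python
import shutil
from pathlib import Path


def clear_directory(path):
    """Delete everything inside `path`, keep `path` itself, return the number of entries removed."""
    root = Path(path)
    removed = 0
    for child in root.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)  # real subdirectory: remove recursively
        else:
            child.unlink()  # regular file or symlink
        removed += 1
    return removed


# Destructive, so left commented out. Path taken from the reply above:
# clear_directory("/home/spotter-v2/PALEJUN/build")
```

After clearing the folder, rerun the same run_img.py command as before.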

krdyke commented 8 months ago

I think that did it! The output is looking much better now. Thanks so much for your help with this. I'm excited to try and integrate mapKurator with my own collection.

image