Open ewlarson opened 1 year ago
It seems the input image was not processed successfully by mapKurator.
The Recogito interface does not give much information, so follow the "Using mapKurator-Recogito docker image for standalone mapKurator" part, then run run_img.py on your input image following the instructions here and see what the errors are.
Hi @zekun-li // Thanks so much for your reply!
After moving my test image file into the docker container, I was able to run the sample run_img.py command with some success. As the example script is written, I see directories for crop, spotter, and stitch, and the stitch dir has an output geojson file in it.
However, if I attempt to add additional optional modules to the command, I start to see errors.
(mapkurator) root@46b26528dc2d:/home/mapkurator-system# python run_img.py --map_kurator_system_dir /home/mapkurator-system/ --input_dir_path /home/mapkurator-test-images/input/ --expt_name mapKurator_test --module_cropping --module_get_dimension --module_text_spotting --text_spotting_model_dir /home/spotter_v2/PALEJUN/ --spotter_model spotter_v2 --spotter_config /home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml --spotter_expt_name test --module_img_geojson --output_folder /home/mapkurator-test-images/output/ --gpu_id 0
Namespace(expt_name='mapKurator_test', gpu_id=0, input_dir_path='/home/mapkurator-test-images/input/', map_kurator_system_dir='/home/mapkurator-system/', module_cropping=True, module_entity_linking=False, module_gen_geotiff=False, module_geocoord_geojson=False, module_get_dimension=True, module_img_geojson=True, module_post_ocr=False, module_text_spotting=True, output_folder='/home/mapkurator-test-images/output/', print_command=False, spotter_config='/home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml', spotter_expt_name='test', spotter_model='spotter_v2', text_spotting_model_dir='/home/spotter_v2/PALEJUN/')
run_img.py:73: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
sample_map_df=sample_map_df.append(tmp_path,ignore_index=True)
/home/mapkurator-test-images/output/mapKurator_test/crop/lake_superior
INFO:root:Done text spotting for lake_superior
INFO:root:Time for generating geotiff: 0
INFO:root:Time for Cropping : 1
INFO:root:Time for text spotting : 35
INFO:root:Time for generating geojson in img coordinate : 1
INFO:root:Time for generating geojson in geo coordinate : 0
INFO:root:Time for entity linking : 0
INFO:root:Time for post OCR : 0
(mapkurator) root@46b26528dc2d:/home/mapkurator-system# python run_img.py --map_kurator_system_dir /home/mapkurator-system/ --input_dir_path /home/mapkurator-test-images/input/ --expt_name mapKurator_test --module_cropping --module_get_dimension --module_text_spotting --text_spotting_model_dir /home/spotter_v2/PALEJUN/ --spotter_model spotter_v2 --spotter_config /home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml --spotter_expt_name test --module_img_geojson --module_post_ocr --module_geocoord_geojson --module_entity_linking --module_gen_geotiff --output_folder /home/mapkurator-test-images/output/ --gpu_id 0
Namespace(expt_name='mapKurator_test', gpu_id=0, input_dir_path='/home/mapkurator-test-images/input/', map_kurator_system_dir='/home/mapkurator-system/', module_cropping=True, module_entity_linking=True, module_gen_geotiff=True, module_geocoord_geojson=True, module_get_dimension=True, module_img_geojson=True, module_post_ocr=True, module_text_spotting=True, output_folder='/home/mapkurator-test-images/output/', print_command=False, spotter_config='/home/spotter_v2/PALEJUN/configs/PALEJUN/SynthMap/SynthMap_Polygon.yaml', spotter_expt_name='test', spotter_model='spotter_v2', text_spotting_model_dir='/home/spotter_v2/PALEJUN/')
run_img.py:73: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
sample_map_df=sample_map_df.append(tmp_path,ignore_index=True)
/home/mapkurator-test-images/output/mapKurator_test/crop/lake_superior
INFO:root:Done text spotting for lake_superior
input_geojson_file /home/mapkurator-test-images/output/mapKurator_test/stitch/test/lake_superior.geojson
geojson_postocr_output_file /home/mapkurator-test-images/output/mapKurator_test/postocr/test/lake_superior.geojson
exe_ret {'error': 'Traceback (most recent call last):\t File "lexical_search.py"; line 1; in <module>\t from elasticsearch_dsl import Search; Q\tModuleNotFoundError: No module named \'elasticsearch_dsl\'\t'}
INFO:root:Time for generating geotiff: 0
INFO:root:Time for Cropping : 1
INFO:root:Time for text spotting : 35
INFO:root:Time for generating geojson in img coordinate : 1
INFO:root:Time for generating geojson in geo coordinate : 0
INFO:root:Time for entity linking : 0
INFO:root:Time for post OCR : 0
crop: works
geojson_test: dir empty
geotiff: dir empty
postocr: dir empty
spotter: works
stitch: works
So! I'd say this is a partial success, but the script does not appear to communicate with Elasticsearch to query for the postocr step.
Any additional advice or guidance you can share here would be extremely appreciated. Thanks again!
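For anyone repeating this check, here is a minimal sketch that reports which stage directories under an experiment folder actually contain files. The stage names are the ones listed in this thread and the experiment path combines the --output_folder and --expt_name values used above; the helper name report_stage_outputs is hypothetical and not part of mapKurator.

```python
from pathlib import Path

# Stage directory names as listed in this thread; adjust if your run differs.
STAGES = ["crop", "spotter", "stitch", "postocr", "geojson_test", "geotiff"]


def report_stage_outputs(expt_dir, stages=STAGES):
    """Return {stage: status}; status is 'missing', 'empty', or a file count."""
    report = {}
    for stage in stages:
        stage_dir = Path(expt_dir) / stage
        if not stage_dir.is_dir():
            report[stage] = "missing"
            continue
        # Count regular files at any depth, e.g. stitch/test/lake_superior.geojson.
        n_files = sum(1 for p in stage_dir.rglob("*") if p.is_file())
        report[stage] = "empty" if n_files == 0 else f"{n_files} file(s)"
    return report


if __name__ == "__main__":
    # Assumed location: <output_folder>/<expt_name> from the command above.
    expt_dir = "/home/mapkurator-test-images/output/mapKurator_test"
    for stage, status in report_stage_outputs(expt_dir).items():
        print(f"{stage}: {status}")
```

An "empty" or "missing" entry only shows that a stage wrote nothing; the cause still has to be read from the run_img.py output.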
Hi @ewlarson, it's great that text spotting now works for you!
That's right, the docker images only support modules up to the stitching (i.e. module_img_geojson). This link contains instructions for setting up environments for post-ocr and entity linking if you need them.
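A quick way to see whether an environment can run the optional modules before enabling their flags is to test whether the required Python packages are importable. This is only a sketch: the flag-to-package mapping below is an assumption built from the single traceback in this thread (elasticsearch_dsl for post-OCR), the helper name is hypothetical, and it must be run with the same Python interpreter the pipeline uses.

```python
import importlib.util

# Only the elasticsearch_dsl entry comes from the traceback in this thread;
# extend the mapping for your own setup.
OPTIONAL_MODULE_DEPS = {
    "--module_post_ocr": ["elasticsearch_dsl"],
}


def missing_dependencies(flags, deps=OPTIONAL_MODULE_DEPS):
    """Return {flag: [missing top-level package names]} for the given flags."""
    missing = {}
    for flag in flags:
        absent = [
            name for name in deps.get(flag, [])
            if importlib.util.find_spec(name) is None  # None means not importable
        ]
        if absent:
            missing[flag] = absent
    return missing


if __name__ == "__main__":
    print(missing_dependencies(["--module_post_ocr"]))
```

An empty result means the listed packages are importable; it does not confirm that an Elasticsearch server is reachable.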
Recogito uses mapKurator up to the stitching module as well.
By the way, you can visualize the output GeoJSON from the stitching module in GIS software (e.g. QGIS).
Hope this helps!
Okok! I'll try building a full mapKurator environment out sometime soon. Sincerely appreciate your swift replies on this issue today.
Chiming in that I've more or less replicated the steps taken by Eric, in that the cropping, text spotting, and stitching are happening. However, the output geojson has the same text repeating in every polygon created. (The text in question is "Cr"). Also the polygons don't seem to line up. I've pasted the results from a map I pulled from the Rumsey collection.
Any tips for what I may be missing (which I'm sure is plenty)? Thank you!!
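To quantify the repeated-text symptom rather than eyeballing it in QGIS, something like the following counts how often each label occurs in an output GeoJSON file. It is a sketch under one assumption: that each feature stores its recognized text under a "text" key in properties. That key name is a guess, so pass a different text_key if your file uses another one.

```python
import json
from collections import Counter


def text_label_counts(geojson_path, text_key="text"):
    """Count how often each text label occurs across all features."""
    with open(geojson_path, encoding="utf-8") as f:
        collection = json.load(f)
    counts = Counter()
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        # Features without the key are counted under None.
        counts[props.get(text_key)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python count_labels.py path/to/output.geojson
    if len(sys.argv) > 1:
        for label, n in text_label_counts(sys.argv[1]).most_common(10):
            print(f"{n:6d}  {label!r}")
```

If a single label such as "Cr" accounts for every feature, the problem lies upstream of any GIS display settings.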
@krdyke Thanks for reporting this issue!
TL;DR: The latest docker image has fixed this problem. Please do a docker pull and use the :latest version.
More info: We recently updated the spotting model, which performs better than before, but the change broke the stitching part of the system. The issue has been noticed and resolved.
Thanks so much for the quick response. I've pulled the most recent image and am unfortunately still seeing the same results. I tried wiping the image entirely and re-pulling it, but again got the same results. If it helps, the pre-stitched images and JSON have the same resulting issues. I've shared a couple examples below, along with a picture of what that area looks like in QGIS on the full image with the full geojson.
Thanks again for the responsiveness!
Could you please run ls /home/ within docker and paste the output? Also, can you provide the command you execute after updating the docker image?
Output of ls inside Docker image
AdelaiDet elasticuser mapkurator-test-images root spotter_scripts
detectron_test mapkurator-system recogito2 spotter-v2 spotter_testr
Docker run command used to create and start container. The bind mount is used to bring images from the "outside" into the Docker environment.
sudo docker run -it \
--name map-kurator-container \
--gpus all \
-p 9820:9830 \
--mount type=bind,source=/home/paperspace/bindmount,target=/home/root/bindmount \
knowledgecomputing/mapkurator_recogito_2023
Thanks again!
Here's the run_img.py command I've been using:
python run_img.py \
--map_kurator_system_dir /home/mapkurator-system/ \
--input_dir_path /home/root/bindmount/input/ \
--expt_name mapKurator_test \
--module_cropping \
--module_get_dimension \
--module_text_spotting \
--text_spotting_model_dir /home/spotter-v2/PALEJUN/ \
--spotter_model spotter-v2 \
--spotter_config /home/spotter-v2/PALEJUN/configs/PALEJUN/Finetune/Rumsey_Polygon_Finetune.yaml \
--spotter_expt_name test \
--module_img_geojson \
--output_folder /home/root/bindmount/output/ \
--gpu_id 0
Things look correct to me. Could you delete all the content in the /home/spotter-v2/PALEJUN/build folder and execute the command again?
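If you prefer to script this step, here is a small sketch that empties a directory while leaving the directory itself in place. The helper name clear_directory is hypothetical; the build path in the usage comment is the one mentioned above. Double-check the path before running, since the deletion is permanent.

```python
import shutil
from pathlib import Path


def clear_directory(dir_path):
    """Delete everything inside dir_path but keep the directory itself."""
    removed = []
    for entry in Path(dir_path).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # regular file or symlink
        removed.append(entry.name)
    return sorted(removed)


# Example (inside the container):
# clear_directory("/home/spotter-v2/PALEJUN/build")
```

The function returns the names it removed, which is handy for confirming that stale build artifacts were actually present.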
I think that did it! The output is looking much better now. Thanks so much for your help with this. I'm excited to try and integrate MapKurator with my own collection.
I was able to follow the "Use mapKurator with Recogito" > Installation steps all the way to running the mapKurator option on an uploaded image file in Recogito, but the mapKurator action fails with this message and Recogito stack trace:
Message
"Processing failed: /home/mapkurator-system/data/test_imgs/sample_output/039106c2-41fc-4340-9247-4678d47ea886_annotations.json (No such file or directory)"
Recogito application
It appears the "stitch" process fails and ultimately no annotations are output.
Any advice for troubleshooting this error?
I'm running the application on a Paperspace ML-in-a-box machine:
OS template => ML-in-a-Box Ubuntu 20.04
Machine => GPU+ $0.45/hr | 8 CPU | 30 GB RAM | Quadro M4000
Region => East Coast (NY2)
Disk size => 100GB
These are the libraries installed by default: https://github.com/Paperspace/ml-in-a-box