inception-project / inception-reporting-dashboard

This package generates plots for your INCEpTION project to visualize the project's progress.
Apache License 2.0

Dashboard Plot fails, if not all documents are annotated / opened #54

Closed hunnguye closed 1 month ago

hunnguye commented 2 months ago

Describe the bug The dashboard cannot be generated, whether the project is imported manually or via the API.

To Reproduce Steps to reproduce the behavior:

  1. In INCEpTION, import the project inception-gemtex-deid-base_project-grascco_raw.zip
  2. For simplicity, delete all documents except for two
  3. Start annotating one document, leaving the second unopened
  4. Try to export the project into the dashboard (manually or via the API)

Expected behavior A dashboard is created

Error message KeyError

File "/usr/local/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 85, in exec_func_with_error_handling
    result = func()
             ^^^^^^
File "/usr/local/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 576, in code_to_exec
    exec(code, module.__dict__)
File "/usr/local/lib/python3.11/site-packages/inception_reports/generate_reports_manager.py", line 663, in <module>
    main()
File "/usr/local/lib/python3.11/site-packages/inception_reports/generate_reports_manager.py", line 659, in main
    plot_project_progress(project)
File "/usr/local/lib/python3.11/site-packages/inception_reports/generate_reports_manager.py", line 489, in plot_project_progress
    doc_token_categories[state] += type_counts["Token"]["documents"][


Additional context If you start annotating the unopened document and re-import the project, the error disappears.

hunnguye commented 2 months ago

The error message (KeyError) hints at a lookup for a key that does not exist.

Troubleshooting

https://github.com/inception-project/inception-reporting-dashboard/blob/c71b9ebb706db16261fe9672c37a004336ca7c98/inception_reports/generate_reports_manager.py#L482-L487

This code block iterates over the documents and uses each document name to look up a value in the nested type_counts dictionary. It probably does not find the key, hence the error message.
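To illustrate the suspected failure mode, here is a minimal sketch. It is not the code from generate_reports_manager.py: the function name, the document names, the state names, and the exact shape of type_counts (document name mapped to a token count) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical data: two documents exist, but only one has an entry
# under type_counts["Token"]["documents"].
documents = {"doc1.txt": "ANNOTATION_IN_PROGRESS", "doc2.txt": "NEW"}
type_counts = {"Token": {"documents": {"doc1.txt": 120}}}


def count_tokens_by_state(documents, type_counts, strict=True):
    """Sum token counts per document state (illustrative helper)."""
    totals = defaultdict(int)
    per_doc = type_counts["Token"]["documents"]
    for name, state in documents.items():
        if strict:
            # Direct indexing raises KeyError for a document with no entry.
            totals[state] += per_doc[name]
        else:
            # Tolerant variant: treat a missing document as zero tokens.
            totals[state] += per_doc.get(name, 0)
    return dict(totals)
```

With strict=True the call raises KeyError for doc2.txt, which matches the symptom in the traceback. The tolerant variant only hides the symptom, since the unopened document would then be reported with zero tokens.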

But why is that so? PR #53 added logging capabilities and also outputs the type_counts dictionary when it is generated.

The screenshot below was taken after following the steps described in this issue: [image]

It appears that, within the dictionary, the corresponding nested dictionary "Token.documents" only contains documents that have been opened (?) at least once and does not include the other documents. I can only guess that this has something to do with how exportedproject.json is generated, from which, I assume, the tokens are counted.

This can, however, lead to a situation where the number of actual documents is greater than the number of documents "reported" in the type_counts dictionary, hence the KeyError.
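A quick way to confirm such a mismatch is to compare the full list of document names with the keys that were actually counted. The helper below is a hypothetical diagnostic, assuming type_counts maps a type name to a dictionary with a "documents" entry keyed by document name.

```python
def missing_documents(all_documents, type_counts, type_name="Token"):
    """Return the document names that have no entry in type_counts (sorted)."""
    counted = type_counts.get(type_name, {}).get("documents", {})
    return sorted(set(all_documents) - set(counted))
```

Any name returned by this helper is a document that would trigger the KeyError during plotting.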

serwarde commented 1 month ago

The problem was in the check for the JSON files (one per annotator) that contain the annotations. For that, we look for all JSON files in a document's annotation folder EXCEPT for INITIAL_CAS.json. However, if the document hasn't been opened before, there are no other JSON files in the folder, and thus the document gets skipped.

I added a fallback to the INITIAL_CAS.json file in case there are no others. The problem should be solved now.
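The fallback described above could look roughly like the following sketch. This is not the actual fix from the repository: the function name and the assumption that each document has one folder containing its JSON files are illustrative only.

```python
from pathlib import Path

INITIAL_CAS = "INITIAL_CAS.json"


def select_annotation_files(doc_folder):
    """Pick the JSON files to read for one document (illustrative helper).

    Prefer the per-annotator files; if the document was never opened and
    only INITIAL_CAS.json exists, fall back to that file.
    """
    folder = Path(doc_folder)
    json_files = sorted(folder.glob("*.json"))
    annotator_files = [p for p in json_files if p.name != INITIAL_CAS]
    if annotator_files:
        return annotator_files
    initial = folder / INITIAL_CAS
    # Fallback: use the initial CAS so the document is not skipped.
    return [initial] if initial.exists() else []
```

With this selection, an unopened document still contributes an entry, so later lookups by document name find a key.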