uclnlp / jack

Jack the Reader
MIT License

Caching preprocessing in online input module #348

Closed: dirkweissenborn closed this 6 years ago

dirkweissenborn commented 6 years ago

The online input module used to preprocess the entire dataset before training started. To be truly "online", it should instead preprocess on the fly so that training can start immediately (this is simply nicer from a user's perspective).

Of course, the preprocessed results are cached for all epochs after the first. For large datasets, the cache is written to a file to avoid running into memory issues.

This will also resolve #342.
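
For illustration only, here is a minimal sketch of the behaviour described above. It is not the code in this PR: `CachingOnlineInput`, `epoch`, `preprocess_fn` and `cache_path` are hypothetical names, and using `pickle` for the file cache is an assumption. The first epoch preprocesses each instance lazily and yields it immediately. Later epochs replay the cache, from memory or from a file.

```python
import pickle


class CachingOnlineInput:
    """Hypothetical sketch: preprocess lazily in epoch 1, replay the cache afterwards."""

    def __init__(self, dataset, preprocess_fn, cache_path=None):
        self._dataset = dataset
        self._preprocess_fn = preprocess_fn
        # cache_path=None keeps the cache in memory; otherwise it is written to a file.
        self._cache_path = cache_path
        self._memory_cache = []
        self._cache_complete = False

    def epoch(self):
        """Yield preprocessed instances for one pass over the dataset."""
        if self._cache_complete:
            yield from self._replay()
        else:
            yield from self._first_epoch()

    def _first_epoch(self):
        # Start from an empty cache in case an earlier first epoch was abandoned.
        self._memory_cache = []
        cache_file = open(self._cache_path, "wb") if self._cache_path else None
        try:
            for instance in self._dataset:
                processed = self._preprocess_fn(instance)
                if cache_file is not None:
                    pickle.dump(processed, cache_file)
                else:
                    self._memory_cache.append(processed)
                # The consumer (e.g. a training loop) receives the instance
                # right away instead of waiting for the whole dataset.
                yield processed
        finally:
            if cache_file is not None:
                cache_file.close()
        # Only reached if the whole dataset was consumed.
        self._cache_complete = True

    def _replay(self):
        if self._cache_path is None:
            yield from self._memory_cache
            return
        with open(self._cache_path, "rb") as cache_file:
            while True:
                try:
                    yield pickle.load(cache_file)
                except EOFError:
                    break
```

With this design, a training loop would call `epoch()` once per epoch. `preprocess_fn` runs only during the first complete pass, and the file-backed variant keeps just one instance in memory at a time during replay.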

codecov-io commented 6 years ago

Codecov Report

Merging #348 into master will increase coverage by 0.22%. The diff coverage is 64.48%.


@@            Coverage Diff             @@
##           master     #348      +/-   ##
==========================================
+ Coverage    48.7%   48.93%   +0.22%     
==========================================
  Files          88       88              
  Lines        5061     5121      +60     
==========================================
+ Hits         2465     2506      +41     
- Misses       2596     2615      +19
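
As a sanity check on the figures in the diff above, the short sketch below recomputes the coverage percentages as Hits / Lines. The helper name `coverage_percent` is hypothetical and not part of Codecov. The unrounded change comes out at roughly 0.23 percentage points, so the reported +0.22% presumably reflects how the report rounds or truncates its values. That is an inference, not something the report states.

```python
def coverage_percent(hits, lines):
    """Return line coverage as a percentage (hypothetical helper)."""
    return 100.0 * hits / lines


# Figures taken from the coverage diff above.
before = coverage_percent(2465, 5061)  # master
after = coverage_percent(2506, 5121)   # with #348 merged
delta = after - before

print(f"master: {before:.4f}%  #348: {after:.4f}%  change: {delta:+.4f}")
```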

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...aders/extractive_qa/tensorflow/modular_qa_model.py | 0% <0%> (ø) | :arrow_up: |
| jack/core/shared_resources.py | 100% <100%> (ø) | :arrow_up: |
| jack/tfutil/xqa.py | 93.33% <100%> (ø) | :arrow_up: |
| jack/train_reader.py | 50.8% <100%> (+0.39%) | :arrow_up: |
| jack/readers/extractive_qa/shared.py | 86.04% <26.66%> (+4.37%) | :arrow_up: |
| jack/util/preprocessing.py | 66.38% <30.76%> (-4.37%) | :arrow_down: |
| jack/io/load.py | 82.6% <33.33%> (ø) | :arrow_up: |
| jack/core/data_structures.py | 81.48% <58.33%> (-18.52%) | :arrow_down: |
| jack/core/input_module.py | 86.07% <87.5%> (-1.43%) | :arrow_down: |
| jack/readers/multiple_choice/shared.py | 82.23% <0%> (-1.98%) | :arrow_down: |

... and 1 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a045085...d88ee97.