Closed ivyleavedtoadflax closed 4 years ago
Merging #25 into master will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## master #25 +/- ##
=======================================
Coverage 80.94% 80.94%
=======================================
Files 17 17
Lines 1312 1312
=======================================
Hits 1062 1062
Misses 250 250
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e6658c...0e6658c. Read the comment docs.
@lizgzil I've made some progress on this today. I'm not sure what caused the mismatch in dims, but it seems to be resolved by using a different config/model (2020.3.19 works ok). This may be because of changes to load_tsv made in this PR which are not reflected in 2020.3.18. In any case, I think the way that the data were being handled in 2020.3.18 was not optimal, as you point out in https://github.com/wellcometrust/deep_reference_parser/issues/26#issuecomment-607338595.

I would be inclined to forget that model run, and to set the default to 2020.3.19 for now. It doesn't perform as well as 2020.3.18, but I think with some experimentation you will be able to get the scores up in a future model run using this new logic, and for now it is important just to get it running.
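As context for the load_tsv changes (which, per this PR, make loading more robust to quotes in the input data), here is a minimal stdlib sketch of the idea; it is an illustration of the approach, not the package's actual implementation:

```python
import csv
import io


def load_tsv(handle):
    """Illustrative loader: read token/label rows from a TSV, treating
    quote characters as ordinary text (csv.QUOTE_NONE) so stray quotes
    inside references cannot break the parse."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


data = io.StringIO('"Agroforestry\tb-title\nis\ti-title\n')
assert load_tsv(data) == [['"Agroforestry', 'b-title'], ['is', 'i-title']]
```

Without QUOTE_NONE, an unbalanced quote at the start of a token can swallow the tab delimiters that follow it, which is exactly the failure mode quoted reference strings trigger.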
Summary of what I have done today:
https://github.com/wellcometrust/deep_reference_parser/pull/25/commits/3e1b20b8142040c94bf806c7b50002ea87148550: Whilst investigating the way that weights are loaded, I realised that keras_contrib.utils.load_save_utils is just a soft wrapper around a keras function. We no longer need the wrapper, so I just use the keras function instead.
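As a minimal sketch of why the wrapper was redundant (the classes below are illustrative stand-ins, not the repo's or keras's actual code): a wrapper that only forwards a call adds nothing over calling the underlying method directly.

```python
class Model:
    """Illustrative stand-in for a Keras model exposing load_weights."""
    def __init__(self):
        self.loaded = None

    def load_weights(self, path):
        # In real Keras this deserialises weights from file; here we
        # just record the path to show the call flow.
        self.loaded = path


def load_all_weights(model, path):
    """Stand-in for the keras_contrib-style soft wrapper: it simply
    forwards to the underlying keras method, adding no behaviour."""
    model.load_weights(path)


m = Model()
m.load_weights("weights.h5")  # the direct call is equivalent to the wrapper
assert m.loaded == "weights.h5"
```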
https://github.com/wellcometrust/deep_reference_parser/pull/25/commits/20afa75092f882dc927728aff1de7ebe754f973d: Once I realised that the 2020.3.18 model was at fault, the next issue I ran into was that the DeepReferenceParser.predict() method cannot handle multiple outputs. This commit updates the method to handle a list of outputs rather than a single output.
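The shape of that change can be sketched in plain Python (a hypothetical helper, not the actual predict implementation): a multitask model returns a list of per-task outputs, while a single-task model returns one output, so normalising both cases to a list lets downstream code iterate uniformly.

```python
def normalise_predictions(preds):
    """Hypothetical helper: wrap a single-task prediction in a list so
    callers can always iterate over per-task outputs."""
    return preds if isinstance(preds, list) else [preds]


# Stand-ins for model outputs (tuples here in place of numpy arrays):
single_task = ("b-r", "i-r", "o")
multi_task = [("b-r", "i-r", "o"), ("b-title", "i-title", "o")]

assert normalise_predictions(single_task) == [single_task]
assert normalise_predictions(multi_task) == multi_task
```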
https://github.com/wellcometrust/deep_reference_parser/pull/25/commits/77971e5ff937f5f10cb49b6c930cffef1fa517da: This is WIP. This commit updates the split_parse command to use the new outputs, which are now in a list, and to pretty print them. The ability to write to a file, and to extract a json with author, title, and year for each reference (as we discussed), is still lacking. You can see the results of this commit with the 2020.3.19 config with:
python -m deep_reference_parser split_parse -tc configs/multitask/2020.3.19_multitask.ini "1. Upson, M. A. (2019). Agroforetsry is the best. Journal of Agroforestry. Vol 1. Issue 23. p123-679. 2. Upson, M. A. (2018). Another paper on Agroforestry. Geoderma. 1(3)12-56"
The tests are now failing for the simple parse and split cases. This is probably due to changes in the predict method; they probably need a bit of tweaking to ensure they still work for the single task model. Still...some progress!
@ivyleavedtoadflax woop! great the mismatch issue isn't happening :) Classic that you fix something and then that causes the older things to break eh!
So do you think once this PR is closed, then I should train a model using the config for 2020.3.18 (i.e. use adam)? But for now I will just use the 2020.3.19 model if I need to.
Yes, I would train a new model entirely. I've been doing some experimenting, and what is having more of an effect than anything is the sequence length. In this PR I added some logic to give us more fine-grained control over sequence length, both in data generation and in the model itself, and it has affected performance, so I think we might need to do a bit of experimentation to work out the best settings.
But for now, yes I would just use a model that works, to get the logic working.
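The effect of the sequence-length setting on the data can be illustrated with a small stdlib sketch (Keras ships its own padding utilities; this hypothetical function just shows what fixing the length does to each token sequence):

```python
def fix_length(tokens, max_len, pad="<PAD>"):
    """Truncate sequences longer than max_len and pad shorter ones,
    so every example fed to the model has the same length."""
    return (tokens + [pad] * max_len)[:max_len]


assert fix_length(["1.", "Upson,", "M."], 5) == ["1.", "Upson,", "M.", "<PAD>", "<PAD>"]
assert fix_length(["1.", "Upson,", "M."], 2) == ["1.", "Upson,"]
```

A shorter maximum trains faster but clips long references, while a longer one preserves them at the cost of more padding, which is why the best setting needs experimentation.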
Hey @lizgzil spent a little more time on this today:
Fixed the failing tests for the split and parse commands. Essentially we just needed to subset the output from deep_reference_parser.predict, which is now a list of multiple outputs.

I think now that the tests are passing, I'd be inclined to merge this in even though the split_parse command is not 100% complete, and then add the final functionality in a new smaller PR. There's a fair bit of refactoring that could be done too, and test coverage that could be improved, which I think are best done in a future PR.
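The subsetting fix can be sketched as follows (the names and values are hypothetical stand-ins; the real predict returns model-specific arrays): each single-task command takes only its own element from the list of outputs.

```python
# Stand-in for a multitask prediction: one output per task.
predictions = [
    ["b-r", "i-r", "o"],          # splitting task output (stand-in)
    ["b-title", "i-title", "o"],  # parsing task output (stand-in)
]

# The split and parse commands each subset the list to their own task:
split_output = predictions[0]
parse_output = predictions[1]

assert split_output == ["b-r", "i-r", "o"]
assert parse_output == ["b-title", "i-title", "o"]
```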
should be data/multitask I assume?
:facepalm: yep - will fix
What this PR contains
- A new split_parse command that follows the same logic as the split and parse commands.
- load_tsv made more robust to quotes in input data.
- Changes to indices.pickle. Note that this change may cause issues when attempting to load earlier (now outdated) model versions.
- Updates to the split, parse and split_parse commands.

NOTE: This version includes changes both to the way that model artefacts are packaged and saved, and to the way that data are loaded and parsed from tsv files. This results in a significantly faster training time (c.14 hours -> c.0.5 hours), but older models will no longer be compatible. For compatibility you must use multitask models > 2020.3.19, splitting models > 2020.3.6, and parsing models > 2020.3.8. These models currently perform less well than previous versions (#27), but performance is expected to improve with experimentation, predominantly around sequence length, and with more annotated data.
How you can test it