Closed bertsky closed 1 year ago
Do you want me to add some CI BTW?
This was something I wanted to check with the repacked archive before merging, since I am running into a dir/subdir problem also with Eynollah and the repacked models. I only just now learned about "path_in_archive"
!
I was surprised there was no CI here after wrestling with it over there, so yeah, anything in that regard is welcome.
We should have long gotten rid of that subrepo for the test...
We should have long gotten rid of that subrepo for the test...
Not sure about that. I do find the subrepo pattern useful, but tests must not use the clone directly.
I initially did not see there's already a CircleCI so I set up a GH Action. Then I also updated the CircleCI config (renaming its worklows). Now it seems both succeed, only the old CircleCI config (identified by the old workflow name) – presumably on master – does not.
Much appreciate the thoroughness!
Fixes #55 Fixes #50 Fixes #58
Many thanks! Will also test this out locally in a bit.
Btw I managed to get rid of the CircleCi tests getting killed for Eynollah by switching from the docker executor to a linux vm executor (which supposedly has more RAM).
Locally on Python 3.8 the test fails for me, I am investigating.
Btw I think only the newer 2021-03-09
model should be downloaded and installed by default, as this saves ~500MB.
(sbb_binarization-1): $ make test ocrd resmgr download ocrd-sbb-binarize "" 20:12:23.858 INFO ocrd.cli.resmgr - Downloading registered resource 'default-2021-03-09' (https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2021_03_09.zip) [------------------------------------] 0% 20:12:26.515 INFO ocrd.resource_manager._download_impl - Downloading https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2021_03_09.zip to download.tar.xx [####################################] 100%
20:12:37.520 INFO ocrd.resource_manager.download - Extracting application/zip archive to /tmp/tmpxr178zws/out 20:12:38.566 INFO ocrd.resource_manager.download - Copying '.' from archive to /home/cnd/.local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09 20:12:38.707 INFO ocrd.cli.resmgr - Installed resource https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2021_03_09.zip under /home/cnd/.local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09 20:12:38.707 INFO ocrd.cli.resmgr - Use in parameters as 'default-2021-03-09' 20:12:38.707 INFO ocrd.cli.resmgr - Downloading registered resource 'default' (https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2020_01_16.zip) [------------------------------------] 0% 20:12:38.708 INFO ocrd.resource_manager._download_impl - Downloading https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2020_01_16.zip to download.tar.xx [####################################] 100%
20:13:23.277 INFO ocrd.resource_manager.download - Extracting application/zip archive to /tmp/tmpfih8exg7/out 20:13:26.987 INFO ocrd.resource_manager.download - Copying 'saved_model_2020_01_16' from archive to /home/cnd/.local/share/ocrd-resources/ocrd-sbb-binarize/default 20:13:27.496 INFO ocrd.cli.resmgr - Installed resource https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2020_01_16.zip under /home/cnd/.local/share/ocrd-resources/ocrd-sbb-binarize/default 20:13:27.496 INFO ocrd.cli.resmgr - Use in parameters as 'default' ocrd-sbb-binarize -m test/assets/kant_aufklaerung_1784/data/mets.xml -I OCR-D-IMG -O BIN -P model default Traceback (most recent call last): File "/home/cnd/tmp/sbb_binarization-1/bin/ocrd-sbb-binarize", line 11, inload_entry_point('sbb-binarization', 'console_scripts', 'ocrd-sbb-binarize')() File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/click/core.py", line 1130, in call return self.main( args, kwargs) File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/click/core.py", line 1055, in main rv = self.invoke(ctx) File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/click/core.py", line 760, in invoke return __callback(*args, kwargs) File "/home/cnd/git/sbb_binarization/sbb_binarize/ocrd_cli.py", line 162, in cli return ocrd_cli_wrap_processor(SbbBinarizeProcessor, *args, *kwargs) File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/ocrd/decorators/init.py", line 68, in ocrd_cli_wrap_processor workspace = resolver.workspace_from_url(mets, working_dir) File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/ocrd/resolver.py", line 166, in workspace_from_url self.download_to_directory(dst_dir, mets_url, basename=mets_basename, if_exists='overwrite' if clobber_mets else 'skip') File "/home/cnd/tmp/sbb_binarization-1/lib/python3.8/site-packages/ocrd/resolver.py", line 82, in download_to_directory raise FileNotFoundError("File path passed as 'url' to download_to_directory does not exist: %s" % url) FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: /home/cnd/git/sbb_binarization/test/assets/kant_aufklaerung_1784/data/mets.xml make: [Makefile:38: test] Error 1
Locally on Python 3.8 the test fails for me, I am investigating.
judging by your log snippet I would say that what's missing is a git submodule update --init
prior to the make test
. I guess you already had a directory repo/assets
but no checkout. (Same is true on the CI, hence the extra git
commands there.)
Try:
make clean
git submodule update --init
make test
Btw I think only the newer
2021-03-09
model should be downloaded and installed by default, as this saves ~500MB.
Sure, I just thought the test should be thorough (testing both the simple and the multi-scale model). The models are cached on CI but that's no different than downloading the files from a fast server.
Feel free to change the tests accordingly, or let me know if I should do it.
OK, I was using only the Makefile and did
make install make models make repo/assets make test/assets make test
I was missing a git submodule update --init
. Now all tests complete. Shouldn't the submodule init also be included in the Makefile targets?
Ah, nice timing!
Shouldn't the submodule init also be included in the Makefile targets?
It is – but it only depends on repo/assets
. If that exists, but is empty, then nothing happens.
Perfect timing ;)
I just thought the test should be thorough (testing both the simple and the multi-scale model)
Agreed but since a new better hybrid model is already here, we should also think about phasing out supporting older ones - at some point.
If that exists, but is empty, then nothing happens.
I started with a fresh venv, cloned the repo and checked out your change-model-url
branch, then ran the steps above and would have assumed to have a working test. I only found out about the git submodule update --init
from the CI.
Edit: I did run make repo/assets
and make test/assets
, so what was going on here?
Found this in my history, my fault
make test cp -r -t test/assets repo/assets/data/ cp: cannot stat 'repo/assets/data/': No such file or directory make: *** [Makefile:33: test/assets] Error 1
I did run test once before make repo/assets
.
I started with a fresh venv, cloned the repo and checked out your
change-model-url
branch, then ran the steps above and would have assumed to have a working test. I only found out about thegit submodule update --init
from the CI.Edit: I did run
make repo/assets
andmake test/assets
, so what was going on here?
Agreed, it should have just worked. Turns out that a repo with submodules always yields empty subdirectories unless you clone recursively. I fixed it by making the rule depend on a subdirectory (part of assets). That also allows simplifying the CI rules.
@cneud you might wanna do a new release (for the changes under unreleased) on Github (not the models, only the code) and PyPI.
do a new release
Yes, asap!
Btw I managed to get rid of the CircleCi tests getting killed for Eynollah by switching from the docker executor to a linux vm executor (which supposedly has more RAM).
You mean this change? According to the docs this allows for 15 GB instead of just 8 GB RAM on the free tier. Should we give it a try here (on master)? (Will probably have to install Python first then.)
Yes, with this to set the Python version. Based on the docs for machine executors, the medium linux-vm has 7.5 GB, which was sufficient to make the tests for eynollah complete.
do a new release
https://github.com/qurator-spk/sbb_binarization/releases/tag/v0.1.0 with thanks to @kba!
Yes, with this to set the Python version.
ah, understood!
Based on the docs for machine executors, the medium linux-vm has 7.5 GB, which was sufficient to make the tests for eynollah complete.
Ok, or we first just try to set resource_class: large
on the docker executor – perhaps that's already enough?
Can we use large
with a free account? I guess we can try...
That does require double the credits (20 credits/min) used compared to the Linux VM (10 credits/min) though.
Indeed!
@cneud I also updated the second URL for the older, 4-fold model. Nearly tripped myself over the subdirectory problem (loader needs a subdirectory, and because the old 4-fold model already contains submodel directories, no problem, but the newer does not, so
path_in_archive
must be just.
).