bertsky closed this 1 year ago
@vahidrezanezhad could it be that the way the layers get defined in sbb_pixelwise_segmentation should use `input_shape` as kwarg instead of `data_format`?
EDIT: sorry, that was a bullshit question
Ah, here's the reason:
> WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
So I must do the model conversion on Python <= 3.7 but TF >= 2.5 ...
Ok, it does work.
I can also update ocrd-tool.json here to include the changed resource URL. But first I need some place to upload it permanently. How about qurator-data.de?
(@kba knows where to find the new archive...)
@bertsky Thank you so much. If you can upload the saved model somewhere shareable, we can publish it to qurator-data. (edit: will ask @kba for the location)
btw sorry for being unresponsive here, we are currently fighting the ICDAR2023 submission deadline for a paper on...Eynollah ;)
BTW, CI failure is because there is no TF2 for Py 3.6 anymore. Config in GH Actions should be changed (back) to 3.7 or 3.8.
> BTW, CI failure is because there is no TF2 for Py 3.6 anymore. Config in GH Actions should be changed (back) to 3.7 or 3.8.
This is what I tried in https://github.com/qurator-spk/eynollah/pull/90.
@kba asked me to upload the updated model, but 1. I'd prefer @vahidrezanezhad to do it, and 2. that he produce the updated models himself. Is this just about - *skims the issue comments* - being able to load the files with a newer Python/TF version? Then I could maybe be persuaded to do it.
Not trying to be a pedant, it's just professional diligence before I upload some variant of some model I don't fully understand myself.
(Alternatively, I offered @kba access to the relevant resources (our internal git annex + qurator-data.de), then the diligence is up to him.)
Skimming the PR, it's not only about loading the model, but some other stuff as well? (Cleaning up and a change of a larger code part)
> Skimming the PR, it's not only about loading the model, but some other stuff as well? (Cleaning up and a change of a larger code part)
Alright, I had a look at the individual commits.
a. I would upload the data if necessary; b. reviewing/merging this PR must be done by someone else, there is too much going on
@mikegerber
> Is this just about - *skims the issue comments* - being able to load the files with a newer Python/TF version?
Initially, yes. But then it turned out there are more incompatibilities (newer Numpy). In order to be able to do the Numpy fixes, I had to understand some of the code, which in turn led to the cosmetic simplifications. The last commit returned to TF2 migration – there is no session management anymore, and no benefit in keeping this IMO. Especially reloading the models for every page seemed unnecessarily inefficient. (And yes, I did test, of course.)
> Not trying to be a pedant, it's just professional diligence before I upload some variant of some model I don't fully understand myself.
Sure – I would also recommend to first allow @vahidrezanezhad to take a look, even if the changes are more or less cosmetic.
Regarding model upload: if you could do that first, then I could update this PR accordingly – adding the new URL to the local ocrd-tool.json, so it would immediately become available as the current model location (independent of old URLs registered in core).
(I really just loaded as HDF5 and stored as SavedFormat, nothing fancy.)
> (And yes, I did test, of course.)
;-) As I will only be uploading the files, that's not my concern. (I would try to test myself if it was - had the tiniest change break something unexpected too often.)
> Regarding model upload: if you could do that first, then I could update this PR accordingly – adding the new URL to the local ocrd-tool.json, so it would immediately become available as the current model location (independent of old URLs registered in core). (I really just loaded as HDF5 and stored as SavedFormat, nothing fancy.)
Yeah, I think @kba will give me the files and I'll upload them (or I give him access.)
> Sure – I would also recommend to first allow @vahidrezanezhad to take a look, even if the changes are more or less cosmetic.
He's on vacation btw.
@cneud wants to discuss this with @kba (and me, i think) at our regular meeting, which is understandable.
It would be great if we can put the TF saved model versions of the models on qurator-data already (and possibly HF later as well), for testing. About the other changes and the PR, I would like to have some discussion to see if we can review/merge while Vahid is away.
So

> a. I would upload the data if necessary,

Yes please.

> b. reviewing, merging this PR must be done by someone else, there is too much going on

To be discussed.
> a. I would upload the data if necessary,
>
> Yes please.
Still waiting for the data :) @kba, @bertsky?
> b. reviewing, merging this PR must be done by someone else, there is too much going on
>
> To be discussed.
Yes, that's 100% in line with my own thinking. I'd rather not be involved, as I am currently not familiar with the code base.
> Still waiting for the data :) @kba, @bertsky?
Fixes #92 Fixes #87
https://qurator-data.de/eynollah/2021-04-25/SavedModel.tar.gz
I've created a short README to document how this file was created, see https://qurator-data.de/eynollah/2021-04-25/. @bertsky Could you post a short code snippet here showing how you converted this? It could be useful in the future.
@joergleh Relevant to your efforts, probably.
Many thanks @mikegerber!
@bertsky This should be the location for the model(s), rather than https://ocr-d.kba.cloud/2021-04-25.SavedModel.tar.gz, which was just for temporary file transfer, so https://github.com/qurator-spk/eynollah/pull/91/commits/00f431a5d6152aba96675b4f26d89f44bfdef211 would need to be adapted again.
@mikegerber it boils down to:

```python
import os
from tensorflow.keras.models import load_model

os.chdir('default')
for path in os.listdir():
    if not path.endswith('.h5'):
        continue
    model = load_model(path, compile=False)
    model.save(path[:-3], save_format="tf")
```
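To spell out the `path[:-3]` bit in the loop above: it just strips the `.h5` suffix, so each converted model lands in a sibling directory of the same base name. A pure-Python illustration (the helper name is mine, not part of the script):

```python
def savedmodel_dir_for(h5_path):
    # 'model_name.h5' -> 'model_name' (the directory the
    # TF SavedModel is written to by model.save(..., save_format="tf"))
    assert h5_path.endswith(".h5")
    return h5_path[:-3]

print(savedmodel_dir_for("eynollah-main-regions_20220314.h5"))
# -> eynollah-main-regions_20220314
```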
@cneud done!
Wait! I just found out my "cosmetic" changes are not so cosmetic after all … investigating.
@bertsky Thanks!
Where does the `default` directory name come from? I was wondering the same while examining @kba's tar. Is it OCR-D convention?
@bertsky Sorry for the laughing emoji :) Couldn't resist!
@mikegerber

> Where does the `default` directory name come from? I was wondering the same while examining @kba's tar. Is it OCR-D convention?
That's just the model name (i.e. last directory inside the archive = directory to unpack). Put your absolute or relative path there.
> Sorry for the laughing emoji :) Couldn't resist!
No worries. I shouldn't have been so bold to begin with!
I just realized that the conversion .h5 -> TF SavedModel was done using the models from https://qurator-data.de/eynollah/2021-04-25/, whereas the models from https://qurator-data.de/eynollah/2022-04-05/ should have been used.
Other than having more comprehensible model names, the `2022-04-05` dir also contains an additional model `eynollah-main-regions_20220314.h5`, which is required for full functionality.
@cneud,

> I just realized that the conversion .h5 -> TF SavedModel was done using the models from https://qurator-data.de/eynollah/2021-04-25/, whereas the models from https://qurator-data.de/eynollah/2022-04-05/ should have been used. Other than having more comprehensible model names, the `2022-04-05` dir also contains an additional model `eynollah-main-regions_20220314.h5`, which is required for full functionality.

you should know better than me, but going by Vahid's earlier response, I think this one only belongs to the (now public) `eynollah_light` branch.
I can convert these models as well, if you like, but this must likewise be done against that branch.
I see (the mess...).
But since that `eynollah_light` branch is "approved" despite merge conflicts, that is something we will likely have to keep waiting on until Vahid is back. :confused:
> Wait! I just found out my "cosmetic" changes are not so cosmetic after all … investigating.
I removed these parts, rebased and force-pushed. (Sorry for the noise!)
> But since that `eynollah_light` branch is "approved" despite merge conflicts, that is something we likely have to keep waiting on until Vahid is back.

I have converted these models as well and placed them in the same location for @kba to upload. I suggest merging master into `eynollah_light` after this has been merged into master (as the easiest path AFAICS).
> But since that `eynollah_light` branch is "approved" despite merge conflicts, that is something we likely have to keep waiting on until Vahid is back.
>
> I have converted these models as well and placed them in the same location for @kba to upload. I suggest merging master into `eynollah_light` after this has been merged into master (as the easiest path AFAICS).
Available from https://ocr-d.kba.cloud/2022-04-05.SavedModel.tar.gz
I refuse to upload 2022-04-05.SavedModel.tar.gz as the original is not fully in our internal data repository, sorry. This must be corrected by Vahid first.
Here, I am pedantic because it is not the first time this kind of situation has come up, and while I am happy to help, I don't want to be the one cleaning up every time.
Take your time! This PR can wait, and even more so any further action on eynollah_light. (I'll shut up now.)
> No worries. I shouldn't have been so bold to begin with!
I appreciate the boldness, I just think you should have been bold in another PR ;) I know it's more work, but reviewing a PR like this (even if it's just to understand the upload situation) gets harder when the PR contains more than just the necessary changes.
However, this is not my project, so it's not up to me, actually.
Side note: This PR seems to fix the CircleCI tests, is that correct? Great stuff if this is true!
> Side note: This PR seems to fix the CircleCI tests, is that correct? Great stuff if this is true!
No. This PR only fixes Python 3.8 onwards. The CI check comes with @cneud's config changes (throwing out broken 3.6, activating 3.7 onwards).
In trying to solve #87, I have converted the models and changed the loader to omit the `.h5` suffix.
Unfortunately, I now get a new error:
Given that this seems to be the output layer, I am at a loss here. Creating as draft only.
Ideas anyone?