tberg12 / ocular

Ocular is a state-of-the-art historical OCR system.
GNU General Public License v3.0

Using the option -allowedFontsPath does not work as expected, please assist #15

Open DutchPirate1966 opened 3 years ago

DutchPirate1966 commented 3 years ago

Hi there, I am experimenting with Ocular to train an OCR model for 18th-century Dutch print. The option to initialize a font using only a few of the fonts installed on my computer is interesting, and I would like to try it. But in my case, including the option -allowFontsPath font_refs/dutch18thCE_fonts.txt does not seem to do anything. I attached the dutch18thCE_fonts.txt file here. Is it wrong? Thanks for any feedback on this. Best, Marco (the Netherlands)

dutch18thCE_fonts.txt

tberg12 commented 3 years ago

Hi Marco,

What happens when you use that allowed fonts file? Are you seeing the system use other fonts (i.e. fonts other than Dominican) when constructing the initial template parameters?

DutchPirate1966 commented 3 years ago

Thanks so much for your response! Here is my console output (see attached file) when using the command with the option. It takes only seconds to finish, which is a little suspicious, because without the option it takes one and a half hours of processing.

When I do not use the -allowedFontsPath option, I see the probs matrices for all chars in the console output. When I do use the option, these probs matrices seem to be empty. That confuses me, as you will understand.

Console_output_DominicanRun.txt

tberg12 commented 3 years ago

I think your filter is effectively ruling out all fonts on your system, so the model is initializing with no fonts at all. Are you sure you have a font whose exact name is "Dominican" on your system?
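To illustrate how an exact-name filter can end up excluding everything, here is a minimal sketch. It is not Ocular's actual code: the class `AllowListSketch`, the method `filterExact`, and the sample font names are all hypothetical, and it assumes the allow-list is compared against the installed names character for character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AllowListSketch {

    // Keeps only the installed names that equal an allow-list entry exactly.
    // Assumption: this mirrors an exact-match filter; Ocular's real logic may differ.
    public static List<String> filterExact(List<String> installed, List<String> allowed) {
        Set<String> allow = new HashSet<>(allowed);
        List<String> kept = new ArrayList<>();
        for (String name : installed) {
            if (allow.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical installed names; the real list comes from the Java runtime.
        List<String> installed = Arrays.asList("Arial", "Dominican Regular", "Times New Roman");

        // An allow-list entry that is close to, but not exactly, an installed name.
        System.out.println(filterExact(installed, Arrays.asList("Dominican")));

        // The same filter with the name spelled exactly as installed.
        System.out.println(filterExact(installed, Arrays.asList("Dominican Regular")));
    }
}
```

If the filter works this way, a near miss in the allow-list leaves no fonts to initialize from, which would be consistent with a run that finishes in seconds.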

DutchPirate1966 commented 3 years ago

Well, I have checked the fonts, and the .ttf files are present in two directories:

C:\Users\Acer\AppData\Local\Microsoft\Windows\Fonts\
c:\Windows\fonts\

Maybe I should give the full path name in the fonts path txt file?

[image: Screen Shot 10-30-20 at 12 02 PM]

tberg12 commented 3 years ago

I would try the following: add a println for the font names that the system loads when you don't provide the filter file (i.e. don't use -allowedFontsPath). This will let you see what Java thinks the names of all the fonts are. From that list, pick out Dominican and copy the naming convention (possibly the full path) that Java uses. Hopefully that will work! Let me know how it goes!
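If editing Ocular itself is inconvenient, a small standalone program can show roughly the same information. This is only a sketch: it assumes Ocular gets its font names from the standard Java AWT font list, which I have not checked here, and the class `ListFontNames` and method `namesContaining` are made up for illustration.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ListFontNames {

    // Returns every name that contains the fragment, ignoring case.
    public static List<String> namesContaining(String[] names, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Font family names as the Java runtime reports them on this machine.
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();

        // Search fragment: first command-line argument, defaulting to "dominican".
        String fragment = args.length > 0 ? args[0] : "dominican";
        for (String hit : namesContaining(families, fragment)) {
            System.out.println(hit);
        }
    }
}
```

Compile it with `javac ListFontNames.java` and run `java ListFontNames dominican`. Any name it prints is spelled the way Java sees it and can be copied into the allowed fonts file; if it prints nothing, Java does not see a font with that name at all.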

DutchPirate1966 commented 3 years ago

Entering the programming realm, right? ;-) I will give it a try (Java is fairly new to me, actually). Will keep you posted.

tberg12 commented 3 years ago

Yeah, you'd have to add a line or two of Java. You could first try adding the full path name, as you mentioned.

tberg12 commented 3 years ago

Actually, doesn't the code already print out the font names? Try running it without -allowedFontsPath and then inspect the log; the font names might already be there.

DutchPirate1966 commented 3 years ago

Not really. I got submerged in decompiling the jar and trying to figure out where and how the option -allowFontsPath is actually used. I guess I am not the programmer type. I have tried to contact Dan Guerrette about it and am awaiting his response.