mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

Path split does not work with paths containing `./` and `../` #433

Closed by PonteIneptique 1 year ago

PonteIneptique commented 1 year ago

The current code throws an error when it is given a relative path such as `../image.png`:

https://github.com/mittagessen/kraken/blob/911fcac6d5c02a34fcafa413becb327f889c16bf/kraken/lib/functional_im_transforms.py#L94-L95
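To make the failure concrete, here is a minimal sketch. It assumes the linked lines split the whole path string at the first `os.extsep`; `naive_split` is a hypothetical stand-in for illustration, not the actual kraken function.

```python
import os


def naive_split(path):
    # Hypothetical stand-in (assumption): cut the *whole* path string at the
    # first extsep, without looking at directory components first.
    stem, _ = str(path).split(os.extsep, 1)
    return stem


# The first dot of "../image.png" belongs to the "../" prefix, so nothing of
# the actual file name survives in the stem.
relative_stem = naive_split("../image.png")
plain_stem = naive_split("image.png")
```

Under that assumption `relative_stem` is the empty string while `plain_stem` is `image`, so any file name built from the relative stem points nowhere.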

I'd recommend:

import os

# Split only the basename, so dots in the directory part (./, ../) are left alone.
os.path.join(
    os.path.dirname(str(x)),
    os.path.basename(str(x)).split(os.extsep, 1)[0]
)

mittagessen commented 1 year ago

I was going to move to pathlib anyway. In fact, that's the one file I forgot to move over when fixing up the typing a couple of days ago.
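For reference, a pathlib version could look roughly like the sketch below. `split_path` is a hypothetical name chosen for illustration, and this is not necessarily what ended up in kraken; it only keeps the directory part and cuts the file name at its first dot.

```python
from pathlib import Path


def split_path(path):
    # Hypothetical helper (assumption), not the code committed to kraken.
    p = Path(path)
    # Keep the parent directory untouched and cut only the file name at its
    # first dot, mirroring the ocropus-style behaviour discussed here.
    return p.with_name(p.name.split(".", 1)[0])


parent_relative = split_path("../image.png")
dotted_directory = split_path("data.v1/page.gt.txt")
```

Note that `Path.with_name` raises `ValueError` for an empty name, so a dotfile such as `.hidden` would need separate handling in this sketch.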

BTW: That code is fundamentally broken. It will happily remove non-suffix components of file names if they've got extseps in there, because everything after the first extsep is dropped. The primary reason is that the original ocropus behaves this way and we didn't want to break existing datasets.
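A short sketch of that quirk, using a hypothetical `first_dot_stem` helper and made-up file names, with `os.path.splitext` shown for contrast:

```python
import os


def first_dot_stem(name):
    # Hypothetical helper: ocropus-style cut at the first extsep.
    return name.split(os.extsep, 1)[0]


names = ("page.0001.png", "page.0002.png")

# Cutting at the first dot collapses both names to the same stem.
first_dot = [first_dot_stem(n) for n in names]

# os.path.splitext only strips the last suffix and keeps the names distinct.
last_suffix = [os.path.splitext(n)[0] for n in names]
```

Here `first_dot` holds the same stem twice, while `last_suffix` keeps `page.0001` and `page.0002` apart. The flip side is that `splitext` would leave `.gt` attached to a name like `0001.gt.txt`, which is presumably why the first-dot cut was convenient for ocropus-style datasets.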

PonteIneptique commented 1 year ago

I think, looking at where you use this function, doing `.replace(".gt.txt", ".png")` or the opposite would work at least as well...
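Something along these lines, as a rough sketch. `gt_to_image` is a hypothetical helper, and it deliberately swaps only a trailing suffix instead of calling `str.replace`, since `replace` would also rewrite a `.gt.txt` that happens to sit in a directory name.

```python
def gt_to_image(path, gt_suffix=".gt.txt", im_suffix=".png"):
    # Hypothetical helper (assumption); suffix defaults follow the comment above.
    path = str(path)
    if not path.endswith(gt_suffix):
        raise ValueError(f"{path!r} does not end with {gt_suffix!r}")
    # Strip the trailing ground-truth suffix and append the image suffix.
    return path[: -len(gt_suffix)] + im_suffix


image_path = gt_to_image("../lines/0001.gt.txt")
```

Relative prefixes such as `../` are never touched this way, and the opposite direction is the same call with the two suffix arguments swapped.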