translate / translate

Useful localization tools with Python API for building localization & translation systems
https://toolkit.translatehouse.org/
GNU General Public License v2.0
858 stars 313 forks source link

Confusing parse methods and constructors #3911

Open nijel opened 5 years ago

nijel commented 5 years ago

Currently there are several approaches to construct storage instance:

This leads to lot of code duplication where each storage behaves slightly differently (some taking file name from name some also from filename, some closing the file, some seeking to start).

Some examples of parse implementations:

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/lisa.py#L328-L333

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/trados.py#L204-L211

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/pypo.py#L816-L821

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/aresource.py#L463-L468

Changing this mess can break some usage, that's why I'm opening issue before working on the code.

My proposal to address this:

friedelwolff commented 5 years ago

I haven't looked into this for years, but allow me one comment which might help: I vaguely remember some of the classes having an easier time parsing from a file vs. parsing from bytes for example when using an external library with a file based API. Maybe I'm only thinking of alternative PO implementations, but I think there was the idea to let each class implement it as best it can, possibly calling the other one if necessary. About the init I don't have anything to comment - I think you have more recent exposure.

2019-08-02 13:34 GMT+02:00, Michal Čihař notifications@github.com:

Currently there are several approaches to construct storage instance:

  • parsefile class method takes file object or filename as string
  • parsestring class method takes bytes
  • __init__ generally creates empty object
  • on some storages __init__ accept inputfile parameter and behaves pretty much like parsefile duplicating it's logic
  • the parse method is used by parsestring and sometimes also to implement __init__ loading

This leads to lot of code duplication where each storage behaves slightly differently (some taking file name from name some also from filename, some closing the file, some seeking to start).

Some examples of parse implementations:

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/lisa.py#L328-L333

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/trados.py#L204-L211

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/pypo.py#L816-L821

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/aresource.py#L463-L468

Changing this mess can break some usage, that's why I'm opening issue before working on the code.

My proposal to address this:

  • let TranslationStore.__init__ handle inputfile and inputbytes args
  • make parsestring and parsefile just wrapper to invoke __init__ with corresponding arg, possibly deprecating them (parsestring accepting bytes is confusing)
  • the individual storage parse methods will accept only bytes and will do only the actual parsing without any additional logic (this could be breaking change)

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/translate/translate/issues/3911

-- https://fwolff.net.za/

nijel commented 5 years ago

In that case that storage would most likely override parsefile to avoid reading the file by it. However there is only single class which does it and it only does pretty much same as the original parsefile:

https://github.com/translate/translate/blob/f165a8122b5de37227208762323f954bf5e5d8dd/translate/storage/subtitles.py#L123-L143

unho commented 5 years ago

If subtitles is the only one blocking this then I think it is worth working on it.

After all it is not the first time it was considered to drop support for subtitles because it was difficult to support it, for example when we moved to Python 3.

jayvdb commented 4 years ago

pypo.py's:

if not isinstance(input, bytes): 
     input = input.read() 

is currently causing this

Traceback (most recent call last):
  File "manage.py", line 46, in <module>
    main()
  File "manage.py", line 42, in main
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.8/site-packages/require_i18n/management/commands/compile_js.py", line 87, in handle
    self._extract()
  File "/usr/lib/python3.8/site-packages/require_i18n/management/commands/compile_js.py", line 98, in _extract
    call_command('extract',
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 168, in call_command
    return command.execute(*args, **defaults)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.8/site-packages/tower/management/commands/extract.py", line 148, in handle
    catalog = create_pofile_from_babel(extracted)
  File "/usr/lib/python3.8/site-packages/tower/management/commands/extract.py", line 76, in create_pofile_from_babel
    catalog = po.pofile(inputfile="")
  File "/usr/lib/python3.8/site-packages/translate/storage/pypo.py", line 823, in __init__
    super(pofile, self).__init__(inputfile, **kwargs)
  File "/usr/lib/python3.8/site-packages/translate/storage/pocommon.py", line 199, in __init__
    self.parse(inputfile)
  File "/usr/lib/python3.8/site-packages/translate/storage/pypo.py", line 835, in parse
    input = input.read()
AttributeError: 'str' object has no attribute 'read'

The blame can be put onto (unmaintained/forked) Tower, which is using "" as a file, and I can workaround this with inputfile=b"" , however it would be safer for translate to also check for str before assuming a file. However, it looks like that change would be better considered as part of this broader cleanup/unification.