python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License
1.34k stars 448 forks source link

List index out of range error in PoFileParser._add_message #1134

Closed gabe-sherman closed 1 month ago

gabe-sherman commented 2 months ago

The below code triggers a list index out of range error when provided a malformed input. This occurs in _add_message at line 235 in babel/messages/pofile.py.

import sys
import io
import babel.messages.pofile
import babel

def main():
    input = io.StringIO(open(sys.argv[1], "r").read())
    babel.messages.pofile.read_po(input)

if __name__ == "__main__":
    main()

Version

babel version 2.16.0

POC File

https://github.com/FuturesLab/POC/blob/main/babel/poc-01

How to trigger:

python filename.py poc-01

Trace report

Traceback (most recent call last):
  File "rep.py", line 11, in <module>
    main()
  File "rep.py", line 7, in main
    babel.messages.pofile.read_po(input)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 434, in read_po
    parser.parse(fileobj)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 361, in parse
    self._finish_current_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 250, in _finish_current_message
    self._add_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 235, in _add_message
    string = self.translations[0][1].denormalize()
IndexError: list index out of range
tomasr8 commented 2 months ago

Here's a smaller reproducer:

import io
from babel.messages.pofile import read_po

po = io.StringIO('msgid ""')
read_po(po)

Neither is a valid PO file so we should ideally raise a PoFileError in this case. @gabe-sherman would you like to open a PR?

gabe-sherman commented 2 months ago

Hey Tomas! Thanks for the response. I don't have a deep understanding of the way the PO files should be processed so a full PR may be challenging, but if I'm able to get some time I'll certainly check it out :)

tomasr8 commented 2 months ago

I think checking if self.translations is empty inside _finish_current_message() and raising a PoFileError if that's the case might work. In any case, feel free to ask if you need help :)

gabe-sherman commented 2 months ago

Sounds good, thanks! Should the process exit when this is detected or should we only raise the warning and leave the exception raising up to the value of self.abort_invalid.

tomasr8 commented 2 months ago

Right, there's this abort_invalid flag. I guess if it's set to True, we should just raise, and if not maybe we could insert a dummy translation, something like (0, '') (and emit a warning)?

gabe-sherman commented 2 months ago

Is the invariant here that if the messages list has an element in it, the translations list should also be populated? Adding in this check at the beginning of _finish_current_message triggers this new warning on valid files in cases where both self.translations and self.messages are empty. However, when this check to the size of self.translations is moved inside the if self.messages: block it behaves as expected, although I haven't done robust testing yet. I just want to make sure we're not triggering this exception in cases where it's okay that the translations list is empty.

tomasr8 commented 2 months ago

To be honest, I'm not that familiar with the parser code either, but I think you are right. We should only raise/warn when we call _finish_current_message while self.messages is not empty and self.translations is, i.e. something like this:

def _finish_current_message(self) -> None:
    if self.messages:
        if not self.translations:
            # Handle error here
        self._add_message()
gabe-sherman commented 2 months ago

Yea that's what I was thinking as well :). I'll do some testing on it.

tomasr8 commented 1 month ago

I think this can be closed now :)