nexB / source-inspector

Tools to inspect source code and code symbols

Improve xgettext handlings #16

Closed pombredanne closed 3 months ago

pombredanne commented 3 months ago

This PR addresses these issues:

@armijnhemel FYI since you reported these!

pombredanne commented 3 months ago

@keshav-space can you help me there?

  1. We should test whether a file is source code before running either ctags or xgettext. There is https://github.com/nexB/typecode/blob/92feb7be3a87c1b541e7034c3f9797c96bc52305/src/typecode/contenttype.py#L683 for this.
  2. We may want to keep the test expectations flexible across versions of the backing tools, as they all seem to behave differently.
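
A minimal sketch of the gating idea in item 1, not the project's actual code. The names `looks_like_source`, `SOURCE_SUFFIXES`, and `collect` are hypothetical, and the suffix check is only a stand-in for the typecode check linked above, which could be passed in as `is_source` instead.

```python
from pathlib import Path

# Hypothetical stand-in for a real content-type check: a short, incomplete
# list of suffixes, used here only to keep the sketch self-contained.
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".py", ".java", ".js", ".go", ".rs"}


def looks_like_source(location):
    """Return True if the file suffix is in the (assumed) source suffix list."""
    return Path(location).suffix.lower() in SOURCE_SUFFIXES


def collect(location, collectors, is_source=looks_like_source):
    """
    Run each collector on `location` only if it looks like source code.
    `collectors` maps a name to a callable taking a location, for example
    thin wrappers around ctags and xgettext.
    """
    if not is_source(location):
        # Skip the backing tools entirely for non-source files.
        return {}
    return {name: collector(location) for name, collector in collectors.items()}
```

Passing the predicate as a parameter keeps the tool wrappers unaware of how source detection is done, so the check can be swapped without touching them.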

keshav-space commented 3 months ago

Ack, will add the necessary change in this PR.

armijnhemel commented 2 months ago

So what about file names that contain a space, or perhaps a colon? Those are valid file names and xgettext will happily process them, and you will get something like this (I just renamed a file and then ran xgettext):

#: fdisk:big.c:234 fdisk:big.c:2942

I am not sure how common file names with colons are, but file names with spaces are not uncommon.

As far as I can tell partition() will not properly process this:

>>> a = '#: fdisk:big.c:234 fdisk:big.c:2942'
>>> _, _, bla = a.partition('#:')
>>> bla.partition(':')
(' fdisk', ':', 'big.c:234 fdisk:big.c:2942')

armijnhemel commented 2 months ago

Thinking a bit more: the only certainty you have is that the part after the last : should be a number, so perhaps rewriting it to use rpartition() would be the cleanest way.
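
A rough sketch of what that could look like, under assumptions and not taken from the PR. The helper names `parse_reference` and `parse_comment_line` are hypothetical. The first uses rpartition() on a single reference. The second handles several references on one "#:" line by assuming each one ends in a colon, a line number, and then whitespace or end of line.

```python
import re

# Assumption: a reference is "<path>:<digits>" followed by whitespace or end
# of line. The path is matched lazily so it may contain colons and spaces.
_REFERENCE = re.compile(r"(.+?):(\d+)(?:\s+|$)")


def parse_reference(reference):
    """Split one "<path>:<line>" reference on the LAST colon."""
    path, _, lineno = reference.strip().rpartition(":")
    if not path or not lineno.isdecimal():
        raise ValueError(f"not a <path>:<line> reference: {reference!r}")
    return path, int(lineno)


def parse_comment_line(line):
    """Return (path, line number) pairs found on one "#:" comment line."""
    _, marker, references = line.partition("#:")
    if not marker:
        return []
    return [
        (match.group(1), int(match.group(2)))
        for match in _REFERENCE.finditer(references.strip())
    ]


print(parse_reference("fdisk:big.c:234"))
print(parse_comment_line("#: fdisk:big.c:234 fdisk:big.c:2942"))
```

Some names stay ambiguous whatever the approach: a file literally named "a:1 b.c" is indistinguishable from two references, so this only narrows the problem.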