cpan.py completely wrong about handling dotted-decimal and underscore versions.

kentfredric commented 8 years ago

cpan.py makes some serious mistakes:

1. That any version starting with a v can be used as-is.

This is false: upstream versions can have fewer parts than our normalization scheme for Perl dictates. Upstream versions with a leading v can also contain underscores, which this code simply does not handle.

2. That any version without a leading v can be parsed as if it were in `x.yyyzzz` notation.

This is also false: upstream, any version with at least two '.' characters is treated as if a v had been written on the front (see also the first point).

This also has the incredibly flawed behaviour that it translates:

1.2.3

As if you'd passed:

1.200.300

Which is patently wrong. ( See : http://euscan.gentooexperimental.org/package/dev-perl/PDL/ , which thinks the current versions are 2.400.300 when they should be simply 2.4.3 )

Which leads to the unfortunate side effect that when a person changes their version scheme correctly:

1.2.3 -> 1.003001

The normalised forms then conflict:

1.2.3 -> 1.200.300 #  Should have stayed 1.2.3
1.003001 -> 1.3.1

And as "200" is larger than 3, euscan perceives that as a downgrade, not an upgrade.
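The ordering problem can be illustrated with a minimal sketch. It assumes versions are compared component by component as integers; `as_tuple` is a hypothetical helper for illustration, not euscan's actual comparison code. The version strings are the ones quoted above.

```python
def as_tuple(normalised):
    """Split a dotted version string into a tuple of ints for comparison.

    Hypothetical helper for illustration only, not part of euscan.
    """
    return tuple(int(part) for part in normalised.split("."))


# What the current code produces for the old and new upstream versions.
old_buggy = as_tuple("1.200.300")  # from upstream 1.2.3
new_float = as_tuple("1.3.1")      # from upstream 1.003001

# What the old upstream version should have normalised to.
old_correct = as_tuple("1.2.3")

print(new_float > old_buggy)    # False: the new release looks like a downgrade
print(new_float > old_correct)  # True: the new release is an upgrade
```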

3. The underscore version logic is entirely wrong.

The numbers after the _ in Perl do not signify a release candidate number. The underscores are effectively ignored by the Perl VM, and the _ simply signals to humans that the version is a development release.

The placement of the _ is typically not significant, except for niche cases in the Perl module version.pm, whose implementation of _ is considered "wrong" by the Perl Toolchain community because its semantics differ from those of Perl itself.

Hence, translating the part after "_" to "RcX" is wrong, as is taking any meaning from "_" other than "it's alpha".
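A minimal sketch of that reading of the underscore: drop it from the digits and keep only a "development release" flag. `strip_dev_marker` is a hypothetical name, not a function from cpan.py, and the sketch does not validate anything else about the version.

```python
def strip_dev_marker(up_pv):
    """Return (version_without_underscores, is_dev_release).

    Hypothetical helper: the underscore carries no numeric meaning,
    it only marks the version as a development (alpha) release.
    """
    is_dev = "_" in up_pv
    return up_pv.replace("_", ""), is_dev


print(strip_dev_marker("1.23_4"))  # ('1.234', True)
print(strip_dev_marker("1.234"))   # ('1.234', False)
```

Under this reading, 1.23_4 and 1.2_34 carry the same digits and differ from 1.234 only in being flagged as development releases.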

As such, I have run a test batch against the current mangle_version implementation. Here are its results, which show that it cannot normalise versions to specification and gets 16 of the 23 cases wrong:

not ok - normalise 1.000    works
# got: 1..0
# expected: 1.0.0
not ok - normalise 1.00     works
# got: 1..0
# expected: 1.0.0
not ok - normalise 1.0      works
# got: 1..0
# expected: 1.0.0
not ok - normalise 1.0      works
# got: 1..0
# expected: 1.0.0
not ok - normalise 1        works
# got: 1.100
# expected: 1.0.0
ok - normalise 1.002    works ( 1.2.0    )
ok - normalise 1.020    works ( 1.20.0   )
ok - normalise 1.200    works ( 1.200.0  )
ok - normalise 1.02     works ( 1.20.0   )
ok - normalise 1.2      works ( 1.200.0  )
not ok - normalise 1_2      works
# got: 1.100_rc2
# expected: 12.0.0_rc
not ok - normalise 1.2_3    works
# got: 1.200.0_rc3
# expected: 1.230.0_rc
not ok - normalise 1.23_4   works
# got: 1.230.0_rc4
# expected: 1.234.0_rc
not ok - normalise 1.2_34   works
# got: 1.200.0_rc34
# expected: 1.234.0_rc
not ok - normalise 1.234_5  works
# got: 1.234.0_rc5
# expected: 1.234.500_rc
not ok - normalise 1.234_56 works
# got: 1.234.0_rc56
# expected: 1.234.560_rc
not ok - normalise 1.234_567 works
# got: 1.234.0_rc567
# expected: 1.234.567_rc
not ok - normalising 1.234_567_891 should fatalize
# unexpected got: 1.234._56.7_8.910
not ok - normalising 1.x      should fatalize
# unexpected got: 1.x00.0
ok - normalise v1.2.3   works ( 1.2.3    )
not ok - normalise v1.2_3   works
# got: 1.2_3
# expected: 1.23.0_rc
not ok - normalise 1.2.3_4  works
# got: 1.200.300_rc4
# expected: 1.2.34_rc
ok - normalise v1234.4567.7890 works ( 1234.4567.7890 )
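For reference, the "expected" values in the output above can be reproduced by a short sketch. This is an illustration written against that column only, not the code from any linked gist. The name `normalise_perl_version`, the use of `ValueError` for the "fatalize" cases, and the literal `_rc` suffix are assumptions taken from the test output.

```python
import re


def normalise_perl_version(up_pv):
    """Hypothetical sketch: map an upstream Perl version to the
    'expected' form shown in the test output."""
    # The test batch expects more than one underscore to be fatal.
    if up_pv.count("_") > 1:
        raise ValueError("multiple underscores in %r" % up_pv)
    is_dev = "_" in up_pv
    version = up_pv.replace("_", "")

    if version.startswith("v") or version.count(".") >= 2:
        # Dotted-decimal: every component is already a plain integer.
        body = version[1:] if version.startswith("v") else version
        if not re.fullmatch(r"\d+(\.\d+)*", body):
            raise ValueError("unparseable version %r" % up_pv)
        parts = [int(piece) for piece in body.split(".")]
    else:
        # Decimal form x.yyyzzz: read the fraction in groups of three digits.
        if not re.fullmatch(r"\d+(\.\d+)?", version):
            raise ValueError("unparseable version %r" % up_pv)
        integer, _, fraction = version.partition(".")
        # Right-pad with zeros so the fraction splits evenly into threes.
        fraction += "0" * (-len(fraction) % 3)
        parts = [int(integer)]
        parts += [int(fraction[i:i + 3]) for i in range(0, len(fraction), 3)]

    # Pad to at least three components.
    parts += [0] * (3 - len(parts))
    result = ".".join(str(piece) for piece in parts)
    return result + "_rc" if is_dev else result


for sample in ("1.2", "1.2.3", "1.003001", "1.234_5", "v1.2_3"):
    print(sample, "->", normalise_perl_version(sample))
```

The padding step is what separates the two notations: 1.2 in decimal form means 1.200, while 1.2.3 in dotted-decimal form keeps each component unchanged.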

I have spent a bit of effort and have recoded the fundamental logic here to be in line with what Gentoo tends to expect from Perl versions: https://gist.github.com/kentfredric/4c43a83c9fc2be8988cf

However, it is only in a standalone form at this time because I don't know which design elements of it are acceptable for euscan, and its standalone form makes it simpler to test and develop without having to circumnavigate and understand the whole euscan infrastructure.

I also don't know what the right way to handle "unparseable" versions is. The edge cases I have minitests for are invalid upstream (one case is invalid anywhere in a Perl version, but the other is only invalid to version.pm, and this latter case is easy enough to make pass).

So pending feedback, I will restructure this code to suit the project, but as-is, the functions:

def split_float(up_pv):
    ...
def normalize_chunk(up_pv_piece):
    ...
def parts_to_normal(version_parts):
    ...
def mangle_version_simple(up_pv):
    ...
def mangle_version(up_pv):
    ...

From my example gist should be, I expect, pluggable into cpan.py as-is.

volpino commented 8 years ago

Hi! sorry for the unacceptable delay in replying. I was quite busy in the last period and I forgot about this issue. I am ok in merging this, could you prepare a pull request?

@iksaif what do you think?

iksaif commented 8 years ago

looks good to me.

On Tue, Sep 8, 2015 at 2:13 PM, Federico Scrinzi notifications@github.com wrote:

Hi! sorry for the unacceptable delay in replying. I was quite busy in the last period and I forgot about this issue. I am ok in merging this, could you prepare a pull request?

@iksaif https://github.com/iksaif what do you think?


Corentin Chary http://xf.iksaif.net

kentfredric commented 8 years ago

I also attempted a stand-alone implementation of CPAN version handling as https://gist.github.com/kentfredric/cfd5a593d3d90d87ae11

It handles both cpan -> gentoo version translation and cpan <=> cpan version comparisons.
