Validate full model name in the deprecate model flow

tapadipti commented 1 year ago

Earlier, the model name was not being validated for the deprecate model action. It was getting validated for all other actions including the unassign and deregister actions (which are variations of the deprecate action).

Resolves https://github.com/iterative/dvc/issues/9821#issuecomment-1755122434

sweep-ai[bot] commented 1 year ago

Apply Sweep Rules to your PR?

[ ] Apply: Leftover TODOs in the code should be handled.
[ ] Apply: All new business logic should have corresponding unit tests in the tests/ directory.
[ ] Apply: Any clearly inefficient or repeated code should be optimized or refactored.

shcheklein commented 1 year ago

Thanks @tapadipti ! I left a comment + I think we would need a test for this.

codecov-commenter commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Files	Coverage Δ
gto/constants.py	`100.00% <ø> (ø)`
gto/registry.py	`90.61% <100.00%> (+0.03%)`	:arrow_up:

:loudspeaker: Thoughts on this report? Let us know!.

tapadipti commented 1 year ago

Thanks @shcheklein. I've addressed your comments.

For the test, I ran it locally by calling it from a __main__ block. But I'm not sure how to run it without that.

There are more validations to be added. Eg, currently it accepts // in the name, and this is not allowed in the Git ref. I'll work on fixing this as part of https://github.com/iterative/dvc/issues/9821 but it's not within the scope of this PR.

shcheklein commented 1 year ago

thanks @tapadipti ! it looks good, but it seems we need to fix the test. Please see the checks above.

tapadipti commented 1 year ago

it seems we need to fix the test. Please see the checks above.

The error message returned by the name validation functions is quite long. So it looks like a new line character (\n) gets added automatically - see the extra \n between Value and must below:

❌ Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \nmust be of len >= 2 and must start and end with a letter or a number.\n

Is there a way to turn off this automatic addition?

I was not adding this \n to my test (coz I wasn't expecting it to get added when running), and locally I didn't see this issue.

To resolve this, I've now added the \n to the error message and to my test. It's passing now.

However, I'm not sure if some other products with dependencies on GTO could break due to this (if they're expecting a string without this \n). cc @iterative/studio

shcheklein commented 1 year ago

@tapadipti I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

tapadipti commented 1 year ago

@tapadipti I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

@shcheklein I've done this here. All tests passed, so it should be fine.

Edit: Just realized that the tests are failing. Going to re-fix it.

tapadipti commented 1 year ago

@shcheklein I’ve tried 2 ways to remove the \n. But it’s not working, coz it seems the \n is being escaped and so the character is \\n. This does not happen locally, so I don’t know how to resolve this. I even tried substituting \\n - as I expected, that does not help either (I've removed that commit). This is the error which needs to be cleaned of the line breaks:

"\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value\\nmust be of len >= 2 and must start and end with a letter or a number.\\n"

shcheklein commented 1 year ago

@tapadipti do you have sense on why is it escaped? what other symbols are escaped?

tapadipti commented 1 year ago

@tapadipti do you have sense on why is it escaped? what other symbols are escaped?

@shcheklein So this is failing for Windows only - so it must be something to do with Windows convention for line endings (not the first time Windows got me stuck for so long 😞). Still not sure how to resolve this - will try to figure out. But I think it's better to just add the line break at the end of the sentence (like I tried before).

~No, I couldn't figure this out - for some reason the \ in \n is being escaped, so it becomes \\n. When I try locally, a string with \\n prints just \n. And I don't know how to replicate how the complete test suite is being run - I've run individual tests and it works fine. The only character causing me issue is the line break which is getting added in the middle of a line. Without this extra line break, there's no issue coz all the other line breaks are at the same position in both the actual and expected values.~

pmrowla commented 1 year ago

ideally in the long term all of the tests should be updated to work off of a list of expected lines instead of a single hard coded expected string, and then the check functions would do something like

assert result.stdout.splitlines() == expected_stdout
assert result.stderr.splitlines() == expected_stderr

which will work regardless of platform/line ending, and it will account for commands where the newlines in output are significant

Should also note that this does not account for the fact that click will also automatically add indentation depending on content and context which may end up causing similar issues in the future (so exact string comparison in tests would break due to leading tabs or spaces instead of newlines)

tapadipti commented 1 year ago

@pmrowla Yes.

Wrapped lines will also be split using the platform specific line ending (which is CRLF on windows and not just LF)

I wish this comment was here yesterday or earlier today. Figured it out the hard way today. Have fixed it by doing replace differently for Windows. Which I don't like. So I'll revert to what you've suggested - which is pretty much what I had here and here. @shcheklein Can you pls confirm you're ok with this - coz you'd earlier suggested I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

pmrowla commented 1 year ago

I think it's better to not alter the behavior, but rather fix test to be more lax on comparison. E.g. remove all \n before comparing the result.

The problem with this is that this is applied to the tests for every GTO command and it strips the line breaks in places where the test behavior is actually significant, i.e. when the expected output is explicitly

To push the changes upstream, run:
    git push origin nn2@v0.0.1

Stripping/replacing all the line breaks means the test would now pass if the CLI command outputs

To push the changes upstream, run:     git push origin nn2@v0.0.1

(which is why I suggested that the long term solution should be that CLI output tests in GTO compare lists of expected lines and not a single expected string)

tapadipti commented 1 year ago

@pmrowla I agree with you - it's better not to strip the line breaks. I've made a new commit reverting to an earlier state where I added an explicit line break in the error message to handle the 80 character limit. The suggestion you provided in your comment works for the name a4! but if we change the name to a string of different length at any time, it won't work. So I added a line break after allowed., so that the line does not become >80 chars long in most cases and this issue does not arise.

@shcheklein pls confirm if this is ok.

pmrowla commented 1 year ago

The suggestion you provided in https://github.com/iterative/gto/pull/427#pullrequestreview-1684318123 works for the name a4! but if we change the name to a string of different length at any time, it won't work.

This is true, but we would hit the issue even with the manual line break if we changed the test to use a longer name.

I would prefer to not have the manual line breaks in the exception message. The line wrapping is handled by the UI rendering library (click for GTO). The real issue here is with the test suite itself - the tests should not be affected by line wrapping in the UI library)

IMO these kinds of fragile UI tests for CLI output are not actually useful at all. It would be better to test that the expected exception is raised by the underlying call.

Basically, GTO tests should only be testing GTO functionality. But right now, the CLI tests are causing issues because in addition to testing GTO functionality they are also testing how click line wraps a given exception message. The click formatting behavior is not actually relevant to whether or not GTO works properly (and click's line wrapping is tested in click itself).

tapadipti commented 1 year ago

@pmrowla I tried removing the manual line breaks in the exception message, as you suggested. But it's not working - it looks like the space between Value and must is getting stripped in Windows. See the assertion errors in the test reports for the latest commits.

Summary of error:

When I use [your suggested code](https://github.com/iterative/gto/pull/427#pullrequestreview-1684318123) for the test (in [this commit](https://github.com/iterative/gto/pull/427/commits/ab1b866ecefd99fbf3b949c9bfe73fdce8792c67)), I get the following mismatch between the actual and expected values. ` "\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value\\nmust be of len >= 2 and must start and end with a letter or a number.\\n" ` ` "\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \\r\\nmust be of len >= 2 and must start and end with a letter or a number.\\n" ` If I use a `\n` in the test (see [this commit](https://github.com/iterative/gto/pull/427/commits/371e120defae679d90ab7f08b2d3bbdce4f6311f#diff-4e8715c7a425ee52e74b7df4d34efd32e8c92f3e60bd51bc2e1ad5943b82032eR357)), I get the following mismatch: ` "\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value\\nmust be of len >= 2 and must start and end with a letter or a number.\\n" ` ` "\u274c Invalid value 'a4!'. Only letters, numbers, '_', '-', '/' are allowed. Value \\nmust be of len >= 2 and must start and end with a letter or a number.\\n" `

From what I saw yesterday, this (stripping) does not happen in ubuntu and macos, so I can't just remove the space from the expected value (else the test will fail in ubuntu and macos). I can treat Windows differently (similar to how I did here), but imo that's a terrible idea. So my only option now is to revert to adding the manual line break in the exception message (or to improve how the tests are written, which I don't want to make part/dependency of this PR).

dberenbaum commented 1 year ago

~Hope this helps and isn't further derailing here, but can you strip only the \r clrf windows-style line endings from the actual output? I don't think we are ever going to intentionally add those into output.~

Edit: I see there are also issues related to spacing, so I don't think this alone will help much. Disregard.

pmrowla commented 1 year ago

We should just remove this CLI test entirely and test the underlying behavior in test_api instead like

with pytest.raises(ValidationError):
    api.deregister(...)  # call deregister with an invalid name

tapadipti commented 1 year ago

For now, I'm removing the test. I've created an issue to improve tests, and we can add a test for the deprecate model action with invalid model name there. cc @pmrowla @shcheklein @dberenbaum

tapadipti commented 1 year ago

@pmrowla Can you ptal at this comment and re-review this PR. I can't merge the PR without you accepting the change.

iterative / gto

Validate full model name in the deprecate model flow #427

Apply Sweep Rules to your PR?

Codecov Report