Closed GreyCat closed 1 year ago
@GreyCat I didn't realize until now that this could be a problem, but I agree with you that it would be good to standardize this. It's true that the current naming pattern just uses the least effort to distinguish the individual subtargets of one target language based on what environments/implementations we currently test on. But there's a problem with this inconsistency when you've for example tested a particular target language only on Linux so far and now you want to test on other plaftorms, too - then you have to rename the existing branches (and Git doesn't even handle remote branch renaming very well, AFAIK you have to basically delete the existing branch and create a new one), and that's not very nice - of course it's better to have a consistent future-proof naming scheme from the start.
And obviously we may also want to test on other implementations of a target language, too - for example so far we only run Python on the CPython implementation, but we could also test on PyPy, etc.
Proposal: let's standardize everything consistently using the approach similar to the Docker images:
$TARGET$SUBTARGET/$IMPLEMENTATION-$OS-$ARCH
Looks good. I don't think we used dashes -
before, but I see the benefit of using a different separator between the $IMPLEMENTATION
/$OS
/$ARCH
parts because then you can keep the underscore _
reserved for when you want to separate multiple words within one part (which is kind of already being done for TARGET = cpp_stl
, but it's not hard to imagine e.g. construct/python3.11_cpython-linux-x86_64
, where IMPLEMENTATION = python3.11_cpython
).
All in all, feel free to go ahead with it :)
From my point of view I don't see any obvious unconsistences. For example, what the difference would be between artifacts of Java, Python or JavaScript targets for different OSes? All those languages already OS-independent so it seems just confusing to have different artifacts for them even if they are actually would be the same
@Mingun:
All those languages already OS-independent
If only this were true...
so it seems just confusing to have different artifacts for them even if they are actually would be the same
Confusing to whom? At https://ci.kaitai.io/ you normally don't even notice how many platforms/versions have run, unless there are differences in test results of individual subtargets (status "mixed" or "failed" with different stack traces).
If only this were true...
And this is not (in context of Kaitai Struct, where the difference only could be in a file reading code)?
Confusing to whom?
To the maintainers who would try to understand what the difference of (figuratively) java-openjdk7-linux
vs. java-openjdk7-windows
. Remembering that during build there is only assemble jar, without difference, for windows or linux...
@GreyCat I didn't realize until now that this could be a problem, but I agree with you that it would be good to standardize this. It's true that the current naming pattern just uses the least effort to distinguish the individual subtargets of one target language based on what environments/implementations we currently test on. But there's a problem with this inconsistency when you've for example tested a particular target language only on Linux so far and now you want to test on other plaftorms, too - then you have to rename the existing branches (and Git doesn't even handle remote branch renaming very well, AFAIK you have to basically delete the existing branch and create a new one), and that's not very nice - of course it's better to have a consistent future-proof naming scheme from the start.
Well, my point is very simple: right now our publication to ci_artifacts from Docker images uses:
"${TARGET}${SUBTARGET}"/"$IMPL"
(see main.yml#L194), as this is very consistent with running using these three:
./docker-ci -t "$TARGET" -u "$SUBTARGET" -i "$IMPL"
(see main.yml#L179). This results in branches like cpp_stl_98/gcc11
, which is very bare and ignores OS and arch, which is quite important.
Another point is that I want to add some automation that will pull current CI master results and compare fresh local results vs reference master, and, again, I feel bad about pulling it without reference to OS and arch.
Looks good. I don't think we used dashes
-
before, but I see the benefit of using a different separator between the$IMPLEMENTATION
/$OS
/$ARCH
parts because then you can keep the underscore_
reserved for when you want to separate multiple words within one part (which is kind of already being done forTARGET = cpp_stl
, but it's not hard to imagine e.g.construct/python3.11_cpython-linux-x86_64
, whereIMPLEMENTATION = python3.11_cpython
).
Good point. I also want to keep it consistent with overall Docker image naming standard as much as possible.
Thanks for the feedback!
If only this were true...
And this is not (in context of Kaitai Struct, where the difference only could be in a file reading code)?
I don't have anything immediate to point a finger to (and that's the reason why we normally just run most of these OS-independent languages on Linux Docker only and that's it). However:
@GreyCat:
2. Even relatively OS-independent languages have native-specific features in them that sometimes we might want to test. E.g. everything that maps structures directly in memory layout, or interfacing with native APIs.
Yes. In Kaitai Struct, we need to work with files and sometimes memory-mapped files. And files do have slight OS-specific differences in behavior. For example, file paths are case-sensitive on Unix and case-insensitive on Windows. On Unix, there isn't a difference between text and binary files (so even if you open a file in the default text mode, you can treat it as binary without problems), but if you open a file in text mode on Windows, some line ending conversions will happen (which is something you don't want for binary files), so you have to make sure to open files in binary mode if you intend to support Windows. On Windows, you cannot remove a file while it's open (on Linux you can, although AFAIR it'll be physically removed only after it's been closed). And in Python, pipes/sockets behave as non-seekable on Unix (as they should), but "seekable" on Windows (see https://github.com/python/cpython/issues/78961, https://github.com/python/cpython/issues/86768). And so on...
The point is that even these relatively OS-independent languages don't (and sometimes even can't) fully abstract you from the underlying differences of OS behavior.
3. Even relatively OS-independent languages, in the end of the day, will most likely call a libc/WinAPI function, which will implement things in a way unique to the platform. External stuff (like encodings, compression, encryption) will likely have some degree of difference on how this works on different platforms.
Exactly. On Windows it seems to be quite common to use the OS-specific means for character encoding conversions (i.e. with the infamous codepages) even in these relatively OS-independent languages, but this may mean a different set of encodings supported, slightly different behavior, etc.
When it comes to testing, I'm generally a fan of not making too many assumptions that something will work somewhere even if you don't test it (because this has been disproved many times) - I prefer to test as much as possible. Ideally, we should test each environment (i.e. language version, implementation, platform, architecture) we intend to support. It's not always feasible, but we can get close.
Of course, it's even more important to test languages that are specifically not OS-independent like C++ on different platforms (we have literally OS-specific code in our C++ runtime library, for example) than e.g. Java, but that doesn't mean I don't see any point in testing Java (and other languages) on different platforms too.
Went ahead and implemented the change:
With almost full transition from Travis to GHA, this actually covers 100% of our targets/subtargets, they should be all in the same format.
The only one left is mono, which I would just remove for now.
Soliciting opinions on revisiting branch naming convention when we store artifacts in ci_artifacts. Right now, it's fairly inconsistent:
Some common patterns are seen:
msvc141_windows_x64
but also havegcc4.8_linux
orcpp_stl_11/clang7.3_osx
andcsharp/mono5.18.0
vscsharp/netcore2.2.103_linux
.Proposal: let's standardize everything consistently using the approach similar to the Docker images:
If necessary, we can even keep track of generated artifacts history by tagging these branches prior to commit.
Cc @generalmimon?