Open 0ctopus13prime opened 2 weeks ago
I want to put it on a vote:
I typically refer to google's style guide. But I dont have a strong preference. Do you have one you like in particularly?
Since it requires clang-format, we need to overhaul all the pipeline, CI infra had it already to avoid program not found exception during building.
It seems spotless supports it: https://github.com/diffplug/spotless. Would we be able to leverage gradle to do this so we dont have to worry about dependencies?
Since it is expected to bring changes most files, all changes before formatting in dev branch will likely hit merge conflicts.
This is unfortunate but a one time cost. I think its okay.
I typically refer to google's style guide. But I dont have a strong preference. Do you have one you like in particularly?
I really don't care haha. But, the google style is the most comfortable for me.
It seems spotless supports it: https://github.com/diffplug/spotless. Would we be able to leverage gradle to do this so we dont have to worry about dependencies?
Nice!! then we can decouple formatting from CMake entirely ONLY IF it can format as much as clang-format. I think it would be easier to modify gradle file than doing it in CMake. Hoping that spotless can support as much as clang-format does. I will attach the comparisons between two.
Looks like spotless
internally calls clang-format
? - https://github.com/diffplug/spotless/blob/main/plugin-gradle/README.md#cc
spotless {
cpp {
target 'src/native/**' // you have to set the target manually
clangFormat() // has its own section below
eclipseCdt() // has its own section below
licenseHeader '/* (C) $YEAR */' // or licenseHeaderFile
}
}
I think we can also consider eclipse CDT formatting with Google code style : https://github.com/google/styleguide/blob/gh-pages/eclipse-cpp-google-style.xml We can import the above XML into IDE (Clion or VSCode), and the good thing is that the formatter downloading will be taken care of by gradle, we don't actually need to care much about the library dependency. Also we don't need to modify CMake which is bit trickier to edit compared to gradle.
In build.gradle file
spotless {
cpp {
target('jni/include/**', 'jni/src/**')
eclipseCdt('11.6').configFile('/tmp/eclipse-cpp-google-style.xml')
}
}
Result:
-jlong knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong memoryAddressJ,
- jobjectArray dataJ, jlong initialCapacityJ, jboolean appendJ) {
- std::vector<float> *vect;
- if ((long) memoryAddressJ == 0) {
+jlong knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface *jniUtil,
+ JNIEnv *env, jlong memoryAddressJ,
+ jobjectArray dataJ,
+ jlong initialCapacityJ,
+ jboolean appendJ) {
+ std::vector<float> *vect;
+ if ((long) memoryAddressJ == 0) {
@0ctopus13prime did you get a poc working? If so Im good with google guide and eclipse
Sure, let me raise a PR shortly. I'm planning to start working on this after bumping up JNI C++ version to 17.
@jmazanec15 Hey Jack, everything seems to be working fine, just doing a last check.
FAISS uses snake_case for method names (NMSLIB uses camelCase, but that's going to be deprecated in the future anyway). The Google C++ style guide uses PascalCase for method names (e.g., Analyze(int foo)
), but do you think we should align with FAISS and switch to snake_case (e.g., range_search(idx_t n, ...)
)? We could still follow other parts of the Google C++ style, just changing the method name style to snake_case.
// Google C++ code style
class MyClass {
public:
void Analyze(const std::string &text);
void Analyze(const char *text, size_t textlen);
};
// Faiss
struct IndexHNSW : Index {
typedef HNSW::storage_idx_t storage_idx_t;
void range_search(
idx_t n,
const float* x,
float radius,
RangeSearchResult* result,
const SearchParameters* params = nullptr) const override;
In general, I think PascalCase is fine. I dont think we need to align with any libraries. We just need to make sure our conventions are consistent.
@jmazanec15 Sounds good, once C++ 17 update PR is merged will raise a new PR for this shortly
Enabled C++17 in JNI, will continue working on this.
Eclipse CDT mostly works but it's not perfect. Seems struggling when parsing template <
. It keeps recognizing it as a comparison sign.
std::unique < XXX
> xxx = ...
Expected
std::unique<XXX> xxx = ...
I'm inclining toward clang-format + clang-tidy options
which rely on clang lexer, providing more accurate linting and static analysis.
Will evaluate how easy to automate installing them in either build.gradle or CMAKE. They are very standard in C++ world, it should be straightforward to install them in most popular platforms - Mac, Linux and Windows
After running clang-tidy with Google code style on our JNI codes, I got around 330 warnings (excluding faiss and nmslib). I’ll clean these up in a PR soon. Also, it looks pretty manageable to make a cross-platform install script. I’ll test it on the major platforms first then include it in the PR once it’s ready.
sounds good! So, would gradle not auto-configure dependency?
@jmazanec15
Not for clang-format
. It just calls it to format the code, but I will not download it unfortunately. :stuck_out_tongue:
But it does for eclipse CDT. Another bad luck is that it can't really do the formatting the template code...
Will share a comprehensive PR having an install script for clang-format + tidy as well as the result code after applied formatting shortly!!
Description
Currently, we don’t have a formatting tool enabled for C++ code, and as a result, the codebase has a mix of different coding styles. In contrast,
spotless
is already enabled for Java, preventing builds if code does not follow the established format. I propose adopting a similar approach for our C++ (JNI) code.Establishing a consistent code style for C++ in our repository would enhance readability and facilitate future collaboration. Allowing multiple styles to coexist can lead to the "Broken Windows Effect," where visible inconsistencies invite further deviations, ultimately contributing to a less organized and harder-to-maintain codebase.
Tools
After researching several options, I propose setting up
clang-format
, which is compatible across major platforms—Linux, macOS, and Windows. The tool is also easy to integrate with popular IDEs like CLion, VSCode, and others, makingclang-format
a straightforward choice for our needs.Note : We can also consider to use spotless - https://github.com/diffplug/spotless
Code Style
The key principle is to adopt a unified and widely accepted code style that maintainers prefer. Naturally, auto-generated JNI method signatures will remain unchanged, even after introducing the formatter, as the method naming pattern is a contract that the JVM relies on to locate the methods it calls.
I want to put it on a vote:
Required Changes
General idea is to make Makefile generated by CMake to call
clang-format
to format C++ codes. I prefer this approach since I like a system to take care of choirs by itself, but open to other approaches.Or, alternatively we can make it reject to build if it found there are misformatted files similar to what we are doing in Java. Once it's rejected, user must run formatting manually before building.
Expected Hassle
Since it requires
clang-format
, we need to overhaul all the pipeline, CI infra had it already to avoid program not found exception during building.So I'm proposing 2 stages of introduction:
1st phase : Introduce formatting, but run it only if the program exists.
2nd phase : After make sure that pipelines, CI and other major third party won't have issue with formatting, we can enforce the formatting.
Merge conflicts
Since it is expected to bring changes most files, all changes before formatting in dev branch will likely hit merge conflicts.