Code organizational problems bad for readability and maintainability.

deeglaze commented 5 years ago

As a heavy user of the linux-sgx library, especially from a different build environment, the Asylo team has had to work around some organizational idiosyncrasies in linux-sgx that cause issues with our build system, Bazel. Bazel has some more strict requirements on code and library organization than is enforceable at the language level, so a Makefile+gcc environment will not make some things we've found all that apparent.

I'd like to work with y'all to make large scale changes to organization, since I don't think it merely benefits the Asylo team. Each point I list below I hope gives enough merit to undertaking a reorganization. The list is ordered from most problematic (for Asylo's maintenance) to least.

Number 1 is a problem for all users that seek to use and trust this library. Number 2 is the most pressing issue for us by far, since it causes us the most effort when updating releases.

Commits to github.com/intel/linux-sgx should be auditable. Unreadable changes make trust difficult. Each release is thousands of lines long with unnecessarily noisy copyright year updates to hundreds of files. Massive commits require massive code reviews to see if the library is still trustworthy. Instead, changes should come out as they're made. It's more courteous to your users (just like small code changes are appreciated by code reviewers), and it's more in the spirit of collaborative open source procedure. Google has helpfully made an open-source tool to help with outputting code on commit-granularity from internal code, copybara. Asylo uses copybara to export its commits and transform code to fit the open source environment (we build things a little differently internally due to our monorepo). We batch up our different commits and push them out weekly.
Code-layering in linux-sgx does not follow a strict domain separation. Headers meant for the trusted domain #include headers meant for the untrusted domain. Common headers #include headers meant only for the trusted domain (sgx_trts.h, sgx_trts_exception.h, and more see trusted_inc_internal in Asylo's linux_sgx patch), or meant only for the untrusted domain (sgx_urts.h and more see untrusted_inc_internal in Asylo's linux_sgx patch). trusted code depending on untrusted code, or vice-versa.
Code should include what it uses. There are a great many files that depend on transitive #includes to provide various type definitions and function declarations. It's better practice to #include everything the file directly uses. You may allow transitive #include only from your paired header file, i.e., if X.cpp has #include "X.h", and X.h has #include , then X.cpp does not need to #include . This is only for the same X across cpp and h though. Beyond that, refactoring can cause headaches.
Local #include lines should be relative to the repository root, and use quoted includes, not system includes. There are way too many required -I flags to build this library. By including the paths, you get a better idea of which subsystems your code is depending on. I wrote a script to do this back for version 2.1.3 but could be convinced to update it for 2.6 if y'all are willing to accept its outputs. I'd need some guidance for regression-testing linux-sgx itself since we don't use all of it, and we don't use its Makefiles.
There should be one library that "owns" any particular header, for any build configuration. The simulation/hardware split and inside/outside enclave split should just be different libraries with different files. We find this to be a problem most heavily with DCAP, which appears to have a very strange circular dependency with linux-sgx. The PSW depends on quote_wrapper and pce_wrapper, and DCAP depends on linux-sgx's common include headers. Or it doesn't and there's just a bunch of duplicated code? It's hard to say here.
Headers should be parsable on their own. There are some headers that depend on latent C-preprocessor definitions to make sense. One that I hit recently is one that causes a crash if a seemingly unrelated define (ElfParser) changes sys_word_t in thread_data.h, which throws off field offset assumptions in hand-written assembly! The RTS_SYSTEM_WORDSIZE define is oddly defined in elf*parser.h, which is part of the psw/urts subtree. Since thread_data.h is common/inc/internal, this is both a code layering violation and a header parsability problem.
Code formatting is inconsistent and many files include noisy trailing whitespace. The clang-format tool is extremely good at maintaining semantic equivalence and enforcing style guides. We've discussed formatting before, but I think you need to re-evaluate how well your guidelines are being followed. Tool support is a big help, and clang-format probably can automate your guidelines.

This is a big list, but I think it can be done. 1 and 7 are Intel workflow-related, so the most out-of-my-hands. I can help with 4, and offer some guidance for 2 and 5. The 3 and 6 are just good C++ practices of their own right that would be good to follow.

andyzyb commented 5 years ago

Thanks for the suggestions. It is a long list and we probably don't have bandwidth to work on it right now. It would be better to work on a specific issue and improve the code gradually.

henrywang8atfbdotcom commented 2 years ago

Totally agree. This repo is ... challenging... to understand, maintain and upgrade on our end as well.

intel / linux-sgx

Code organizational problems bad for readability and maintainability. #433