sec / dotnet-core-freebsd-source-build

Collection of scripts to build .NET Core under FreeBSD (with binary releases)
MIT License

Sluggish performance using native binaries (net7RC1) #15

Closed Thefrank closed 1 year ago

Thefrank commented 1 year ago

I would like to blame iXSystems, but I'm posting here anyway:

[root@azurecli ~]# freebsd-version;uname -a
13.1-RELEASE-p2
FreeBSD azurecli 13.1-RELEASE-p2 FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS amd64

I do nightly builds+tests of dotnet/runtime, and since RC1 I have noticed very slow performance on FreeBSD 13.1. The native binaries you provide are usually slightly faster than the crossbuilt ones, so I have been using them, but that no longer seems to be the case. An example when trying to build https://github.com/dotnet/runtime/commit/911cc41358aa3a99c24fa85dca426fafa3bef555:

Crossbuild: 51m 3s

Native: Killed by timeout at 240m 6s

The binlog files are too large to upload on their own, and GitHub won't allow that extension anyway: https://we.tl/t-MOtTNqhIwO

Nothing comes across as more "expensive" in terms of CPU usage between the two, but a quick look at $task csc $time in the binlog ($task $time alone takes a long time to parse) shows that some csc tasks are taking several minutes to nearly an hour.
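
For anyone digging into the binlogs, a minimal sketch of getting per-task timings without the structured log viewer is to replay the log through MSBuild with a performance summary (assuming the local MSBuild supports binlog replay; the log path below is illustrative):

# replay the binary log; -clp:PerformanceSummary prints per-target/per-task times at the end
dotnet msbuild artifacts/log/CombinedStep.binlog -clp:PerformanceSummary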

sec commented 1 year ago

Hi @Thefrank, I did a quick native build of runtime at the commit you provided, and:

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:31:34.08

Runtime built

./dotnet --info

Host:
  Version:      8.0.0-ci
  Architecture: x64
  Commit:       911cc41358

Command I used: ./build.sh --warnAsError false -ci -c Release /p:PublishReadyToRun=false

Patches used:

diff --git a/NuGet.config b/NuGet.config
index 33b098b1293..159864bce8b 100644
--- a/NuGet.config
+++ b/NuGet.config
@@ -23,6 +23,7 @@
     <add key="dotnet8-transport" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8-transport/nuget/v3/index.json" />
     <!-- Used for the Rich Navigation indexing task -->
     <add key="richnav" value="https://pkgs.dev.azure.com/azure-public/vside/_packaging/vs-buildservices/nuget/v3/index.json" />
+    <add key="local" value="../nuget" />
   </packageSources>
   <disabledPackageSources>
     <clear />
diff --git a/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Crossgen2.sfxproj b/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Crossgen2.sfxproj
index 2e0ef08ec34..4c843dd9084 100644
--- a/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Crossgen2.sfxproj
+++ b/src/installer/pkg/sfx/Microsoft.NETCore.App/Microsoft.NETCore.App.Crossgen2.sfxproj
@@ -4,7 +4,7 @@

   <PropertyGroup>
     <!-- Crossgen is not used for Mono, and does not currently create freebsd packages -->
-    <SkipBuild Condition="'$(RuntimeFlavor)' == 'Mono' or '$(RuntimeIdentifier)' == 'freebsd-x64'">true</SkipBuild>
+    <SkipBuild Condition="'$(RuntimeFlavor)' == 'Mono' or '$(RuntimeIdentifier)' == 'Xfreebsd-x64'">true</SkipBuild>
     <PlatformPackageType>ToolPack</PlatformPackageType>
     <SharedFrameworkName>$(SharedFrameworkName).Crossgen2</SharedFrameworkName>
     <PgoSuffix Condition="'$(PgoInstrument)' != ''">.PGO</PgoSuffix>
diff --git a/src/mono/cmake/configure.cmake b/src/mono/cmake/configure.cmake
index ae55fd112b3..90ac8a543e2 100644
--- a/src/mono/cmake/configure.cmake
+++ b/src/mono/cmake/configure.cmake
@@ -160,7 +160,7 @@ check_c_source_compiles(
   #include <sched.h>
   int main(void)
   {
-    CPU_COUNT((void *) 0);
+    CPU_COUNT((cpuset_t *) 0);
     return 0;
   }
   "

Host was an Azure VM (4 CPU, 16 GB RAM): 13.1-RELEASE-p2 / FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64

Am I missing something here?

Thefrank commented 1 year ago

Am I missing something here?

No. I did, though: the command used, hah!

just simple building: runtime/build.sh /p:OfficialBuildId=$(date +%Y%m%d)-99 -ci -c Release -subset Clr+Mono+Host+Libs+Packs -bl:$(Build.SourcesDirectory)/runtime/artifacts/log/BuildStep.binlog

build+test (aka my nightly): runtime/build.sh /p:OfficialBuildId=$(date +%Y%m%d)-99 -ci -c Release -subset Clr+Mono+Host+Libs+Libs.Tests+Packs --test -bl:$(Build.SourcesDirectory)/runtime/artifacts/log/CombinedStep.binlog

So it's either (a) a slight kernel difference between TrueNAS and stock FreeBSD, or (b) something in ReadyToRun.

sec commented 1 year ago

Is there anything more I could help with or check here? ReadyToRun failed with FreeBSD being an unsupported target, so I assume I would need to apply your patches (those from https://github.com/Thefrank/runtime/tree/fbsdaot) to make the build go (I can't remember whether objwriter from LLVM was also needed; I built this some time ago and I think I removed all the directories with binaries).

Thefrank commented 1 year ago

Crossgen2SupportedRids needs to include freebsd-x64; see https://github.com/dotnet/installer/blob/f70747656494b3f4915e0a52dfa2b025a46136ec/src/redist/targets/GenerateBundledVersions.targets#L260. That allows it to run, but it won't try NativeAOT without further patching (NativeAOT handles its own supported RIDs, and so does ILCompiler).

If it still complains, you can try https://github.com/Thefrank/dotnet-freebsd-crossbuild/blob/main/patches/patch_runtimetests.patch, which adds explicit FreeBSD OS support where it was missing. This should only affect tests, but if it hits more things (like R2R, NativeAOT, ILC) then it needs to be upstreamed.
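
For reference, applying that patch on top of a fresh clone could look like this (a sketch; it assumes the patch file has already been downloaded locally next to the runtime checkout):

# dry run first, then apply for real
git -C runtime apply --check patch_runtimetests.patch
git -C runtime apply patch_runtimetests.patch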

sec commented 1 year ago

One small change to src/coreclr/tools/Common/CommandLineHelpers.cs was enough to make the build go with ReadyToRun; it took an extra 8 minutes on an already-built runtime to pass without errors. A clean build is running, but I assume it will take a few minutes more than the initial one.

I took a quick look at the TrueNAS kernel vs GENERIC from FreeBSD: they mainly removed some drivers and added some extra modules. If you ask me, those changes should not affect run times.

I think I can get TrueNAS CORE 13.0-U3.1 to install on a VM and try the build in there. Or am I not checking what I should?

edit: clean build

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:36:06.25

Thefrank commented 1 year ago

Gave it another shot. Still taking forever with crossgen2 tasks.

More details: yes, the system is ancient, and yes, TrueNAS is running on bare metal instead of under a hypervisor.

MB: Dell r720xd
CPU: 2x E5-2660 Rev0 (2 CPUs, 8 cores/16 threads each, 32 threads total)
RAM: 384GB ECC
HDD: Storage is SATA, Builds are SAS. All mechanical. 

Services

[root@azurecli ~]# service -ev
/etc/rc.d/cleanvar
/etc/rc.d/ip6addrctl
/etc/rc.d/netif
/etc/rc.d/vsts_agent_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   <-azure pipelines "agent"
/etc/rc.d/virecover
/etc/rc.d/motd
/etc/rc.d/newsyslog
/etc/rc.d/os-release
/etc/rc.d/syslogd
/etc/rc.d/cron

PKG

[root@azurecli ~]# pkg info | awk '{print $1}'
alsa-lib-1.2.2_1
autoconf-2.71
autoconf-switch-20220527
automake-1.16.5
bash-5.1.16
binutils-2.37_2,1
brotli-1.0.9,1
c-ares-1.18.1
ca_root_nss-3.78
cmake-3.23.2
curl-7.84.0
cvsps-2.1_2
dejavu-2.37_1
encodings-1.0.5,1
expat-2.4.8
font-bh-ttf-1.0.3_4
font-misc-ethiopic-1.0.4
font-misc-meltho-1.0.3_4
fontconfig-2.13.94_2,1
freetype2-2.12.1
gdbm-1.23
gettext-runtime-0.21
giflib-5.2.1
git-2.37.1
gmake-4.3_2
icu-71.1,1
indexinfo-0.3.1
javavmwrapper-2.7.9
jbigkit-2.1_1
jpeg-turbo-2.1.3
jsoncpp-1.9.5
krb5-1.20
lcms2-2.13.1
libICE-1.0.10,1
libSM-1.2.3,1
libX11-1.7.2,1
libXScrnSaver-1.2.3_2
libXau-1.0.9
libXdmcp-1.1.3
libXext-1.3.4,1
libXfixes-6.0.0
libXft-2.3.4
libXi-1.8,1
libXrandr-1.5.2
libXrender-0.9.10_2
libXt-1.2.1,1
libXtst-1.2.3_2
libarchive-3.6.1,1
libedit-3.1.20210910,1
libffi-3.4.2
libfontenc-1.1.4
libidn2-2.3.2
libinotify-20211018
liblz4-1.9.3,1
libnghttp2-1.48.0
libpsl-0.21.1_4
libpthread-stubs-0.4
libssh2-1.10.0,3
libtool-2.4.7
libunistring-1.0
libunwind-20211201_1
liburcu-0.13.0
libuv-1.42.0
libxcb-1.15
libxml2-2.9.13_2
llvm-9,1
llvm10-10.0.1_10
llvm90-9.0.1_6
lua52-5.2.4
m4-1.4.19,1
meson-0.62.2
mkfontscale-1.2.1
mpdecimal-2.5.1
ninja-1.10.2,2
node16-16.16.0
npm-8.13.0
openjdk11-11.0.15+10.1
openssl-1.1.1q,1
p5-Authen-SASL-2.16_1
p5-CGI-4.54
p5-Clone-0.45
p5-Digest-HMAC-1.04
p5-Encode-Locale-1.05
p5-Error-0.17029
p5-GSSAPI-0.28_2
p5-HTML-Parser-3.78
p5-HTML-Tagset-3.20_1
p5-HTTP-Date-6.05
p5-HTTP-Message-6.37
p5-IO-HTML-1.004
p5-IO-Socket-INET6-2.72_1
p5-IO-Socket-SSL-2.074
p5-LWP-MediaTypes-6.04
p5-Mozilla-CA-20211001
p5-Net-SSLeay-1.92
p5-Socket6-0.29
p5-TimeDate-2.33,1
p5-URI-5.10
pcre2-10.40
perl5-5.32.1_1
pkg-1.18.3
pkgconf-1.8.0,1
png-1.6.37_1
py37-gdbm-3.7.13_6
py37-setuptools-62.1.0_1
py37-sqlite3-3.7.13_7
py37-tkinter-3.7.13_6
py39-setuptools-62.1.0_1
python-3.9_3,2
python3-3_3
python37-3.7.13_2
python38-3.8.13_2
python39-3.9.13
readline-8.1.2
rhash-1.4.3
rust-1.61.0
sqlite3-3.38.5,1
sudo-1.9.11p3
tcl86-8.6.12
terminfo-db-20210816
tiff-4.3.0
tk86-8.6.12
xorg-fonts-truetype-7.7_1
xorgproto-2022.1
yarn-1.22.18
zstd-1.5.2

Thefrank commented 1 year ago

I just noticed that you use the linuxulator (linux service) and the linux_base-c7 package. I never used those as part of my toolchain, but I am going to give them a try and see if it makes a difference.

sec commented 1 year ago

That was used before to make the gRPC binaries work: there are no FreeBSD binaries inside the NuGet package, so it was trying to run the Linux ones. Later I found out that you can override those with environment variables and use local ones (sometimes I just replaced the Linux binaries with FreeBSD ones inside the NuGet directories), e.g.:

setenv PROTOBUF_PROTOC /usr/local/bin/protoc
setenv PROTOBUF_TOOLS_CPU x64
setenv PROTOBUF_TOOLS_OS linux
setenv GRPC_PROTOC_PLUGIN /usr/local/bin

That's only used for the aspnetcore build.

Also, a question regarding your build environment: as you can't use pkg under the "root" TrueNAS system, I assume you're building inside a jail?

Thefrank commented 1 year ago

Yes, it's built inside a jail. Previous native builds were done under an earlier iteration (FreeBSD 12.1/12.2) of the same jail. Settings are from iocage get all JAIL:

CONFIG_VERSION:28
allow_chflags:0
allow_mlock:1
allow_mount:0
allow_mount_devfs:0
allow_mount_fusefs:0
allow_mount_nullfs:0
allow_mount_procfs:0
allow_mount_tmpfs:0
allow_mount_zfs:0
allow_quotas:0
allow_raw_sockets:1
allow_set_hostname:1
allow_socket_af:0
allow_sysvipc:0
allow_tun:0
allow_vmm:0
assign_localhost:0
available:readonly
basejail:0
boot:1
bpf:1
children_max:0
cloned_release:12.2-RELEASE
comment:none
compression:lz4
compressratio:readonly
coredumpsize:off
count:1
cpuset:off
cputime:off
datasize:off
dedup:off
defaultrouter:xxxxxxxxx
defaultrouter6:none
depends:none
devfs_ruleset:4
dhcp:1
enforce_statfs:2
exec_clean:1
exec_created:/usr/bin/true
exec_fib:0
exec_jail_user:root
exec_poststart:/usr/bin/true
exec_poststop:/usr/bin/true
exec_prestart:/usr/bin/true
exec_prestop:/usr/bin/true
exec_start:/bin/sh /etc/rc
exec_stop:/bin/sh /etc/rc.shutdown
exec_system_jail_user:0
exec_system_user:root
exec_timeout:60
host_domainname:none
host_hostname:xxxxxx
host_hostuuid:xxxxxxx
host_time:1
hostid_strict_check:0
interfaces:vnet0:bridge0
ip4:inherit
ip4_addr:none
ip4_saddrsel:1
ip6:disable
ip6_addr:none
ip6_saddrsel:1
ip_hostname:0
jail_zfs:0
jail_zfs_dataset:iocage/jails/xxxxxxxxxxx
jail_zfs_mountpoint:none
last_started:2022-11-23 03:43:50
localhost_ip:none
login_flags:-f root
mac_prefix:02ff60
maxproc:off
memorylocked:off
memoryuse:off
min_dyn_devfs_ruleset:1000
mount_devfs:1
mount_fdescfs:1
mount_linprocfs:0
mount_procfs:0
mountpoint:readonly
msgqqueued:off
msgqsize:off
nat:0
nat_backend:ipfw
nat_forwards:none
nat_interface:none
nat_prefix:172.16
nmsgq:off
notes:none
nsem:off
nsemop:off
nshm:off
nthr:off
openfiles:off
origin:readonly
owner:root
pcpu:off
plugin_name:none
plugin_repository:none
priority:99
pseudoterminals:off
quota:none
readbps:off
readiops:off
release:13.1-RELEASE-p2
reservation:none
resolver:/etc/resolv.conf
rlimits:off
rtsold:0
securelevel:2
shmsize:off
stacksize:off
state:up
stop_timeout:30
swapuse:off
sync_state:none
sync_target:none
sync_tgt_zpool:none
sysvmsg:new
sysvsem:new
sysvshm:new
template:0
type:jail
used:readonly
vmemoryuse:off
vnet:1
vnet0_mac:xxxxxxxxxxxxxxxx
vnet0_mtu:auto
vnet1_mac:none
vnet1_mtu:auto
vnet2_mac:none
vnet2_mtu:auto
vnet3_mac:none
vnet3_mtu:auto
vnet_default_interface:auto
vnet_default_mtu:1500
vnet_interfaces:none
wallclock:off
writebps:off
writeiops:off

Thefrank commented 1 year ago

Went unicorn hunting, thinking the builds might not be as portable as first thought, so I built RC1 natively under my jail system. I made sure to follow your methods and patches to try to reduce unknowns.

Result: Same as your native build; sluggish.

Back to hunting I suppose

EDIT:

sec commented 1 year ago

Hey, sorry for the lack of response; work/home got in the way. Anyway, I tried to set up TrueNAS under Hyper-V and it's having some issues with network speed (I think it's more related to Hyper-V and/or my local PC/network card than to TrueNAS itself; I wasn't able to set up the jail as the network speed inside was bytes per second). I will try to deploy TrueNAS under some cloud provider, set up a jail inside, and try builds there to check the speeds. Will update when "done" :)

LTRData commented 1 year ago

Could it be the usual thing with accidentally using "legacy network adapter" for the Hyper-V VM? It gives at most 10 Mbit/s. If you switch to a native network adapter it should work just fine, in my experience.

sec commented 1 year ago

The problem is that directly under TrueNAS it's fine, but the jail inside has those kinds of problems.

edit: and it's not even 10 Mbps but something like 100 bps :>

LTRData commented 1 year ago

Oh, that sounds strange. I have used FreeBSD and FreeBSD-based things like pfSense and TrueNAS a lot in Hyper-V, but I have never seen anything like that, except with legacy network adapters. Those get all sorts of weird effects, with throughput usually at 10 Mbit/s but suddenly dropping to almost nothing, sometimes with huge CPU usage as well. I have never seen that when configuring the VM with a native network adapter instead.

sec commented 1 year ago

@Thefrank Managed to get TrueNAS running in the cloud and tried the build in a jail. Looks like you're right, the build is slow. One thing I've noticed is that during the dotnet part only one CPU core is used, while during the LLVM part all 20 were used. This is strange, as the same build/procedure on a "clean" FreeBSD 13.1 is fast.

Thefrank commented 1 year ago

@sec I observed basically the same thing: clang, meson, cargo, gcc, etc. will use all available cores if instructed to do so, even in a jail; dotnet's managed parts (e.g., dotnet exec or msbuild) will not with a native build, but will with a crossbuild.

There are still some variables:

Thefrank commented 1 year ago

So... using dotnet/arcade's build-rootfs.sh to make a FreeBSD 13.1 amd64 rootfs, building net7 GA (v7.0.100 SDK), and then using THAT to build dotnet/runtime's head, I get mixed results. Command: build.sh /p:OfficialBuildId=$(date +%Y%m%d)-99 -ci -c Release -subset Clr+Mono+Host+Libs+Libs.Tests+Packs --test

    0 Warning(s)
    10 Error(s)

Time Elapsed 01:39:29.20

(Errors are from test failures)

Compared to last night's build using a FreeBSD 12.3 amd64 crossbuild under the same jail:

    0 Warning(s)
    9 Error(s)

Time Elapsed 00:56:25.80

(Errors are from test failures)

The crossbuild for 13.1 is faster than the native 13.1 but still markedly slower than the crossbuild for 12.3. The 13.1 crossbuild exhibits the same single-thread issues that the native one does.

sec commented 1 year ago

@Thefrank I did more builds on stock 13.1, without a jail. I saw that the runtime build uses 1 core during the .NET phase, which is weird, but the build takes about ~30 min to complete fully (without tests). BUT the aspnetcore build, using the same base SDK, uses all CPU power (I saw 8-10 dotnet processes spawned during the build). Strange... I tried passing maxcpucount with some magic numbers for the runtime build, but the effect is still the same. I wouldn't be surprised if this has been the issue for a long time and we just didn't notice it :D

Thefrank commented 1 year ago

@sec I have seen the same thing you witnessed: only a single process, with the natively built one taking very long (and using at most 100% WCPU as seen from top on the host) vs the crossbuild being very short (and using up to 1800% WCPU as seen from top on the host). (Screenshot: 2022-12-13 171310)

Also: have you been able to test inside a jail yet? If not, I should have some time this weekend to get a FreeBSD 13.1 VM set up locally under Hyper-V to do some testing.


3 days later edit:

Hyper-V'd FreeBSD 13.1-p3 on my Windows system (assigned: 4 vCPU, 6 GB of RAM). The jail was a Bastille thin jail running FreeBSD 13.1 because... I didn't set up the VM for ZFS, and CBSD looked like it would take more time to set up.

git clone --depth 1 https://github.com/dotnet/runtime
add patches for CPU_SET + crossgen output
unpack native SDK
add nuget path for native built items
runtime/build.sh /p:OfficialBuildId=20221216.99 -ci -c Release -subset Clr+Mono+Host+Libs+Packs

native: Time Elapsed 00:55:59.19

git -C runtime clean -ffdx
unpack crossbuilt SDK
change path for crossbuilt outputs
runtime/build.sh /p:OfficialBuildId=20221216.99 -ci -c Release -subset Clr+Mono+Host+Libs+Packs

cross: Time Elapsed 00:48:22.43

8 min of variance is still a bit high, so I will try native vs cross a few more times to see if anything comes out. If the times are more regularly closer (5 min or less between the two), I am willing to settle for "close ticket, TrueNAS and/or iocage issue".


Overnight builds: both likely faster because nothing else was running on my desktop system. Still a wide difference between native and crossbuilt. This might be a jail issue rather than a specific platform issue.

native: Time Elapsed 00:52:43.40

cross: Time Elapsed 00:43:59.07


From host:

native: Time Elapsed 00:52:52.51

cross: Time Elapsed 00:45:40.57

So Bastille jails do not exhibit the same problem I was seeing. Native is slower than the crossbuild by the same amount regardless of whether it is jailed.

sec commented 1 year ago

Hm. iocage or Bastille, those are only jail managers, so they shouldn't limit/affect anything. If I find some time, I will catch up with the newest builds and check times, but even then I don't think I have the power to fix the issue (not to mention find it first) :) If the issue is only under a jail, then something must have changed between the 12 and 13 releases that affected it.

nkosi23 commented 1 year ago

Maybe iocage and Bastille create jails with different default parameters. Also, if I remember correctly, some folks on the FreeBSD Forums mentioned that iocage was optimized for ZFS; I don't remember the context, but I got the sense that iocage works best with ZFS.

Maybe the author relies on special ZFS features internally to ease jail management and didn't optimize scenarios in which ZFS is not available beyond providing basic support. This may be as basic as missing nullfs flags.

Actually, it seems it's not even officially supported:

Will iocage work on a generic system with no ZFS pools? No. ZFS is a must. If you run a FreeBSD server, you should be using ZFS!

https://iocage.readthedocs.io/en/latest/faq.html

nkosi23 commented 1 year ago

Okay, never mind, I just got what you meant when you said "I didn't set up the VM for ZFS" 😄 That was for the specific test you did under Bastille.

sec commented 1 year ago

Fresh 13.1-RELEASE installed on UFS under Hyper-V, 8 cores/8 GB; set up a jail using ezjail (all defaults, plus allow.mlock) by the handbook; cloned my repo, checked out v7, installed tools, init, built runtime. First run:

native sdk - host - Time Elapsed 00:35:47.18
native sdk - jail - Time Elapsed 00:33:31.93

Second try, I just did rm -rf artifacts (I know the NuGet packages are already downloaded, so times are lower):

native sdk - host - Time Elapsed 00:30:14.55
native sdk - jail - Time Elapsed 00:31:39.05

dotnet was mostly using one core at 100%; sometimes there were 4-6 more cores used by a margin of a few percent. Will try with the crossbuild SDK tomorrow, to see if the times change.

Did a quick run with the crossbuild SDK on the host, and from the start it was using 8 cores at 100%.

Time Elapsed 00:19:29.36

edit: same result under a jail, full CPU usage and the same drop in time:

Time Elapsed 00:19:20.22

arrowd commented 1 year ago

Is it possible to factor out a command line that reproduces the issue? If yes, try pressing Ctrl+T to see what it is doing. It is also possible to use truss and ktrace to figure out what takes so much time.
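
For reference, a minimal sketch of what that could look like (the PID, paths, and build arguments below are illustrative, not taken from this thread):

# Ctrl+T sends SIGINFO to the foreground process and prints what it is currently doing.
# Attach truss to an already-running dotnet/msbuild process, following children:
truss -f -o /tmp/dotnet.truss -p 12345
# Or trace a whole command from the start with ktrace (-i also traces children),
# then decode the dump with kdump:
ktrace -i -f /tmp/build.ktrace ./build.sh -ci -c Release
kdump -f /tmp/build.ktrace | less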

sec commented 1 year ago

The issue is that, during the restore phase, only 1 core is used.

arrowd commented 1 year ago

Is it a build-system program? That is, does it normally spawn N processes and only 1 in this case? Or does the program itself decide how many threads to spawn and get it wrong on FreeBSD 13?

sec commented 1 year ago

That's up to msbuild, I think. I will try to get some time later and go through the verbose logs to check, and/or play with a manual maxCpuCount setting to check the load.

edit: the number of processes is the same. The point is, the cross-compiled SDK (for 12) makes use of all cores, while the native-compiled SDK (under 13) makes use of 1. How can we debug this further, any ideas?
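
For anyone trying the same experiment, the kind of run being described might look like this (a sketch; build.sh forwards unrecognized switches to MSBuild, as the /p: options above suggest, and the subset and node count are just illustrative values):

# force a specific number of MSBuild worker nodes and watch core usage in top(1)
./build.sh -ci -c Release -subset Clr+Libs /maxcpucount:16 -bl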

janvorli commented 1 year ago

the cross-compiled SDK (for 12) makes use of all cores, while the native-compiled SDK (under 13) makes use of 1

I wonder whether the HAVE_xxx values detected during the build match between the cross and native builds. The cross build uses the ones in eng/tryrun.cmake, so maybe the native build's detection doesn't detect something correctly.
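
A rough way to compare the two sets of results is to grep the generated CMake caches in the native build tree against the values the cross build is told to assume (a sketch; the artifact paths depend on the build layout and are illustrative):

# feature checks as detected by a native build
find artifacts/obj -name CMakeCache.txt -exec grep -HE 'HAVE_(SCHED_GETAFFINITY|GNU_CPU_COUNT|CLOCK_MONOTONIC_COARSE)' {} +
# values the cross build assumes (path as mentioned above; adjust to the actual file in the tree)
grep -E 'HAVE_' eng/tryrun.cmake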

Thefrank commented 1 year ago

I dug up some older build logs to see if I could find any differences. All of these are missing/not found/failed on FreeBSD 12.3 (the image used for crossbuilds and the CI system) but are present on 13.1 (what most use for native builds; there are no x64 images for this yet):

sched_getaffinity, sched_setaffinity, sched_getcpu, HAVE_STAT_NSEC, ucol_clone, HAVE_CLOCK_MONOTONIC_COARSE

The first two were causing CPU_COUNT build errors, as HAVE_GNU_CPU_COUNT would fail. This was resolved by https://github.com/dotnet/runtime/pull/77867 (more details there, including how both the Linux kernel and FreeBSD define CPU_COUNT).
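
For context, the CPU_COUNT configure probe is essentially the one patched earlier in this thread; it can be reproduced by hand as a standalone test (a sketch; a compile failure here simply means the probe reports the feature as unavailable):

# the patched check from src/mono/cmake/configure.cmake, as a standalone compile test
cat > /tmp/cpu_count.c <<'EOF'
#include <sched.h>
int main(void)
{
    CPU_COUNT((cpuset_t *) 0);
    return 0;
}
EOF
cc /tmp/cpu_count.c -o /tmp/cpu_count && echo "CPU_COUNT with cpuset_t compiles"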

Crossbuilds (using dotnet/arcade's build-rootfs.sh to generate the rootfs) using FreeBSD-x64 13.1 have the same issue as native FreeBSD 13.1 builds.

sec commented 1 year ago

@Thefrank during the latest build of preview3 I've noticed that during the restore phase (and also at later stages), all CPU cores are now used. I'm using my native 8p1 build to build 8p3. Have you seen any increase/decrease in build times on your side?

Thefrank commented 1 year ago

SDKs are still crossbuilt using Docker containers for the crossrootfs... and those are for 12.3/13.0. With the SDK bump here https://github.com/Thefrank/freebsd-dotnet-runtime-nightly/commit/01c75e499949c262c82535a9e4e2c368f2ff6080, run+test is still around 50 min.

With 13.2 now out, and if the libunwind issues are resolved, I will go ahead and bump the versions that arcade uses for generating images and crossrootfs.

sec commented 1 year ago

I believe the issue has been fixed, as I have seen improved build speeds for quite a while now. Closing.