Open lschneiderbauer opened 1 year ago
@lschneiderbauer : Any difference with cabal-install 3.8.1.0 or 3.10.1.0?
I don't think there should be. But I also don't know if this is something we can or should fix. As the linked tickets describe "We also want to be able to use the pkgdb purely, which means we can't push checking for versions to the call-site where we know which packages are required. Further we don't know which packages we need to query until midway in the solver (since that depends on which deps we pick), so we can't first winnow down the list."
The issue amounts to a problem with the configuration of the user's system, which in turn affects cabal. It would be nice if cabal wasn't affected by misconfigured systems, but architecturally the only way to do so seems to be to turn the entire currently pure solver into an impure one -- which would be a pretty adverse result.
@gbaz wrote:
architecturally the only way to do so seems to be to turn the entire currently pure solver into an impure one
I would think we could isolate queries to pkg-config in an API which caches answers and pretends to be pure. Has this been tried yet?
@lschneiderbauer : To make progress on this, we need a reproduction of this behavior.
Could you make a GitHub Actions workflow that exhibits this behavior? This would amount to creating the problematic package configuration on a GHA virtual environment (I guess ubuntu-22.04) so that a cabal init & cabal run takes a long time (as you describe). I think it is fine to use the preinstalled cabal, which should exhibit the problem.
Once we have such a workflow we can test proposed fixes with it, and we can port this workflow to our CI to prevent regressions.
@lschneiderbauer : Do you think you can provide us with such a reproducer?
Any difference with cabal-install 3.8.1.0 or 3.10.1.0?
I can test as soon as Gentoo offers those packages (afaik they are working on it).
Do you think you can provide us with such a reproducer?
@andreasabel It probably will take me some time, due to lack of experience and other distractions, but I will try.
The issue amounts to a problem with the configuration of the user's system, which in turn affects cabal. It would be nice if cabal wasn't affected by misconfigured systems, but architecturally the only way to do so seems to be to turn the entire currently pure solver into an impure one -- which would be a pretty adverse result.
It's understandable that you don't want to give that up. How are you guys dealing with packages that don't support pkg-config? Afaik it's not really a required standard for e.g. Linux (or is it?) and almost every C build script I know has fallback mechanisms in place in case pkg-config is not available or the required library does not provide .pc files. Are you simply pretending those libraries and systems don't exist?
From that point of view, would it make sense if the user could influence the choice of tool to get the required information? Let's say pkg-config is the default, but I could say something along the lines of cabal run --pkg-info = my_tool_of_choice?
Cabal historically just left it to users to configure their systems so that desired libs were available in the libdirs, and allowed cabal files to set extra-libs and users to set --extra-lib-dirs. It later introduced pkgconfig-depends as a mechanism to provide auto-discovery of packages in systems supporting it. We recently fixed the way pkgconfig-depends interacts with the solver so that an adroit library author can first set the pkgconfig-depends field, and if that fails, it flips an auto-flag and instead tries to build using extra-libs.
I really want to motivate not attempting to address this directly -- having an "ostensibly pure" interface wrapping unsafePerformIO calls to shelling out to an external program is a pretty fragile and ugly approach. And again, what we have should work correctly, as long as a general system is not in some sense overall misconfigured. I think cabal should try to work around common system misconfigurations where possible (this is for example how we fall back to package-by-package queries in the case of a completely broken package db), but where it's overly complicated I think we can just leave things be.
Hadn't we had several reports in this direction, I would agree.
Evidence suggests, however, that cabal is too brittle here by insisting it wants the data for all packages before even doing something. One broken package in a remote corner of the world (=system) can throw cabal off, even if it was completely detached from the constraint solving problem at hand.
The question that should be answered here is how easy it is to get from a very unspecific symptom (taking long) to the specific cause (some misconfigured package somewhere) that is not really in scope because it is detached from the task that the user wanted cabal to perform.
Hadn't we had several reports in this direction, I would agree.
The issue you linked to was fixed.
Hello, I have a similar problem. pkg-config --modversion for all packages takes a long time and fails. Cabal then runs pkg-config --modversion for every package separately, which takes even longer.
I am on a vanilla Arch system. I have tried to set pkg-config-location to /bin/false in ~/.cabal/config, but Cabal does not respect this setting. My current workaround is to rename /usr/bin/pkg-config while I'm doing Haskell development.
I don't think it is a viable assumption that the pkg-config database is always in good shape, and pkg-config itself works fine in those situations. I believe that it is a mistake to query for packages that Cabal does not absolutely need. Even if the database is intact, this takes a very long time on systems which have a lot of .pc files.
This bug is hitting hard on those of us who use a distribution which always installs development files, e.g. Arch or any source-based distribution. On those systems querying for all packages is going to take a long time regardless of any issues in the database. Also, many installed packages often have uninstalled optional dependencies, and .pc files for those packages will not be present on the system, and it is an expected situation.
Here is an extract from my cabal build log with added comments:
Searching for curl in path.
Found curl at /usr/bin/curl
Searching for powershell in path.
Cannot find powershell on the path
Searching for wget in path.
Found wget at /usr/bin/wget
Selected http transport implementation: curl
Why does Cabal continue looking for a downloader even when it found its preferred one?
Searching for pkg-config in path.
Found pkg-config at /usr/bin/pkg-config
Running: /usr/bin/pkg-config --version
/usr/bin/pkg-config is version 1.8.1
Running: /usr/bin/pkg-config --variable pc_path pkg-config
Searching for pkg-config in path.
Found pkg-config at /usr/bin/pkg-config
Running: /usr/bin/pkg-config --version
/usr/bin/pkg-config is version 1.8.1
Why does Cabal run pkg-config --version twice for the same binary?
Running: /usr/bin/pkg-config --list-all
Running: /usr/bin/pkg-config --modversion libdaemon libraw1394 tre xcb-xrm lqr-1 redland jbig2dec gumbo lrdf microdns libsysfs fuse libmng gts libxdg-basedir libmypaint-gegl libmypaint fdk-aac ogg ldacBT-abr ldacBT-enc libfreeaptx libaria2 libexttextcat tidy rest-0.7 rest-extras-0.7 colord colorhug vorbis vorbisenc vorbisfile x265 librtmp SvtHevcEnc graphene-1.0 graphene-gobject-1.0 mozjs-78 libnetfilter_conntrack libnfnetlink libmnl libvmaf jsoncpp xplc-0.3.13 oniguruma jansson libabw-0.1 libfreehand-0.1 libpagemaker-0.0 libstaroffice-0.0 mtdev slirp snappy libverto-libevent libverto speexdsp speex libidn liblangtag-gobject liblangtag libbs2b sbc lame libxcvt opencore-amrnb opencore-amrwb liblz4 libtasn1 x264 ltc vdehist vdemgmt vdeplug vdesnmp mozilla-nspr nspr serd-0 sord-0 lv2 sratom-0 espeak-ng xxf86vm xtst numa 'libcdio++' libcdio 'libiso9660++' libiso9660 libudf faad2 xrender pciaccess freeglut glut xmu xmuu libsoup-2.4 libsoup-gnome-2.4 xcb-renderutil xcb-image xcb-cursor xcb-keysyms xcb-ewmh xcb-icccm libffi expat xext xinerama libntfs-3g pixman-1 libevent libevent_core libevent_extra libevent_openssl libevent_pthreads libkmod libasyncns shout ldns opusfile opusurl libuniconf libwvbase libwvstreams libwvtest libwvutils popt libidn2 libnumbertext libpipeline xpresent soundtouch libtirpc ijs guile-1.8 libepubgen-0.1 libodfgen-0.1 libwpd-0.10 bzip2 dconf libunwind-coredump libunwind-generic libunwind-ptrace libunwind-setjmp libunwind xrandr gssdp-1.6 tcl egl gl glesv2 glx libglvnd opengl xcursor xmlsec1-gcrypt xmlsec1-gnutls xmlsec1-nss xmlsec1-openssl xmlsec1 libbluray xdamage xcomposite xv wavpack tslib libb2 md4c-html md4c rnnoise gssrpc kadm-client kadm-server kdb krb5-gssapi krb5 mit-krb5-gssapi mit-krb5 xshmfence jasper libcddb fontenc xkbfile xkbcomp history readline lzo2 libiec61883 soxr-lsr soxr sm libomxil-bellagio dotconf libqrencode 'sigc++-3.0' xcb-atom xcb-aux xcb-event xcb-util gdk-pixbuf-2.0 hunspell libmodplug SPIRV-Tools-shared SPIRV-Tools 
xkbcommon xkbregistry xkbcommon-x11 libpsl librevenge-0.0 librevenge-generators-0.0 librevenge-stream-0.0 libetonyek-0.1 libclucene-core guile-3.0 msgpack libpcre2-16 libpcre2-32 libpcre2-8 libpcre2-posix fontconfig yaml-0.1 libsasl2 libip4tc libip6tc libipq libiptc xtables libcanberra-gtk libcanberra-gtk3 libcanberra blas lapack libsrtp2 com_err e2p ext2fs ss SDL2_image cairo-fc cairo-ft cairo-gobject cairo-pdf cairo-png cairo-ps cairo-script-interpreter cairo-script cairo-svg cairo-xcb-shm cairo-xcb cairo-xlib-xrender cairo-xlib cairo libacl graphite2 libmysofa vpx mad liburcu-bp liburcu-cds liburcu-mb liburcu-memb liburcu-qsbr liburcu-signal liburcu libass liblsmash libotf lcms2 pango pangocairo pangofc pangoft2 pangoot pangoxft tinycompress libchromaprint wireplumber-0.4 xau xdmcp ice xfont2 libcryptsetup libnsl efiboot efisec efivar faac libmpg123 libout123 libsyn123 libpulse-mainloop-glib libpulse-simple libpulse libusb-1.0 portaudio-2.0 portaudiocpp fftw3 fftw3f fftw3l fftw3q guile-2.2 ao avtp id3tag libinstpatch-1.0 kate oggkate samplerate xaw6 xaw7 libpkgconf rasqal 'libxml++-5.0' liblo libattr gpg-error libgcrypt xfixes wayland-client wayland-cursor wayland-egl-backend wayland-egl wayland-scanner wayland-server vidstab xt lept atomic_ops isl mujs libcxl libdaxctl libndctl sfsexp libvisual-0.4 libparted-fs-resize libparted gupnp-igd-1.6 nice raqm openal libzstd xpm xft libpcap gail gdk-2.0 gdk-x11-2.0 'gtk+-2.0' 'gtk+-unix-print-2.0' 'gtk+-x11-2.0' zzipfseeko zziplib zzipmmapped zzipwrap hwloc opus twolame vamp-hostsdk vamp-sdk vamp libbrotlicommon libbrotlidec libbrotlienc libkeyutils libcap-ng libargon2 capstone lilv-0 libseccomp 'notcurses++' notcurses-core notcurses-ffi notcurses libpci libfdt xi gck-1 gcr-3 gcr-base-3 gcr-ui-3 xres libwnck-3.0 libssh tss2-esys tss2-fapi tss2-mu tss2-policy tss2-rc tss2-sys tss2-tcti-cmd tss2-tcti-device tss2-tcti-libtpms tss2-tcti-mssim tss2-tcti-pcap tss2-tcti-spi-helper tss2-tcti-swtpm tss2-tctildr libevdev 
libmwaw-0.3 olm libwps-0.4 libwpg-0.3 libcap libpsx libwmf gail-3.0 gdk-3.0 gdk-broadway-3.0 gdk-wayland-3.0 gdk-x11-3.0 'gtk+-3.0' 'gtk+-broadway-3.0' 'gtk+-unix-print-3.0' 'gtk+-wayland-3.0' 'gtk+-x11-3.0' libalpm babl-0.1 libcares libexif libheif liba52 x11-xcb x11 form formw menu menuw 'ncurses++' 'ncurses++w' ncurses ncursesw panel panelw tic tinfo bash slang pam pam_misc pamc audiofile hogweed nettle libcupsfilters libfontembed dav1d dbus-glib-1 dbus-python duktape bdw-gc fribidi libavc1394 rav1e datrie-0.2 libthai theora theoradec theoraenc OpenCL ocl-icd libdvbv5 libv4l1 libv4l2 libv4l2rds libv4lconvert epoxy libraw libraw_r libspiro libde265 gmime-3.0 orc-0.4 orc-test-0.4 libdc1394-2 libdca libdts libgme libdv mjpegtools neon dvdread dvdnav spandsp WildMIDI wildmidi zbar-gtk zbar-qt zbar 'caca++' caca libmpeg2 libmpeg2convert libatasmart bytesize libcdio_cdda libcdio_paranoia gusb libmicrohttpd libopusenc libtraceevent liburing-ffi liburing libstartup-notification-1.0 sdl12_compat libwoff2common libwoff2dec libwoff2enc libtracefs libssh2 'flac++' flac fluidsynth libassuan libical-glib libical exiv2 libpng libpng16 libcdr-0.1 libe-book-0.1 libmspub-0.1 libqxp-0.0 libvisio-0.1 libzmf-0.0 raptor2 libuv taglib taglib_c brltty libcrypt libxcrypt ksba icu-i18n icu-io icu-uc zimg gudev-1.0 zxing libexslt libxslt Qt6Concurrent Qt6Core Qt6DBus Qt6Gui Qt6Network Qt6OpenGL Qt6OpenGLWidgets Qt6Platform Qt6PrintSupport Qt6Sql Qt6Test Qt6Widgets Qt6Xml Qt6Core5Compat Qt6Svg Qt6SvgWidgets tesseract wolfssl libbpf mpv tdb pytalloc-util.cpython-311-x86_64-linux-gnu talloc libxxhash lensfun rubberband xapian-core gupnp-1.6 gmp gmpxx lber ldap audit auparse liblzma libdebuginfod libdw libelf libapparmor sysprof-capture-4 libxml-2.0 libmfx mfx lc3 libjpeg libturbojpeg libproxy-1.0 fuse3 polkit-agent-1 polkit-gobject-1 INIReader inih gexiv2 valgrind libmagic glu gnutls libnftnl libsecret-1 libsecret-unstable speech-dispatcher webrtc-audio-processing poppler-cpp poppler 
poppler-glib blkid fdisk mount smartcols uuid devmapper-event devmapper xcb-composite xcb-damage xcb-dbe xcb-dpms xcb-dri2 xcb-dri3 xcb-glx xcb-present xcb-randr xcb-record xcb-render xcb-res xcb-screensaver xcb-shape xcb-shm xcb-sync xcb-xf86dri xcb-xfixes xcb-xinerama xcb-xinput xcb-xkb xcb-xtest xcb-xv xcb-xvmc xcb libgit2 imlib2 sndfile libpcre libpcre16 libpcre32 libpcrecpp libpcreposix weechat json-c zlib bluez ell gpgme-glib gpgme libdrm libdrm_amdgpu libdrm_intel libdrm_nouveau libdrm_radeon libinput mpfr synctex xscrnsaver libzip minizip Qt6LabsAnimation Qt6LabsFolderListModel Qt6LabsQmlModels Qt6LabsSettings Qt6LabsSharedImage Qt6LabsWavefrontMesh Qt6Qml Qt6QmlCore Qt6QmlIntegration Qt6QmlLocalStorage Qt6QmlModels Qt6QmlWorkerScript Qt6QmlXmlListModel Qt6Quick Qt6QuickControls2 Qt6QuickControls2Impl Qt6QuickDialogs2 Qt6QuickDialogs2QuickImpl Qt6QuickDialogs2Utils Qt6QuickLayouts Qt6QuickTemplates2 Qt6QuickTest Qt6QuickWidgets Qt6WaylandClient Qt6WaylandCompositor SvtAv1Dec SvtAv1Enc kpathsea ptexenc texlua53 texluajit uchardet xorg-libinput zvbi-0.2 libavif gdlib vdpau libprofiler libtcmalloc libtcmalloc_debug libtcmalloc_minimal libtcmalloc_minimal_debug libhwy-contrib libhwy-test libhwy libwacom miniupnpc tdactor tdapi tdclient tdcore tddb tdjson tdjson_private tdjson_static tdnet tdsqlite tdutils aom libnl-3.0 libnl-cli-3.0 libnl-genl-3.0 libnl-idiag-3.0 libnl-nf-3.0 libnl-route-3.0 libnl-xfrm-3.0 freetype2 Imath PyImath OpenEXR libopenjp2 gegl-0.4 gegl-sc-0.4 libjxl libjxl_threads gimp-2.0 gimpthumb-2.0 gimpui-2.0 hidapi-hidraw hidapi-libusb sdl2 libcmis-0.5 libcmis-c-0.5 libixion-0.18 liborcus-0.18 liborcus-spreadsheet-model-0.18 m17n-core m17n-flt m17n-gui m17n-shell source-highlight libnghttp2 libedit dbus-1 avahi-client avahi-compat-libdns_sd avahi-core avahi-glib avahi-gobject avahi-libevent avahi-qt5 avahi-ui-gtk3 libjq cloudproviders d3d dri gbm osmesa xatracker mozilla-nss nss ompi-c ompi-cxx ompi-f77 ompi-f90 ompi-fort ompi orte libqpdf sox 
haisrt srt gio-2.0 gio-unix-2.0 glib-2.0 gmodule-2.0 gmodule-export-2.0 gmodule-no-export-2.0 gobject-2.0 gthread-2.0 libopenmpt vulkan sqlite3 giomm-2.68 glibmm-2.68 libcdt libcgraph libgvc libgvpr liblab_gamut libpathplan libxdot libarchive liblouis libtommath luajit pmix pppd python-3.11-embed python-3.11 python3-embed python3 pygobject-3.0 absl_absl_check absl_absl_log absl_algorithm absl_algorithm_container absl_any absl_any_invocable absl_atomic_hook absl_bad_any_cast absl_bad_any_cast_impl absl_bad_optional_access absl_bad_variant_access absl_base absl_base_internal absl_bind_front absl_bits absl_btree absl_check absl_city absl_civil_time absl_cleanup absl_cleanup_internal absl_common_policy_traits absl_compare absl_compressed_tuple absl_config absl_container_common absl_container_memory absl_cord absl_cord_internal absl_cord_test_helpers absl_cordz_functions absl_cordz_handle absl_cordz_info absl_cordz_sample_token absl_cordz_statistics absl_cordz_update_scope absl_cordz_update_tracker absl_core_headers absl_counting_allocator absl_crc32c absl_crc_cord_state absl_crc_cpu_detect absl_crc_internal absl_debugging absl_debugging_internal absl_demangle_internal absl_die_if_null absl_dynamic_annotations absl_endian absl_errno_saver absl_examine_stack absl_exponential_biased absl_failure_signal_handler absl_fast_type_id absl_fixed_array absl_flags absl_flags_commandlineflag absl_flags_commandlineflag_internal absl_flags_config absl_flags_internal absl_flags_marshalling absl_flags_parse absl_flags_path_util absl_flags_private_handle_accessor absl_flags_program_name absl_flags_reflection absl_flags_usage absl_flags_usage_internal absl_flat_hash_map absl_flat_hash_set absl_function_ref absl_graphcycles_internal absl_hash absl_hash_function_defaults absl_hash_policy_traits absl_hash_testing absl_hashtable_debug absl_hashtable_debug_hooks absl_hashtablez_sampler absl_if_constexpr absl_inlined_vector absl_inlined_vector_internal absl_int128 absl_kernel_timeout_internal 
absl_layout absl_leak_check absl_log absl_log_entry absl_log_flags absl_log_globals absl_log_initialize absl_log_internal_append_truncated absl_log_internal_check_impl absl_log_internal_check_op absl_log_internal_conditions absl_log_internal_config absl_log_internal_flags absl_log_internal_format absl_log_internal_globals absl_log_internal_log_impl absl_log_internal_log_sink_set absl_log_internal_message absl_log_internal_nullguard absl_log_internal_nullstream absl_log_internal_proto absl_log_internal_strip absl_log_internal_structured absl_log_internal_voidify absl_log_severity absl_log_sink absl_log_sink_registry absl_log_streamer absl_log_structured absl_low_level_hash absl_malloc_internal absl_memory absl_meta absl_node_hash_map absl_node_hash_set absl_node_slot_policy absl_non_temporal_arm_intrinsics absl_non_temporal_memcpy absl_nullability absl_numeric absl_numeric_representation absl_optional absl_periodic_sampler absl_prefetch absl_pretty_function absl_random_bit_gen_ref absl_random_distributions absl_random_internal_distribution_caller absl_random_internal_distribution_test_util absl_random_internal_fast_uniform_bits absl_random_internal_fastmath absl_random_internal_generate_real absl_random_internal_iostream_state_saver absl_random_internal_mock_helpers absl_random_internal_nonsecure_base absl_random_internal_pcg_engine absl_random_internal_platform absl_random_internal_pool_urbg absl_random_internal_randen absl_random_internal_randen_engine absl_random_internal_randen_hwaes absl_random_internal_randen_hwaes_impl absl_random_internal_randen_slow absl_random_internal_salted_seed_seq absl_random_internal_seed_material absl_random_internal_traits absl_random_internal_uniform_helper absl_random_internal_wide_multiply absl_random_mocking_bit_gen absl_random_random absl_random_seed_gen_exception absl_random_seed_sequences absl_raw_hash_map absl_raw_hash_set absl_raw_logging_internal absl_sample_recorder absl_scoped_mock_log absl_scoped_set_env absl_span 
absl_spinlock_wait absl_spy_hash_state absl_stacktrace absl_status absl_statusor absl_str_format absl_str_format_internal absl_strerror absl_string_view absl_strings absl_strings_internal absl_symbolize absl_synchronization absl_throw_delegate absl_time absl_time_zone absl_type_traits absl_utility absl_variant alsa-topology alsa libcrypto libssl openssl libsystemd libudev p11-kit-1 libcurl atk-bridge-2.0 atk atspi-2 libbtrfsutil libtiff-4 harfbuzz-gobject harfbuzz-subset harfbuzz cups libpipewire-0.3 libspa-0.2 libcamera-base libcamera webrtc-audio-coding-1 webrtc-audio-processing-1 'lua++-5.4' 'lua++' 'lua++5.4' 'lua++54' lua-5.4 lua lua5.4 lua54 jack librsvg-2.0 libva-drm libva-glx libva-wayland libva-x11 libva libsharpyuv libwebp libwebpdecoder libwebpdemux libwebpmux vpl libdeflate libavcodec libavdevice libavfilter libavformat libavutil libpostproc libswresample libswscale gobject-introspection-1.0 gobject-introspection-no-export-1.0 gstreamer-1.0 gstreamer-base-1.0 gstreamer-check-1.0 gstreamer-controller-1.0 gstreamer-net-1.0 gstreamer-allocators-1.0 gstreamer-app-1.0 gstreamer-audio-1.0 gstreamer-fft-1.0 gstreamer-gl-1.0 gstreamer-gl-egl-1.0 gstreamer-gl-prototypes-1.0 gstreamer-gl-wayland-1.0 gstreamer-gl-x11-1.0 gstreamer-pbutils-1.0 gstreamer-plugins-base-1.0 gstreamer-riff-1.0 gstreamer-rtp-1.0 gstreamer-rtsp-1.0 gstreamer-sdp-1.0 gstreamer-tag-1.0 gstreamer-video-1.0 libsoup-3.0 gstreamer-bad-audio-1.0 gstreamer-codecparsers-1.0 gstreamer-cuda-1.0 gstreamer-insertbin-1.0 gstreamer-mpegts-1.0 gstreamer-photography-1.0 gstreamer-play-1.0 gstreamer-player-1.0 gstreamer-plugins-bad-1.0 gstreamer-sctp-1.0 gstreamer-transcoder-1.0 gstreamer-va-1.0 gstreamer-vulkan-1.0 gstreamer-vulkan-wayland-1.0 gstreamer-vulkan-xcb-1.0 gstreamer-wayland-1.0 gstreamer-webrtc-1.0 gstreamer-webrtc-nice-1.0 json-glib-1.0 ImageMagick-7.Q16HDRI ImageMagick 'Magick++-7.Q16HDRI' 'Magick++' MagickCore-7.Q16HDRI MagickCore MagickWand-7.Q16HDRI MagickWand libproc2 libnvme-mi libnvme 
blockdev-utils blockdev udisks2 tracker-sparql-3.0 tracker-testutils-3.0 harfbuzz-icu jemalloc imagequant shaderc shaderc_combined shaderc_static dovi libplacebo lua-5.1 lua5.1 lua51 lua-5.3 lua5.3 lua53 protobuf-lite protobuf utf8_range AMD BTF CAMD CCOLAMD CHOLMOD COLAMD CXSparse GPUQREngine GraphBLAS KLU KLU_CHOLMOD LDL Mongoose RBio SPEX SPQR SuiteSparse_GPURuntime SuiteSparse_config UMFPACK mypaint-brushes-1.0 shared-mime-info poppler-data gsettings-desktop-schemas iso-codes adwaita-icon-theme libmakepkg bigreqsproto compositeproto damageproto dmxproto dpmsproto dri2proto dri3proto fixesproto fontsproto glproto inputproto kbproto presentproto randrproto recordproto renderproto resourceproto scrnsaverproto videoproto xcmiscproto xextproto xf86bigfontproto xf86dgaproto xf86driproto xf86vidmodeproto xineramaproto xproto xwaylandproto xkeyboard-config wayland-protocols xcb-proto m17n-db hwdata systemd udev
/usr/bin/pkg-config returned ExitFailure 1
/usr/bin/pkg-config returned ExitFailure 1 with error message:
Package OpenCL-Headers was not found in the pkg-config search path.
Perhaps you should add the directory containing `OpenCL-Headers.pc'
to the PKG_CONFIG_PATH environment variable
Package 'OpenCL-Headers', required by 'OpenCL', not found
Package 'OpenCL-Headers', required by 'ocl-icd', not found
Package 'Qt5Core', required by 'zbar-qt', not found
Package 'Qt5Gui', required by 'zbar-qt', not found
Package 'gtest', required by 'libhwy-test', not found
Package 'Qt5Core', required by 'avahi-qt5', not found
Package 'absl_random_internal_mock_overload_set', required by
'absl_random_mocking_bit_gen', not found
call to pkg-config --modversion on all packages failed. Falling back to
querying pkg-config individually on each package
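The query-then-fall-back behaviour visible in this log can be sketched roughly as follows. This is a minimal Python illustration, not Cabal's actual implementation: the names `run_pkg_config` and `query_versions` are hypothetical, and only the `--modversion` flag seen in the log is used. It assumes a successful batch call prints one version per line, in argument order.

```python
import subprocess


def run_pkg_config(args):
    """Run pkg-config; return stdout lines, or None on a non-zero exit."""
    proc = subprocess.run(["pkg-config", *args], capture_output=True, text=True)
    return proc.stdout.splitlines() if proc.returncode == 0 else None


def query_versions(names, run=run_pkg_config):
    """Ask for all versions in one call; on failure, ask package by package.

    `run` is injectable so the logic can be exercised without pkg-config.
    """
    batch = run(["--modversion", *names])
    # Assumption: one output line per requested package, in the same order.
    if batch is not None and len(batch) == len(names):
        return dict(zip(names, batch))

    # Fallback: a single broken package no longer poisons the whole query,
    # but this costs one process per package.
    versions = {}
    for name in names:
        single = run(["--modversion", name])
        if single:
            versions[name] = single[0]
    return versions
```

The fallback loop is what makes a broken database so expensive: with more than a thousand .pc files it means more than a thousand process launches.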
Why does Cabal continue looking for a downloader even when it found its preferred one?
Because it collects downloaders into its programdb, and then queries the db to make a choice. In general, cabal attempts to factor its work into "effectful information gathering" and "pure logic" phases which can regularize error handling, etc and clarify the logic.
Why does Cabal run pkg-config --version twice for the same binary?
Because the first call lists all the packages, and the second call asks for modversion on the list thus acquired. There's no way to get all packages and their modversion results in a single execution of pkg-config afaik.
And again, if we didn't call this on all packages upfront, we would be required to interleave pkg-config logic into the cabal solver, which is currently, and thankfully, pure. Since the logic of that solver is so complex, making it still more complicated seems like a bad idea.
I am on a vanilla Arch system. I have tried to set pkg-config-location to /bin/false in ~/.cabal/config, but Cabal does not respect this setting.
If this doesn't work, then that's a bug we should try to fix.
And again, if we didn't call this on all packages upfront, we would be required to interleave pkg-config logic into the cabal solver, which is currently, and thankfully, pure.
I understand that letting go of the purity of Cabal's solver is a compromise that you are not willing to make. But aren't there any other options?
Things that come to my mind (without understanding the internals of Cabal at all):
1) Add a separate pass (after resolving) that checks the presence of non-Haskell libraries. If it fails, cabal would just give up with an error.
2) Defer checking the existence of non-Haskell libraries to the configuration phase of a package. This might fail in the middle of a build, which is nasty, but at least it does not slow down every build.
The user could be given a choice about how they want this checking to be done. There might be an option of not doing any checks at all, which obviously will fail at link time at the latest if a library is missing.
Cabal could also print a message (on verbosity level 0) that gives the user some information about what is happening, such as "Querying pkg-config database...". If this takes a long time, the user immediately has more information about what is going wrong. I don't believe this would make anyone's user experience worse in any way. This is already done for package resolution, which often takes much less time than querying the pkg-config database.
I don't know how many builds really rely on pkg-config support, but I assume that there are more builds which do not need pkg-config than those that do. Therefore I believe that querying the pkg-config database in some other way would improve build times for everybody.
I am on a vanilla Arch system. I have tried to set pkg-config-location to /bin/false in ~/.cabal/config, but Cabal does not respect this setting.
If this doesn't work, then that's a bug we should try to fix.
Yes. However, if nothing else is done, I wish there were a documented option like --disable-pkg-config. It would feel much less hacky than pointing pkg-config-location to an invalid or non-existent pkg-config executable.
Thank you for your time!
The reason it's part of the solver is that existence of pkg-config packages can be used to conditionalize build plans, e.g. falling back to a slower pure-Haskell package if a fast C binding isn't available on a system.
The reason it's part of the solver is that existence of pkg-config packages can be used to conditionalize build plans, e.g. falling back to a slower pure-Haskell package if a fast C binding isn't available on a system.
Would it be possible to try resolving packages first, assuming that all libraries are available? Then Cabal could check whether the packages selected for the plan are actually available and, if not, rerun the solver with information about the missing packages. Of course this might lead to several resolution attempts, but would it be that bad, given that this would not be very likely?
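The resolve-check-rerun loop suggested here might look roughly like the sketch below. It is written in Python purely for illustration and is not how Cabal's solver works; `solve_optimistically`, `toy_solve`, the plan names and the `max_rounds` limit are all hypothetical. The point is that the solver itself stays a pure function, while the effectful availability check only runs on packages a candidate plan actually needs.

```python
def solve_optimistically(solve, is_available, max_rounds=10):
    """Re-run a pure `solve` until every pkg-config package its plan needs exists.

    solve(missing) -> (plan, needed): pure; treats every pkg-config package
        as present except those listed in `missing`.
    is_available(name) -> bool: the effectful check, run only on `needed`.
    """
    missing = frozenset()
    for _ in range(max_rounds):
        plan, needed = solve(missing)
        newly_missing = {p for p in needed if not is_available(p)}
        if not newly_missing:
            return plan, missing
        # Feed what we learned back into the next solver run.
        missing = missing | newly_missing
    raise RuntimeError("no consistent plan found within max_rounds")


def toy_solve(missing):
    """Hypothetical stand-in for a solver with one conditional dependency."""
    if "zlib" in missing:
        return "pure-haskell-fallback", set()
    return "fast-c-binding", {"zlib"}
```

With `toy_solve`, a system without zlib needs two solver runs and one availability check, instead of a query over every installed .pc file.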
It seems to me that the slowness here comes from the fallback path checking each pkg individually, no? and that if the pkg-config db didn't have errors this wouldn't happen?
Also, when we do fall back, it does print a message it is falling back, correct?
It seems to me that the slowness here comes from the fallback path checking each pkg individually, no?
Well, of course it contributes to that.
I timed the execution of pkg-config --modversion for all packages on my system. (I mean the execution of pkg-config with every installed pkg-config package listed on the same command line. I'm not talking about the fallback of running pkg-config separately for each package.) It took 19.5 seconds, and it fails, as we already know.
Now, I removed all offending packages with broken dependencies from the list (there were 10 of them), and ran the command again. Now it takes 34 seconds and properly outputs the version numbers. It seems that pkg-config runs much faster when it fails, probably because it avoids some work when it knows that the whole process is not going to succeed anyway.
and that if the pkg-config db didn't have errors this wouldn't happen?
It would not, but 34 seconds of pkg-config in most cabal runs is way too much wasted time. On my system I have 1191 pkg-config packages. I don't even have a single desktop environment installed. If I had a full installation of GNOME or KDE, the number of these packages would be even higher.
I am running an Intel Core i7 machine with 4 cores clocked at 2GHz and 8 GB of RAM.
The situation is completely different on a distro which has separate -dev packages for development files.
Another thing is that on Arch it is not likely that the package db ever has all dependent packages present. As I explained, some dependencies are optional, and if the user does not install them manually, the packages will not be there. Installing more packages (which are not needed) in order to shorten cabal build times would be a bad solution to the problem, especially because the lack of these packages is not an issue for any other software that I have ever used.
Also, when we do fall back, it does print a message it is falling back, correct?
Well yes, but one needs to pass -vX to cabal in order to see that. I don't know what this X is.
A patch to bump verbosity level on that output would be straightforward and welcome.
(edit: and if Arch provides a "broken" packagedb by default, I think they should patch their pkg-config to not break on it)
A patch to bump verbosity level on that output would be straightforward and welcome.
Does this mean that the issue of a 34-second delay on cabal initialization (assuming normal operation and intact pkg-config database) is not considered serious enough to be fixed?
No. It means that this is another proposal that would be welcome and straightforward to implement (just like the earlier mention of fixing the treatment of pkg-config path in the cabal.config file.)
For reasons discussed above, the proposals thus far to actually address the core issue mainly seem to be nonstarters that would be architecturally complex and fragile.
However, the proposal to "optimistically" just assume all the libraries are there is in fact the behavior cabal falls back to when it cannot find the pkg-config database. So perhaps a flag "--skip-pkgconfig" or the like which more obviously bypasses this, and text in that warning suggesting that possibility, could work?
It seems to me that the slowness here comes from the fallback path checking each pkg individually, no? and that if the pkg-config db didn't have errors this wouldn't happen?
To add to @Merivuokko's data, I'm on a similarly-powerful machine, also running Arch, with 1757 pkg-config entries. The one-at-a-time fallback approach takes about 7s, but the happy path, after I've manually removed three broken packages from the invocation, still takes 3.5s, which is enough to dominate the time Cabal takes to perform simple operations.
So I think regardless of the existence of broken packages, we need to find some way to avoid so much querying. If we don't want to impact the purity of the solver, then perhaps we could just do some caching? Recording timestamps for the directories listed by pkg-config --variable pc_path pkg-config seems like it would be enough.
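A minimal sketch of that timestamp-based cache, in Python for illustration only (the names `dir_fingerprint` and `cached_versions` and the JSON cache layout are assumptions, not anything Cabal does today). The caller passes the directories reported by pkg-config --variable pc_path pkg-config, and the expensive query only runs when a directory's modification time has changed.

```python
import json
import os
from pathlib import Path


def dir_fingerprint(dirs):
    """Map each search directory to its mtime in ns (None if it is absent)."""
    fingerprint = {}
    for d in dirs:
        try:
            fingerprint[d] = os.stat(d).st_mtime_ns
        except FileNotFoundError:
            fingerprint[d] = None
    return fingerprint


def cached_versions(dirs, cache_file, query):
    """Return (versions, was_cache_hit).

    `query` is the expensive step (e.g. running pkg-config on everything);
    it is called only when the fingerprint differs from the cached one.
    """
    fingerprint = dir_fingerprint(dirs)
    cache_file = Path(cache_file)
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if cached["fingerprint"] == fingerprint:
            return cached["versions"], True
    versions = query()
    cache_file.write_text(
        json.dumps({"fingerprint": fingerprint, "versions": versions})
    )
    return versions, False
```

One caveat: a directory's mtime changes when entries are added or removed, but generally not when an existing .pc file is edited in place, so a more careful version would probably also record per-file timestamps.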
PS. Can someone please change the thread title to mention "pkg-config"? It would have made this easier to find.
@gbaz wrote:
Why does Cabal continue looking for a downloader even when it found its preferred one?
Because it collects downloaders into its programdb, and then queries the db to make a choice. In general, cabal attempts to factor its work into "effectful information gathering" and "pure logic" phases which can regularize error handling, etc and clarify the logic.
Maybe this design is outdated with the advent of using information from pkg-config. It is without doubt very inefficient and its eagerness isn't in the spirit of Haskell either. I see two systematic remedies:
1. Query pkg-config only on demand and memoise its response, accumulating the data in a lazy way. This could even be hidden behind a pure interface (with unsafePerformIO) under the assumption that querying pkg-config is basically accessing an immutable lookup-table (for the duration of a cabal run at least).
2. There would be a command cabal update-pkgconfig-db that queries pkg-config and stores the info for all packages. The solver could then access this db, rather than constructing it from scratch each time.
2. There would be a command cabal update-pkgconfig-db that queries pkg-config and stores the info for all packages.
This is potentially a pretty annoying UX regression, so I'd see it as a last resort.
I notice that you didn't include my suggestion of caching pkg-config results based on file timestamps. Is there an obvious flaw there that I'm missing?
@andreasabel I veto adding more lazy IO.
So I think regardless of the existence of broken packages, we need to find some way to avoid so much querying. If we don't want to impact the purity of the solver, then perhaps we could just do some caching? Recording timestamps for the directories listed by pkg-config --variable pc_path pkg-config seems like it would be enough.
That seems like a great solution and I'd welcome a patch for it. I had considered caching before, but hadn't realized that there were potentially directories to check for invalidation on, so got stuck on that. The one caveat is that the standard cabal caching is per project not globally. However, that may actually be fine here, on first pass, since the main cost we're worried about is repeated runs on individual projects, not overall runs.
@georgefst wrote:
- There would be a command cabal update-pkgconfig-db that queries pkg-config and stores the info for all packages.
This is potentially a pretty annoying UX regression, so I'd see it as a last resort.
Well, it is in the spirit of cabal, which requires you to run cabal update manually to get the latest versions for the Haskell packages. If you forget this step, you get all kinds of failures.
I suppose the difference is that cabal can easily warn you about bringing your hackage index up-to-date, whereas there wouldn't be a similarly easy mechanism for the package-config db. Ok, admitted, without a suitable warning of that kind this would be a worsening of the UX (for those that do not suffer under the present issue).
@gbaz wrote:
The one caveat is that the standard cabal caching is per project not globally.
The hackage index is cached globally. The pkg-config cache could fall into the same ballpark.
I'd even consider having cabal update update both caches, rather than having users deal with two separate commands.
Ultimately the real problem here is pkg-config itself, but I don't think that's likely to be fixed. At least it's not a shell script any more; that'd be really slow.
The hackage index is cached globally. The pkg-config cache could fall into the same ballpark.
Right but we have a bunch of special code and commands for managing the hackage index. We also have standard per-project caching combinators that would let us get 90% of the benefit by adding 2 lines of code or so, rather than doing something much more complicated and invasive. I know which I prefer to start with :-)
Right but we have a bunch of special code and commands for managing the hackage index.
I think we can break out into a separate runRebuild rooted somewhere in .cache/cabal? E.g.
    liftIO $
      runRebuild "~/.cache/cabal" $
        rerunIfChanged verbosity (newFileMonitor "pkg-config.cache") _someKey $
          getPkgConfigDb verbosity progdb
I know which I prefer to start with :-)
I know which one I will regret ;-)
One issue that causes this sadness on my system: https://github.com/abseil/abseil-cpp/issues/1528
This issue is impacting startup time of HLS; a fix or a roadmap would be greatly appreciated.
Cleaning up #9360 is high on my todo list. But given that it might need to be substantially rewritten, and I can't really guarantee when I'll get back to it, I certainly don't mind if someone else wants to take over.
I'm inclined to agree with @andreabedini that unless it's really onerous, we should try to do this right first time, by caching globally instead of per-project.
If I understand the situation correctly:
- The pkg-config discovery process proceeds in two steps: --list-all (which is fast) to get a list of all available packages, then --modversion to get the version of all packages, which is slow.
In such a case, using unsafeInterleaveIO to defer the calculation of the package version until it is needed seems elegant to me. This doesn't seem any worse than the impurity already introduced by computing the version for all packages before embarking on solving. With the current approach you still enter into a race where the global configuration/version of a package might be changed during the solving process, which would lead to an invalid install plan.
This also removes many complications to do with creating and maintaining a global cache. It is entirely possible and likely that a user will have a different environment which is seen by pkg-config in each development shell. What is the plan for maintaining the cache in this situation? Is it invalidated each time the user switches project?
The different-environments-for-different-projects problem is one that I hadn't considered. I think the approach outlined in #9360 would only need a minor modification to account for it: store the path of each package as part of the cache, and make sure we only use packages whose paths appear in the current pkg-config --variable pc_path pkg-config (we should also make sure to remove from the cache packages whose paths have disappeared, which I don't think #9360 currently does).
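A rough sketch of that modification, in Python rather than cabal's Haskell. It assumes a hypothetical cache keyed by the full path of each .pc file, and that an earlier directory in the search path wins when two directories provide the same package; none of these names come from #9360.

```python
import os


def visible_versions(cache, search_dirs):
    """Return {package_name: version} for the current search path only.

    `cache` maps the full path of a .pc file to its cached version, so
    entries recorded in another development shell are simply not visible.
    """
    by_dir = {}
    for pc_file, version in cache.items():
        directory = os.path.normpath(os.path.dirname(pc_file))
        name = os.path.splitext(os.path.basename(pc_file))[0]
        by_dir.setdefault(directory, {})[name] = version

    visible = {}
    for d in search_dirs:  # earlier directories take precedence (assumption)
        for name, version in by_dir.get(os.path.normpath(d), {}).items():
            visible.setdefault(name, version)
    return visible


def prune_vanished(cache):
    """Drop cache entries whose .pc file no longer exists on disk."""
    return {path: v for path, v in cache.items() if os.path.isfile(path)}
```

Keeping the two steps separate means switching projects only changes which entries are visible, while pruning is reserved for files that have really gone away.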
Although I'm not against the unsafeInterleaveIO solution either.
My humble opinion as a person who is not a cabal developer is that caching is not a good solution, because:
1) It too wastes time and memory, 2) it is complicated to implement correctly taking into account all possible scenarios where it should be invalidated, 3) if there are bugs, they might be difficult to diagnose by a user who maybe does not know anything about pkg-config or caching, 4) most of the packages stored in pkg-config database are likely not useful from cabal's or Haskell developers' perspective. Therefore cabal should not collect information about those packages. I would very much prefer a minimal solution that only queries for information that is relevant.
This is not a suggestion to use unsafePerformIO (or its relatives). This is an invitation to look for alternatives.
One good alternative might be to just die, if a dependency package is missing. This way the pkg-config checks could be done after resolving, if a chosen package depends on a C library. Fallback options could be triggered by a user flag.
The problem with this is that pkg-config package availability is part of the solver information (that is, a solver plan can be conditional on availability of a pkg-config package). You are proposing to remove a feature that is in use.
The problem with this is that pkg-config package availability is part of the solver information (that is, a solver plan can be conditional on availability of a pkg-config package). You are proposing to remove a feature that is in use.
Yes, I know. Is this widely used and by which kind of packages?
I can see a more explicit approach being an improvement. For example, if the person building a package wants to use a particular C library but has forgotten to install the development files, it would be better for them to receive an error than to have their package built with alternative dependencies, which is something they did not ask for.
Is there currently a way to explicitly choose between pkg-config and non-pkg-config alternatives, and getting an error, if the requested plan is not possible?
In one of my previous comments I also suggested an alternative, where the resolver would be first run assuming that all possible C libraries are present. If it then turns out, that the resulting plan relies on a C library, cabal would then (and only then) check if the pkg-config package is available or not. If not, the resolver would be run again, but this time providing it with the information about the missing pkg-config packages. Would this be difficult to implement? Would it add undesirable performance regressions in common cases?
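To make the proposal concrete, here is a toy version of that loop in Python. The three callables are hypothetical stand-ins (cabal's solver has no such interface), so this says nothing about how hard the change would be in practice.

```python
def solve_then_verify(solve, pkgconfig_deps, is_available):
    """Re-run `solve` until every pkg-config dependency of the plan exists.

    solve(missing)       -> a plan, or None, given a frozenset of packages
                            already known to be missing
    pkgconfig_deps(plan) -> pkg-config package names the plan relies on
    is_available(name)   -> bool, e.g. backed by `pkg-config --exists NAME`
    """
    missing = set()
    while True:
        plan = solve(frozenset(missing))
        if plan is None:
            return None  # no plan exists without the missing libraries
        newly_missing = {p for p in pkgconfig_deps(plan) if not is_available(p)}
        if not newly_missing:
            return plan  # first pass succeeds when nothing is missing
        if newly_missing <= missing:
            raise RuntimeError("solver ignored the known-missing packages")
        missing |= newly_missing
```

The loop terminates because the set of known-missing packages only grows, but every iteration is a full solver run.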
Yes, I know. Is this widely used and by which kind of packages?
All packages that bind to c libraries, and all packages which depend on them. The reason this needs to be seamless is that a C package can be very far down the dependency chain, and the solver needs to handle that gracefully.
The proposed solution of running the solver multiple times, checking and failing on each one would indeed be both costly and difficult to implement, because it would effectively require rearchitecting the entire solver, one of the most complex bits in the whole of cabal-install.
All that said, I think we have a perfectly good solution proposed now if someone wants to do it -- per-project caching. This lets us use the existing cache mechanism, and pay an overhead cost once per project. Indeed, the caching might get screwy if users swap out the pkg-config db, but A) we can test for that, and B) this swapping will likely occur on a per project basis anyway, so won't cause requeries too often.
And bear in mind, the cost we are talking about shaving is somewhere between 3.5 and 7 seconds as described in a comment above, so if we have to pay a 3.5 second cost once per project I do not think this is onerous.
The solution proposed here also has the advantage of requiring probably three lines of code to implement.
I think @georgefst made a good start in #9360 and we should see that to completion.
Nevertheless, I have been thinking whether there are better alternatives to calling pkg-config.
I remember reading about pkgconf and this comment on the grpc issue above says using pkgconf could make a big difference:
Replacing pkg-config with pkgconf (which can be simlinked to pkg-config), in my CentOS 7, accelerated pkg-config --libs grpc++ or pkg-config --libs protobuf from 30 minutes to 2 seconds !!!
@lschneiderbauer @Merivuokko would you be so kind to see if using pkgconf helps in your situation?
(by the way, how do I get an environment similar to yours?)
Another note: I tried to get a very rough benchmark with
❯ strace -f -o trace.pkg-config -e trace=%file bash -c 'pkg-config --modversion $(pkg-config --list-package-names)' >/dev/null
❯ strace -f -o trace.pkgconf -e trace=%file bash -c 'pkgconf --modversion $(pkgconf --list-package-names)' >/dev/null
❯ wc -l trace.pkg*
  1272 trace.pkgconf
  2152 trace.pkg-config
  3424 total
so there is a difference (about 40% fewer file-related syscalls).
Looking at the traces, you can see that both pkgconf and pkg-config have to read and parse all files in pc_path for --list-package-names, only to re-resolve the path of each .pc file when asked for --modversion.
I could not find a way to get a list of all package names along with their version. This does not seem to be a common use-case.
Using pkgconf and having pkg-config just be a symlink is the default on a lot of systems, including Arch and AFAICT Gentoo. So I'd expect most of the numbers above are actually already using pkgconf.
@gbaz wrote:
The solution proposed here also has the advantage of requiring probably three lines of code to implement.
If this is true, may we ask you for a PR? (Feels like the time spent on this would be amortized by the time you would then not have to spend in this discussion...)
Naive question: can the solver find all the pkg-config dependencies that it could care about and then return those to the rest of cabal to only look at those relevant packages? The performance problem here seems to be asking pkgconf for 1000 packages, so what if we only asked for the 50 that could ever influence solving?
It seems the most affected users here are on Arch because of their pkg-config database policy, as well as the average size of Arch users' pkg-config databases due to not separating library and development packages.
However, just for context, it seems that Arch maintainers do not consider this a bug: https://bugs.archlinux.org/task/80171#comment223421
So for the benefit of users with this kind of system setup, an improvement in cabal execution time would really be nice here. (Personally I'd also feel for the lazy IO option, but that's just my opinion.)
On 2023-11-01 at 19:50 -0700, Andrea Bedini @.***> wrote:
@lschneiderbauer @Merivuokko would you be so kind to see if using pkgconf helps in your situation?
I am already using pkgconf.
(by the way, how do I get an environment similar to yours?)
Install Arch or any source-based distribution (such as Gentoo) with a lot of library packages, e.g. install a desktop environment.
Looking at trace, you can see that both pkgconf and pkg-config have to read and parse all files in pc_path for --list-package-names; only to re-resolve the path of each pc file when asked for --modversion. I could not find a way to get a list of all package names along with their version. This does not seem to be a common use-case.
This also demonstrates quite well that cabal tries to do something that is not intended to be done.
I understand the desire for correctness, but I believe a simple, performant solution is better than a mathematically correct but complicated and error-prone one. In my (admittedly uninformed) opinion, introducing a new cache falls into the second category.
But maybe the easiest thing to be done is to give the user an option to disable all pkg-config querying altogether. I will at least stick to this option, if this problem is about to be tackled by introducing a cache.
If I understand the situation correctly:
- The pkg-config discovery process proceeds in two steps: --list-all (which is fast) to get a list of all available packages, then --modversion to get the version of all packages, which is slow.
In such a case, using unsafeInterleaveIO to defer the calculation of the package version until it is needed seems elegant to me. This doesn't seem any worse than the impurity already introduced by computing the version for all packages before embarking on solving. With the current approach you still enter into a race where the global configuration/version of a package might be changed during the solving process, which would lead to an invalid install plan.
I haven't looked into all of the trade-offs between caching and unsafe IO, but I wanted to mention that I think that the solver is already reading the source package index through a data structure that uses lazy IO. It seems to work well, since the solver's algorithm only requires looking up packages as they are needed, in dependency order. The pkg-config database could be used similarly. The main difference is that it isn't owned by cabal, so there could be more types of unexpected errors.
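The on-demand idea can be pictured with a small memoising wrapper. This is a Python stand-in only: it uses an explicit dictionary where the Haskell proposal would rely on laziness, and the class and function names are hypothetical.

```python
import subprocess


def modversion(name):
    """Ask pkg-config for one package's version; None if the query fails."""
    result = subprocess.run(
        ["pkg-config", "--modversion", name],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


class LazyPkgConfigDb:
    """Look up package versions on first use and remember the answer."""

    def __init__(self, query=modversion):
        # `query(name)` returns a version string, or None if unknown.
        self._query = query
        self._seen = {}

    def version(self, name):
        if name not in self._seen:  # only the first lookup spawns a process
            self._seen[name] = self._query(name)
        return self._seen[name]
```

With this shape, a solver that only ever asks about a few dozen packages triggers a few dozen queries, however many .pc files are installed. Treating the answers as fixed for the duration of one run is the same assumption made in the unsafePerformIO proposal earlier in the thread.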
I'm really excited to see the progress on fixing this in #9422. In case it's helpful, I have another data point.
I'm running Arch as well, and pkg-config --list-all | wc -l outputs 1721. Running cabal clean && cabal build -v | ts shows that building a small project takes 55 seconds, of which 8 seconds is spent on the giant bulk pkg-config --modversion command (which fails with "call to pkg-config --modversion on all packages failed. Falling back to querying pkg-config individually on each package") and 18 seconds afterwards is spent on individual pkg-config --modversion calls.
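The bulk-then-fallback behaviour that this log message describes can be pictured roughly as follows. This is an illustration in Python with hypothetical names and injected query functions, not cabal's actual code.

```python
class QueryFailed(Exception):
    """Raised by a query function when pkg-config exits with an error."""


def query_versions(names, bulk, single):
    """Return {name: version}, preferring one bulk call over many single calls.

    bulk(names)  -> list of versions in the same order, or raises QueryFailed
    single(name) -> version string, or raises QueryFailed
    """
    try:
        return dict(zip(names, bulk(names)))
    except QueryFailed:
        pass  # one bad entry fails the whole batch; retry one package at a time

    versions = {}
    for name in names:
        try:
            versions[name] = single(name)
        except QueryFailed:
            continue  # leave broken or unknown packages out of the result
    return versions
```

With N entries and a single broken .pc file, this shape costs one failed bulk call plus N single calls, which is consistent with the 8 s and 18 s split reported here.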
This issue roughly doubles my usual build times for clean builds. It takes up much more time proportionately if I have made a change to only one module and the rest of the modules are still cached. Sometimes it needs to do the pkg-config querying even if I haven't changed any modules, so running cabal build just hangs silently for 28 seconds until it tells me Up to date, having spent 90% of the time querying pkg-config. I don't fully understand when the querying is and is not necessary.
I tried examining my pkg-config database to see if I could get the bulk --modversion working, but encountered the same issue as https://github.com/haskell/cabal/issues/8930#issuecomment-1783747116, where my system needs absl_random_internal_mock_overload_set for abseil-cpp (I can't remove this package - it's a transitive requirement of chromium), but this isn't actually a real package and is just abseil upstream being broken.
Anyway, I'm really excited to see this resolved. It's one of my main pain points using Haskell right now.
It seems that the developers do not agree (or have the resources) to fix this issue at the moment.
In the meanwhile, I wish that there would be a working configuration option to disable all pkg-config checks. Could this be implemented and documented?
This issue is affecting many people, and it is possibly giving many users a very bad user experience without them knowing what slows down Cabal.
For those who run a rolling-release OS, like Arch or Gentoo, disabling pkg-config operations is possibly the best way to go in the future as well, if this issue is going to be solved by introducing caching. On these distributions, packages might be updated often, daily or even more frequently. It is likely that every update of the system invalidates all pkg-config caches kept up by cabal, quite effectively destroying their purpose. (I mean that they were implemented to make life easier for those who use a rolling-release distribution which installs all development files of every package by default, but the caches get invalidated so frequently that they may end up being quite useless.)
I think it is really a good idea to not be hesitant and implement a suboptimal solution, especially one that would be complicated to implement.
Defining pkg-config to be /bin/false in Cabal configuration is not a good solution either. I only want to prevent Cabal from querying a piece of information for every installed package, like the version check is implemented. If cabal needs to acquire link flags for a particular package (that it knows it needs in the build), I am totally willing to allow it to call pkg-config.
In the meanwhile, I wish that there would be a working configuration option to disable all pkg-config checks. Could this be implemented and documented?
That's interesting. I wonder if there's any way of doing this currently and, if not, if there are any tickets open. Perhaps it's worth it to open one? BTW, @Merivuokko, I had some trouble parsing your message. As if some negations were spurious, etc. You can edit and condense the message directly in the github UI if you feel like it.
On 2023-11-17 at 01:33 -0800, Mikolaj Konarski @.***> wrote:
In the meanwhile, I wish that there would be a working configuration option to disable all pkg-config checks. Could this be implemented and documented?
That's interesting. I wonder if there's any way of doing this currently and, if not, if there are any tickets open. Perhaps it's worth it to open one?
I created a new issue: https://github.com/haskell/cabal/issues/9458
BTW, @Merivuokko, I had some trouble parsing your message. As if some negations were spurious, etc.
Maybe this is because of my native language which is Finnish. The new issue hopefully makes my point clear.
Describe the bug
Some cabal commands, e.g. cabal v2-run, take around 20 minutes on my system (on a vanilla unaltered cabal init project).
I am using gentoo linux w/ cabal-install 3.6.2.0, ghc 9.2.7.
The reason for the huge runtime seems to be that cabal calls something like
    pkg-config --list-all | cut -f 1 -d ' ' | xargs pkg-config --modversion
(see related discussion in pull request #8496). Executing this command explicitly takes around the same time.
The reason for pkg-config to take so long is another matter: In this case there is a library installed with a .pc file that seems to introduce some circular dependencies (see https://github.com/grpc/grpc/issues/29137), which results in much longer waiting times for each library that depends on that "malfunctioning" library, adding up to take around 20 minutes on my system.
It seems to me one should rethink the approach of using pkg-config --list-all to retrieve ALL system library versions beforehand. Just one misconfigured system library will hinder cabal from doing its job properly, resulting in this absurd situation that cabal requires a fix in an unrelated project (grpc) to work properly.