Closed mbhall88 closed 3 years ago
Hello,
I have a favour to ask you: I have been always postponing improving my knowledge on infrastructure tasks like this. For such things, I think actually doing it is the best way to learn. I am wondering if it would be fine for you to let me do this, and then I do a draft PR, with a few review requests from you. I will read the official docs, but I think you have been creating these recipes for some time now, and might have some tricks and best practices of your own, so I would enjoy also to take a look at some 2 or 3 of your recipes. OFC, feel free to decide that you want to do this yourself, then I can see your recipe later.
Of course; be my guest. This is probably a good recipe to get thrown in the deep end with too :sweat_smile: Happy to pair-program in the beginning too to get you "up to speed" if you want?
Oh, pair programming in the beginning would be amazing! I am taking more care of my sleeping schedule now though, so doing it 11 PM Cam / 9 AM Sunshine is a bit hard to me. It would be better for me 10 PM Cam / 8 AM Sunshine or 8 AM Cam / 6 PM Sunshine. Any time past 8 AM Cam is also fine. Is any of these time slots ok for you? PS: If yes, I think we can set it for a day next week?
I have opened a PR (https://github.com/bioconda/bioconda-recipes/pull/27388) and asked for help as I am having issues with (surprise) Boost.
So annoyingly it seems like Hunter actually makes it harder to make a bioconda recipe. Maybe you could take a look at this when you get a chance @leoisl as you're the hunter expert here.
Will take a look, this is surprising, I haven't bumped into this issue before.
Trivial question, I think the answer is negative, but it is worth trying anyway: could the precompiled binary make the conda recipe easier? The only requirement to run it is OpenMP 4.0+, which comes with GCC 4.9+. So we would install this GCC version or a later one, and just download the binary and make it executable. Or should we necessarily compile from source? I guess a downside is that the binary just works on Linux, and not on Mac OS, but I have no idea if the current code compiles and runs fine on Mac OS (we don't have Mac OS in our CI, so maybe fine to ignore it)?
could the precompiled binary make the conda recipe easier?
No. It needs to be built from source. I can't find anything in the docs that explicitly states this, but I suspect it is to do the fact that conda uses some very customer build systems with custom compilers and this might cause some problems? I dunno.
I am working on this, and will log some issues here. Bioconda does not like the -
character in software version, so when we say the version of pandora
is 0.9.0-rc1
, it errors out with:
Error: bad character '-' in package/version: 0.9.0-rc1
I changed the bioconda version to be instead 0.9.0_rc1
, but the github version is still 0.9.0-rc1
. Is it fine to leave like this, or do we also change the github version to 0.9.0_rc1
?
I don't think a bioconda package for the RC is a good idea. Prereleases aren't supported by bioconda yet so we would have to do this on a private channel (see here and here)... The primary problem is users will get the RC as their latest version as there isn't a mechanism for specifying you do/don't want prereleases like in PyPI.
I think we just get the recipe working for v0.8.0 first and then worry about trying to do it for the RC after.
Hello, after some 3 days of work exclusively on this, I managed to build pandora on a conda local container (what is called the bootstrap method). This is the PR associated with this work: https://github.com/rmcolq/pandora/pull/275 . The bootstrap method clones the docker containers in conda's Circle CI to build the conda package locally. In other words, if it works with the bootstrap method, which does, it should work on their Circle CI build. We do manage to compile pandora
on conda's Circle CI, but 25 out of 795 tests fail, all related with denovo. I think the issue might be the files that GATB
creates, but GATB
does not create any special files in temp
or anything, it just uses the workdir.
Very simple tests like this fails:
TEST(QueryAbundance, oneKmer_ReturnOne)
{
Graph graph = Graph::create(new BankStrings("AATGT", NULL),
"-kmer-size 5 -abundance-min 1 -verbose 0 -out gatb_graph_test");
auto kmer = "AATGT";
auto node { graph.buildNode(kmer) };
const auto covg { graph.queryAbundance(node) };
// We get the neighbors of this real node and make sure it has the neighbours we
// expect
EXPECT_EQ(covg, 1);
remove_graph_file("gatb_graph_test");
}
As an example, these are the files that GATB
creates in this test (nothing out of ordinary or special, just files in the workdir):
close(3</sys/devices/system/cpu>) = 0
close(3</home/leandro/git/pandora/cmake-build-debug-coverage>) = 0
close(3</proc/cpuinfo>) = 0
close(4</proc/cpuinfo>) = 0
close(4</proc/cpuinfo>) = 0
close(4</proc/cpuinfo>) = 0
close(4</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.0>) = 0
close(5</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.1>) = 0
close(6</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.2>) = 0
close(7</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.3>) = 0
close(8</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.4>) = 0
close(9</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.5>) = 0
close(10</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.6>) = 0
close(11</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_superK_partitions/superKparts.7>) = 0
close(4</proc/cpuinfo>) = 0
close(4</proc/cpuinfo>) = 0
close(4</proc/cpuinfo>) = 0
close(4</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_cfp>) = 0
close(4</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_cfp>) = 0
close(6</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_cfp>) = 0
close(5</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_debloom_partitions.h5 (deleted)>) = 0
close(4</home/leandro/git/pandora/cmake-build-debug-coverage/trashme_43805_cfp (deleted)>) = 0
close(4</proc/cpuinfo>) = 0
I have absolutely no idea how to solve this
Wrote an issue to their github, will wait for their answer
Wrote an issue to their github, will wait for their answer
Can you please link to the issue you raised?
Oops, yeah, sorry: https://github.com/bioconda/bioconda-recipes/issues/28102
Removing the tests worked fine for linux, but failed for mac os: https://github.com/bioconda/bioconda-recipes/pull/27768/checks?check_run_id=2396324449 . Will debug the mac os build and make it work. Then we will have executable where tests don't pass, I will run it on the 4-way pipeline and check if results look similar. If not, then I don't think we should make a conda version until we can make sure the tests pass on the executable
I removed the tests, so linux build works, but I am stuck on the Mac OS build now. Basically, clang-11
(which we have to forcibly use in the conda build environment, we can't specify another compiler or version) errors out when building Boost
with this:
clang-11: error: unknown argument: '-fcoalesce-templates'
Basically, this argument does not exist for newer versions of clang
, because for whatever reason they did not want to preserve backwards compatibility. Boost
has a check for this (i.e. it just uses this argument if clang
version is < 4.0.0
, but the fix has a bug, it makes a lexicographical comparison, and in this case, 11.0.0 < 4.0.0
, and then it tries to use this option that does not exist in clang 11
. This bug and its fix is described here. It was fixed on Boost 1.73
, but for pandora
, we need to forcibly use Boost 1.62.0
, so the fix is not yet there. As we also install Boost
using Hunter
, we can't add the patch ourselves for the fix... So we have a bug, we know how to fix, but we are using inflexible tools. On one hand, bioconda build environment just accepts one compiler and at a fixed version, and on the other hand, we can't apply patches to Hunter
builds.
Will try to fix this in the next days, but will take a couple of days of break of trying to build conda recipe
Log for self, this might fix it: https://github.com/alisw/alidist/pull/2159/files
So, I simplified even more stuff. I am skipping the recipe build for osx
, i.e. we will just have conda recipe for linux-64
. I don't even know if pandora
compiles from source and runs fine in osx
anyway. I know that right now, if pandora
even works fine on osx
, it is single threaded (it seems we need to do some more stuff for OpenMP
to work on osx
). I also removed the tests, and the linux-64
recipe now builds: https://github.com/bioconda/bioconda-recipes/pull/27768#issuecomment-833665968 . I will test if this recipe is running fine, as the tests not all pass.
A side question: do we proceed, or do we necessarily need to support osx
? I personally don't think is necessary osx
support, as I view pandora
as a tool to be run on clusters, which are basically linux
in bioinformatics.
It's ok for now, but we do need to figure out how to get it to run on Mac. As an example of why this is important, I am developing a tool to do AMR prediction using pandora
. Some of the people likely to want to run this will most certainly be using Mac. So my tool is currently limited to use on Linux.
Agree with both. In particular, I think initially supporting Linux only is fine/acceptable
In the worst case, at least for now, your tool can still be run on Mac, as we have containers for pandora
, although it would be a bit messy... instead of calling pandora
directly, we can execute it through the singularity container. Far from the ideal solution, and would require Mac users to have singularity
installed, but it seems fine for a temporary solution...
If it was readily available, would it be an option for you? Or even if we have singularity working on osx just as fine it works on linux, would we still need pandora
to be osx-compatible? Being compatible with osx can be really a pain sometimes, I am just wondering if there is any acceptable way around it...
Well given the number of people who use mykrobe on osx, I think we really do need to try our best to support it... :cry:
As far as I can tell (no actual download stats), there are people who use the Mac app, but the vast majority use it on Linux. Windows is surprisingly common because people in healthcare use it (I was completely ignoring Windows, so anything >0 was a shock)
People sometimes use the Windows and mac apps when teaching undergrads (no way singularity an option there) . Many bioinformaticians work on macs, but I don't know how many use the app. But I've never heard anyone use mykrobe on the mac command line.
Omg this is an unhelpful non quantitative ramble. The mykrobe experience confirms supporting mac and Windows is useful. Bulk processing is always Linux.
osx
is supportable, IDK how much effort we will have to put into it, but it is still a unix system, so I can see it might work on osx
. The main issue is if one of our dependencies do not support osx
, then we will have to switch dependencies (or fix the dependency?), which might take a lot of work.
I can't say anything about Windows, none of the dependencies we use mention Windows, nor are tested under Windows, so it might work or not, most probably not. Parts of our code that multiprocess using fork
(e.g. all make_prg
, and pandora discover
) will not "work" on Windows, as Windows does not implement copy-on-write fork. It will work, but if we use n
threads, it will consume n
times the RAM it consumes on Linux or Mac.
So I can see us supporting osx
with some effort, but I don't think is worth it to support Windows.
pandora discover
and GATB
are working just fine in a test I did using the conda executable. Who knows why those 25 tests fail, I guess it won't be me who will find out, I sort of exhausted all my ideas already. It is hard also to solve when the issue is not reproducible locally with the bootstrap method...
Anyway, it seems pandora
for conda
on linux
only is working. I've been testing creating this recipe for version 0.8.0
, but I think we should support just the last version, 0.9.0
, and future ones in conda
. Plus, from version 0.9.0
onwards, we can run the example workflow that should point any issue with index, discover or compare as a second layer of tests. We can also install the two dependencies pandora
needs in the conda recipe, mafft
and make_prg
, so that all the user needs to do is conda install -c bioconda pandora
have all installed. Is it fine then to skip 0.8.0
and push a recipe only for 0.9.0
?
The 25 tests that fail are now removed manuallly in the build script when testing pandora
. The remaining 770 tests still run and all work. If we can skip 0.8.0
, then I will also test the example workflow after these 770 tests, as it is a way to test pandora discover
also. If all these tests pass, I think we can say all is fine with the binary. I hope they accept the recipe with these tests, as it literally takes 1.5 seconds to test:
22:55:44 BIOCONDA INFO (OUT) [==========] 770 tests from 114 test suites ran. (1503 ms total)
.... while it takes 1h to build...
Is it fine then to skip 0.8.0 and push a recipe only for 0.9.0?
Sure thing
There are two things that need to be done before I can start working on this:
dev
branch ontomaster
(after #243 is merged intodev
)