smarco / gem3-mapper

GEM-Mapper v3
GNU General Public License v3.0

Adding gem3-mapper to bioconda #9

Closed: karl616 closed this issue 6 years ago

karl616 commented 6 years ago

Hi, I find gem3-mapper a very nice tool and would like to have access to it in bioconda. I have set up a recipe that works, but before I push it through I would appreciate some feedback. And to keep it alive, it would need a little bit of investment from your side.

This is the recipe: https://github.com/karl616/bioconda-recipes/commit/c6a02543b1be99bc6b05ed3b4e48e777403ac609

I'm quite sure that this works, or at least isn't too far off, but there are a couple of things I would like to improve/confirm before I go ahead.

  1. Bioconda recommends pointing to git releases rather than fixed commits, so I went with the latest release candidate (v3.6). One problem I had here was how CUDA was handled. I know you have already fixed this, which is why I don't think the problem will persist. As I cannot guarantee that all users have CUDA-compatible hardware, I decided to disable CUDA. This caused a problem in gpu_config.h, so I included a snippet from HEAD (https://github.com/karl616/bioconda-recipes/blob/c6a02543b1be99bc6b05ed3b4e48e777403ac609/recipes/gem3-mapper/build.sh#L12-L24). Since I don't fully understand gem3-mapper, could you confirm whether this is correct?
  2. The second problem comes with submodules, as they aren't included in the release archive. The Makefile handles a missing submodule by pulling it from the repository, but the release archive isn't a git repository, so the build fails. My solution is to create a complete archive (see #8). The step for which I see no way around manual work is creating and attaching such an archive for each release candidate. For testing purposes I did this on my fork (https://github.com/karl616/gem3-mapper/releases/tag/v3.6). The recipe currently points to this, but it would be nicer to point directly to your repository. This would be your current "investment".
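A complete archive of the kind described in point 2, i.e. one that already contains the submodule sources, could be produced roughly as follows. This is a sketch, not the project's release procedure: it assumes a checkout made with git clone --recursive and GNU tar (for --exclude-vcs), and the helper name make_full_archive is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pack a recursively cloned checkout (submodules included) into a
# plain source tarball, so the build never needs to fetch submodules itself.
make_full_archive() {
  local src_dir="$1" tarball="$2"
  # GNU tar's --exclude-vcs drops .git entries (directories and gitlink files),
  # leaving only source files in the archive.
  tar --exclude-vcs -czf "$tarball" \
    -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example for one release tag (directory and tarball names are placeholders):
#   git clone --recursive --branch v3.6 https://github.com/smarco/gem3-mapper.git gem3-mapper-3.6
#   make_full_archive gem3-mapper-3.6 gem3-mapper-3.6.tar.gz
```

Because the submodule directories are already populated in such a tarball, the Makefile's fallback of pulling a missing submodule from the repository should not be triggered.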

I think the former issue will be solved by the next release candidate, and once #8 is included, the latter takes about 15 minutes per release.

What do you think?

achacond commented 6 years ago

Hi @karl616, thanks for all your support; your feedback is very much appreciated. GEM3 autodetects whether a complete CUDA SDK installation is present and checks all the GPU requirements. If configure detects that you are not able to compile and run GPU code, it generates a CPU-only version. Point (1) is fully covered natively by the application; you don't need to do anything specific. Best, Alex

smarco commented 6 years ago

Hi @karl616,

What @achacond points out is correct, but not for the v3.6 tagged version. Some patches were pushed afterwards to handle cases where a submodule is missing, the hardware is not compatible, etc.

I've pushed another tag (v3.6.1) with all the latest commits (also it was about time to do this). Can you try the process again, but now against this new tag? It should do the trick. Then, if you need something else on top of this, don't hesitate and let me know.

Cheers,

karl616 commented 6 years ago

Hi @achacond, hi @smarco, you are welcome. I hope to benefit as well... :)

With regard to CUDA, it might well be that I made a mistake. Starting out, I had problems similar to #5. I'm a bit unsure at the moment, as I wasn't able to reproduce yesterday's complications. Perhaps the missing submodule was tripping me up in more than one way.

I will definitely try again, but that has to wait until tomorrow.

Thanks for the support.

karl616 commented 6 years ago

Hi @smarco,

The new release worked nicely. Thanks!

I made a pull request to bioconda: https://github.com/bioconda/bioconda-recipes/pull/9937

They are asking for references, and I'm citing the 2012 paper (https://www.nature.com/articles/nmeth.2221) and your entry on bio.tools (https://bio.tools/gemmapper). Is this correct? The latter could do with an update, though; it is still pointing to SourceForge... :)

smarco commented 6 years ago

Hi @karl616,

Thanks again for the effort to push gem3 into bioconda. Note that this reference https://bio.tools/gemmapper is quite old and we should try to use https://bio.tools/GEM_Mapper if possible.

Thanks,

karl616 commented 6 years ago

Hi @smarco,

I agree, I will update that tomorrow. And you are welcome!


karl616 commented 6 years ago

gem3-mapper is now a part of bioconda :)

With the bioconda channel activated, it can be installed with:

conda install gem3-mapper

I'll close this issue. Thanks for the help.

karl616 commented 6 years ago

Hi @smarco, I have a problem with the conda-installed version and I suspect it has to do with missing dependencies. Both gem-mapper and gem-indexer crash. This is the error I get from gem-indexer:

2018/7/23 17:48:30 -- [Inspecting MultiFASTA]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>> GEM.System.Error::Signal raised (no=4) [errno=0,Success]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

I can only trace it to the general error handling function, but not beyond this. Does it tell you something?

Best, Karl
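For readers decoding this message: the no=4 field is a raw signal number, and on Linux signal 4 is SIGILL (illegal instruction). The sketch below translates such numbers with the shell's kill builtin; the function name decode_signal is hypothetical.

```shell
#!/usr/bin/env bash
# Translate the raw signal number from a message such as
#   GEM.System.Error::Signal raised (no=4)
# into its symbolic name.
decode_signal() {
  # "kill -l N" prints the signal name without the SIG prefix (e.g. ILL).
  echo "SIG$(kill -l "$1")"
}

decode_signal 4                  # prints SIGILL on Linux
# If a process is killed by signal N without handling it, the shell
# reports exit status 128 + N instead.
echo "exit status for signal 4: $((128 + 4))"
```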

smarco commented 6 years ago

Can you give me the complete command line?


karl616 commented 6 years ago

Of course:

gem-indexer -i genome.fa -o genome.gem (-b) (-t 1)

The options within parentheses are ones I toggled to see whether they had an impact. Is there something I can do to get more information?


karl616 commented 6 years ago

It should happen somewhere here, shouldn't it?

https://github.com/smarco/gem3-mapper/blob/12f000aa9e8ec57da6f088c57ec2b136b0ad5e9f/src/archive/builder/archive_builder_text.c#L341-L371

I have also found a way to reproduce it locally, or at least in a docker image, without having to rely on the bioconda build process...

My current suspicion is that the linking is bad, and a wild guess is that it has something to do with libgomp. I'll see if I can get closer.

smarco commented 6 years ago

Most likely it is an error in the genome.fa format. Can you compile in debug mode and let me know the output?

make debug
gem-indexer -i genome.fa -o genome.gem

Cheers,


--

Santiago Marco-Sola

karl616 commented 6 years ago

I'm not sure. It only happens when I build gem3-mapper through conda. If I build it on my own computer, it works as intended.

Looking at the broken gem-indexer binary with ldd, I get this:

# ldd $(which gem-indexer)
    /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
    libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
    libm.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
    librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
    libz.so.1 => /usr/local/bin/../lib/libz.so.1 (0x7f1cf2cf1000)
    libbz2.so.1.0 => /usr/local/bin/../lib/libbz2.so.1.0 (0x7f1cf2ae1000)
    libgomp.so.1 => /usr/local/bin/../lib/libgomp.so.1 (0x7f1cf28be000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
    libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f1cf2f0e000)
Error relocating /usr/local/bin/../lib/libgomp.so.1: pthread_attr_setaffinity_np: symbol not found
Error relocating /usr/local/bin/gem-indexer: backtrace_symbols_fd: symbol not found
Error relocating /usr/local/bin/gem-indexer: backtrace: symbol not found

The first error is why I think libgomp is the problem.
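Scanning ldd output for unresolved entries, as done by eye above, can be scripted. This is a hedged sketch: it only greps for the text "not found", which covers both the relocation errors shown above and the usual "libfoo.so => not found" form for missing libraries; missing_from_ldd is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print only the lines of ldd output that report a missing library or symbol.
missing_from_ldd() {
  # grep exits with status 1 when nothing matches; treat that as "all resolved".
  grep 'not found' || true
}

# Usage on an installed binary (ldd may write errors to stderr, hence 2>&1):
#   ldd "$(command -v gem-indexer)" 2>&1 | missing_from_ldd
```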

smarco commented 6 years ago

I've tried the build through conda and it worked in my case:

> gem-indexer -i ../data/chr1.fa -o chr1
2018/7/24 21:33:26 -- [Inspecting MultiFASTA]
2018/7/24 21:33:28 --  100% ... done [2.382 s]
2018/7/24 21:33:28 -- Inspected text 498501247 characters (index_complement=yes). Requesting 475 MB (encoded text)
2018/7/24 21:33:28 -- [Reading MultiFASTA]
2018/7/24 21:33:30 --  100000000 bases parsed
2018/7/24 21:33:31 --  200000000 bases parsed
2018/7/24 21:33:32 -- Total 254235634 bases parsed ...done [3.222 s]
2018/7/24 21:33:32 -- [Generating Text (explicit Reverse-Complement)]
2018/7/24 21:33:32 --  100% ... done [0.535 s]
2018/7/24 21:33:32 -- [Generating BWT Forward-Text]
2018/7/24 21:33:32 -- [Building-BWT::Counting K-mers]
2018/7/24 21:33:34 --  100% ... done [1.457 s]
2018/7/24 21:33:34 -- [Building-BWT::Generating SA-Positions]
2018/7/24 21:33:34 --    2% 

In my case, I'm not missing any library:

> ldd $(which gem-indexer)
    linux-vdso.so.1 =>  (0x00007ffd3d9da000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f946c6ab000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f946c3a2000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f946c19a000)
    libz.so.1 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libz.so.1 (0x00007f946bf7d000)
    libbz2.so.1.0 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libbz2.so.1.0 (0x00007f946bd6d000)
    libgomp.so.1 => /home/smarco/miniconda2/envs/ddocent_env/bin/../lib/libgomp.so.1 (0x00007f946bb4a000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f946b780000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f946c8c8000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f946b57c000)

In any case, the current version of GEM neither relies on nor uses the OpenMP library. It is linked by mistake (it was used in older versions), and the link flag simply remained. I'll remove it in the next push.

Besides, the part of the code where you are looking is just parsing, so I believe the error is not related to the OpenMP library. Can you give me access to the input .fa file and maybe the broken binary?

Thanks,
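Whether the OpenMP runtime is actually needed can be checked from the binary itself: if it imports no GOMP_* or omp_* symbols, the libgomp link is unused. The sketch below assumes input in the format of binutils' nm -D --undefined-only (one "U symbol" per line, possibly with an @VERSION suffix); openmp_imports is a hypothetical helper. On glibc systems, ldd -u also lists unused direct dependencies.

```shell
#!/usr/bin/env bash
# Read "nm -D --undefined-only" output on stdin and print the OpenMP runtime
# symbols the binary imports (GOMP_* and omp_*), with any @VERSION suffix removed.
openmp_imports() {
  awk '$1 == "U" { sym = $2; sub(/@.*/, "", sym); if (sym ~ /^(GOMP_|omp_)/) print sym }'
}

# Usage on a real binary (empty output suggests the libgomp link is unnecessary):
#   nm -D --undefined-only "$(command -v gem-indexer)" | openmp_imports
```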

karl616 commented 6 years ago

Good that it works for you. The last thing I did yesterday was to remove OpenMP from the dependencies. Here is a copy of the binary:

gem-indexer.gz

As for the sequence, I can reproduce it with something as small as this:

echo -e ">seq\nATATAGGGTATAGATA" > test.fa
gem-indexer -i test.fa -o test

Compiled locally, it behaves as it should, but the bioconda version fails.

smarco commented 6 years ago

I've tried your binary and input.

> ./gem-indexer -i test.fa -o test
2018/7/25 16:51:27 -- [Inspecting MultiFASTA]
2018/7/25 16:51:27 --  100% ... done [0.000 s]
2018/7/25 16:51:27 -- Inspected text 37 characters (index_complement=yes). Requesting 0 MB (encoded text)
2018/7/25 16:51:27 -- [Reading MultiFASTA]
2018/7/25 16:51:27 -- Total 17 bases parsed ...done [0.000 s]
2018/7/25 16:51:27 -- [Generating Text (explicit Reverse-Complement)]
2018/7/25 16:51:27 --  100% ... done [0.000 s]
2018/7/25 16:51:27 -- [Generating BWT Forward-Text]
2018/7/25 16:51:27 -- [Building-BWT::Counting K-mers]
2018/7/25 16:51:28 --  100% ... done [0.179 s]
2018/7/25 16:51:28 -- [Building-BWT::Generating SA-Positions]
2018/7/25 16:51:28 --  100% ... done [0.000 s]
2018/7/25 16:51:28 -- [Building-BWT::Sorting SA]

Can you give me more information about your system specs (both OS and hardware)?

karl616 commented 6 years ago

Yes, it has to be something with my system(s). I have tried it on four systems; three fail, and what they have in common is that they are a bit older. In a discussion about gemBS, the -march=native flag was mentioned. These are my testing systems:

system1: Intel Xeon E5-2667 with CentOS 6.7 (fail)
system2: Intel Core i5-3570K with Fedora 28 (fail)
system3: Intel Xeon E5-2670 with Debian 7.11 (fail)
system4: Intel Core Skylake (cloud) with CentOS 7.5.1804 (works)

Is that enough info?

I double-checked the checksum of the binary... it is the same on all systems.

If it has to do with the hardware, that explains why it works when I compiled it locally...

heathsc commented 6 years ago

If you use -march=native, there can be problems (e.g., illegal instructions) if you compile on one processor and run on another, particularly if you compile on a newer or otherwise more capable processor. If this is a possibility, then using -O3 without -march=native is recommended.

Simon
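One way to check whether a machine lacks instruction-set extensions that a -march=native build from a newer host might emit is to compare the flags line of /proc/cpuinfo. Which extensions the bioconda build actually used is not known from this thread; the flag names in the example are only illustrative, and missing_cpu_flags is a hypothetical helper. On the build host, gcc -march=native -Q --help=target shows what -march=native resolved to.

```shell
#!/usr/bin/env bash
# Print each requested x86 feature flag that the CPU does NOT advertise in
# /proc/cpuinfo (Linux). Set CPUINFO to read a different file, e.g. a copy
# taken from another machine.
missing_cpu_flags() {
  local cpuinfo="${CPUINFO:-/proc/cpuinfo}" flags flag
  # First "flags" line, with the "flags<TAB>: " prefix stripped and padded
  # with spaces so whole-word matching works at both ends.
  flags=$(grep -m1 '^flags' "$cpuinfo")
  flags=" ${flags#*:} "
  for flag in "$@"; do
    case "$flags" in
      *" $flag "*) ;;          # supported
      *) echo "$flag" ;;       # missing
    esac
  done
}

# Example (flag names are illustrative only):
#   missing_cpu_flags avx avx2 avx512f
```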


karl616 commented 6 years ago

This fits well with what I see: strace shows a SIGILL. My initial attempt was to change -march=native to '-march=x86-64 -mtune=generic'. But should I then also change -Ofast to -O3?

heathsc commented 6 years ago

I would use -O3 in this case.

Simon


karl616 commented 6 years ago

I changed that as well, and that was it: the conda installation now works on my systems too... and I have learned to think of them as old.
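The flag change discussed above (replacing -march=native with -march=x86-64 -mtune=generic, and -Ofast with -O3) could be scripted in a recipe's build.sh along these lines. This is a sketch: the name of the file holding the compiler flags is a placeholder rather than gem3-mapper's actual layout, make_flags_portable is a hypothetical helper, and GNU sed is assumed for -i.

```shell
#!/usr/bin/env bash
# Rewrite host-specific optimisation flags into portable ones, in place.
# -march=x86-64 restricts code generation to the baseline x86-64 instruction
# set; -mtune=generic only affects scheduling, not which instructions are used.
make_flags_portable() {
  local flags_file="$1"
  sed -i \
    -e 's/-march=native/-march=x86-64 -mtune=generic/g' \
    -e 's/-Ofast/-O3/g' \
    "$flags_file"
}

# Example in build.sh (placeholder path):
#   make_flags_portable path/to/file-with-CFLAGS
```

Only the -march change matters for the SIGILL; dropping -Ofast (which enables -ffast-math) is a separate, numerics-related choice.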

karl616 commented 6 years ago

OK, gem3-mapper is installed and works on my system. I'll close this issue again.