heathsc / gemBS

gemBS is a bioinformatics pipeline designed for high throughput analysis of DNA methylation from Whole Genome Bisulfite Sequencing data (WGBS).
GNU General Public License v3.0

gemBS 3.5.1 issue at map stage of sample data #88

Closed JakeLehle closed 2 years ago

JakeLehle commented 2 years ago

Hello @heathsc,

I just pushed the bioconda version of gemBS up to 3.5.1 and, while testing the installation with some of my own data, I hit an issue at the gemBS map step of the pipeline. I thought I might have messed up either my csv or conf file, so I tried it on the sample data from the worked example (http://statgen.cnag.cat/gemBS/v3/UserGuide/_build/html/example.html) and got the same issue.

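Here is a screenshot of the stderr output: gemBS_testing_error (https://user-images.githubusercontent.com/84940857/149951332-a5bb19a7-db72-417d-8bfd-48f507458998.png)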

I know that if you build the pipeline from the repository, the current version is 3.5.5, and that version runs the sample data without any issue. If you can update the releases on this repository to 3.5.5, I can push it over to bioconda immediately.

Let me know if you need anything from me or if I can help in any way.

Best, Jake

JakeLehle commented 2 years ago

Hey @heathsc, @karl616, @MarcosFernandez, just checking back in on this. I know I'm bugging you guys with this, but I really wanna keep making progress with this project and not lose momentum.

Could any of you guys follow these steps and update the release of this package to v3.5.5_IHEC? That should fix the bug I'm having, and then I can bump the version up to 3.5.5 on bioconda.

https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository

Thanks!

JakeLehle commented 2 years ago

Hey, I figure you guys are super busy. I have a workaround for the time being, so this can be updated when you get time. I'm gonna point the url in the anaconda meta.yaml to my forked gemBS repository. I archived the 3.5.5 version there, so this should fix the issue I'm having with the map function and keep the anaconda gemBS package functional in the meantime.

Please let me know as soon as you release the 3.5.5_IHEC version. After that, I'll update the meta.yaml file again and put in a pull request to change the meta.yaml url back.

heathsc commented 2 years ago

It looks like you are pulling in the master branch of gem3 from github and not the gembs branch (which has the bisulfite options).

Simon

Sent by email in reply to Jake Lehle's comment of Tue, Jan 18, 2022, 3:04 PM.

JakeLehle commented 2 years ago

Hey @heathsc, thanks for getting back to me. You are talking about my error output, right? Okay, that's weird, I'll take a look at it. Would the gem3 package recognize the gemBS function, though?

JakeLehle commented 2 years ago

Okay, that was unexpected. I was able to patch the gemBS version on bioconda up to 3.5.5; as you can see, the pull request was merged yesterday. I made this request from my forked version of the current gemBS repository, so it should include all of the changes made in the 2 or so years since 3.5.1 was released.

I downloaded the new 3.5.5 version from anaconda this morning to test it out, and I'm still getting the same error as before. I thought more about your last comment, @heathsc, about how I might be pulling the master branch of gem3 and not the gembs branch, which would make sense given the error output: "gem-mapper: unrecognized option '--bisulfite-conversion'"
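This failure mode can be spotted mechanically. The sketch below is a minimal, hypothetical helper (the function name and the regular expression are assumptions for illustration, not part of gemBS). It scans captured stderr text for getopt-style "unrecognized option" messages like the one quoted above, so a test script could flag a gem-mapper build that lacks the bisulfite options.

```python
import re

# Matches getopt-style messages such as:
#   gem-mapper: unrecognized option '--bisulfite-conversion'
_UNRECOGNIZED = re.compile(
    r"^(?P<prog>[^:\s]+): unrecognized option '(?P<opt>[^']+)'",
    re.MULTILINE,
)


def find_unrecognized_options(stderr_text):
    """Return (program, option) pairs for every rejected option in stderr_text."""
    return [
        (match.group("prog"), match.group("opt"))
        for match in _UNRECOGNIZED.finditer(stderr_text)
    ]
```

An empty result means no option was rejected; any other result names the binary and the flag it did not understand.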

Originally, I thought that getting all the files in the 3.5.5 tarball current with the repository would fix the issue during the build. Seeing that the issue is still there is making me go back to what you said.

I'll go back and check the patch file to see if I can pick out where the issue is coming from and make sure it's pulling from the right source. If it isn't, that should be a simple fix: I can do another pull request that modifies the patch file and increments the build number in the meta.yaml file.
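Incrementing the build number is a small text edit to the recipe. As a minimal sketch, assuming the recipe's meta.yaml has the usual conda `build:` section with a `number:` entry (the helper name is hypothetical, and the text is edited with a regular expression rather than a YAML parser because recipes often contain Jinja templating):

```python
import re


def bump_build_number(meta_yaml_text):
    """Return meta.yaml text with the first 'number: N' entry incremented by one."""
    pattern = re.compile(r"^([ \t]*number:[ \t]*)(\d+)", re.MULTILINE)

    def _increment(match):
        # Keep the original indentation and key, replace only the integer.
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = pattern.subn(_increment, meta_yaml_text, count=1)
    if count == 0:
        raise ValueError("no 'number:' entry found in meta.yaml text")
    return new_text
```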

I'll figure it out, I just need coffee and time. If/when I get this figured out please consider making me a maintainer on this repository so I can do all the same changes here that I have done on my forked version. I think this pipeline is so cool and I would be happy to help keep it updated in the future.

JakeLehle commented 2 years ago

Update on this issue.

Okay, I'm learning more, but the Anaconda gemBS package is still broken. I thought about the way the repo is set up: the files for bs_call, gem3-mapper, and gem-cutter don't actually live in the gemBS repository but are linked into it as submodules, which is why you have to fetch them recursively when you clone the repository to your machine.

I saw that git does not currently include submodule-linked files in the archive when a new release is put out. So I thought, oh, this might be why anaconda has issues building the package from the gemBS tarball: it's missing most of the files, because they don't get compressed into the release version. So I added all of these files to my v3.5.5-IHEC repo (https://github.com/JakeLehle/gemBS/pull/3) and then compressed the files again to add them to my release.
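A check for this situation can be sketched in a few lines. The example below is a hypothetical helper, not part of gemBS. It reads the `.gitmodules` file at the top of an unpacked source tree and reports the submodule paths that are missing or empty, which is how they appear in an archive made without the submodule contents. The only format assumption is the standard `path = ...` entries that git writes to `.gitmodules`, with paths that contain no spaces.

```python
import os
import re


def submodule_paths(gitmodules_text):
    """Extract the 'path = ...' values from the text of a .gitmodules file."""
    return re.findall(r"^\s*path\s*=\s*(\S+)\s*$", gitmodules_text, re.MULTILINE)


def empty_submodules(source_root):
    """Return submodule paths under source_root that are missing or contain no files."""
    gitmodules = os.path.join(source_root, ".gitmodules")
    if not os.path.isfile(gitmodules):
        return []
    with open(gitmodules, encoding="utf-8") as handle:
        paths = submodule_paths(handle.read())
    empty = []
    for rel_path in paths:
        full_path = os.path.join(source_root, rel_path)
        # A submodule that was never fetched is either absent or an empty directory.
        if not os.path.isdir(full_path) or not os.listdir(full_path):
            empty.append(rel_path)
    return empty
```

Running `empty_submodules` on the unpacked release tarball before building would show whether the build is about to start without the mapper and caller sources.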

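I updated the Anaconda build.sh (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/gembs/build.sh) to run the installation from the Makefile in the tools dir, just like I do when I download the files from this master repository on my personal computer, and I hit a bunch of snags dealing with the anaconda glibc not being >2.17 (https://github.com/conda-forge/tensorflow-feedstock/issues/67). I made a workaround I thought was pretty clever using an idea I got from another build.sh (https://github.com/bioconda/bioconda-recipes/blob/25ee21573c577aa1ae899e0f7fbbc848d8a63865/recipes/dawg/build.sh), added a few minor patches after that, and the installation passed all the checks.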

I should be golden right? Wrong. I broke samtools. So now the installation is stalling out at the gemBS index step. Annoying, but I broke this more than it already was so I'm gonna fix this.

Any tips on how samtools is getting pulled into the installation? I'm gonna use today to dig through the python scripts or look for any wget commands that might be messing me up.

heathsc commented 2 years ago

Sorry you're still battling with this. I've never used Anaconda so I can't help too much. In terms of the building of samtools/bcftools etc. this is all handled by tools/Makefile. No python scripts involved. If the source directories do not exist (as is the case in a fresh install) the source tar files are downloaded, unpacked and built.

Simon

Sent by email in reply to Jake Lehle's comment of Mon, Jan 31, 2022, 5:42 PM.

JakeLehle commented 2 years ago

Don't worry, it's a fun puzzle. Breaking code always helps to better understand it. At this point, I think I'm hopefully running out of things to break, so I must be close to figuring it all out haha. I'm just amazed at all the different parts you were able to put together in this repository. Okay, thanks for the tip! I have another hunch of what to do that I'm testing out now. I'll let you know if I get it to work.

Best, Jake

JakeLehle commented 2 years ago

Haha, fixed one thing and broke another, but progress is progress. Okay, I went back and looked over the setup.py file and the build.sh I use to make the package on anaconda. Previously, when I was getting some help from @dpryan79, we had set this up so that the last line would install with the --minimal option, which is super common in these build scripts. BUT if you look at the python file, the minimal build removes bs_call and the gem3-mapper, so I think that was what was causing the issue I initially reported in this thread.

I changed that python3 setup.py install line to have no flags other than what is required during linting for setuptools not to break. I was testing it out this morning and was happy to see that the gemBS prepare, index, and map steps were all working.
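A quick post-install check makes this kind of regression visible before running the whole pipeline. The sketch below is hypothetical: the helper name and the directory list are assumptions for illustration, and the executable names `gem-mapper` and `bs_call` are taken from this thread. It looks for each expected executable in a list of directories and reports the ones that are missing.

```python
import os


def missing_executables(names, search_dirs):
    """Return the names that are not executable files in any of search_dirs."""
    missing = []
    for name in names:
        found = any(
            os.path.isfile(os.path.join(directory, name))
            and os.access(os.path.join(directory, name), os.X_OK)
            for directory in search_dirs
        )
        if not found:
            missing.append(name)
    return missing
```

For example, `missing_executables(["gem-mapper", "bs_call"], os.environ["PATH"].split(os.pathsep))` would list whichever of the two cannot be found on `PATH`.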

Yay, we can make bam files at lightning speed!

Okay, now for the bad news. The pipeline starts throwing errors at the gemBS call step but at least runs through it, and then crashes at the gemBS extract step due to missing BCF files. In addition, it looks like some libs don't get wrapped up in the package, which is frustrating. The missing libs are libcblas.so.3, which needs to be put in the /home/user/anaconda3/bin/bs_call directory, and libcrypto.so.1.0.0, which needs to be put in the /home/user/python3.8/site-packages/gemBS/bin/bcftools directory. I found those on my computer from other pipelines and copied them over to see if that would push this through, but it doesn't. Even weirder, the version of gemBS is back at 3.0.0. This didn't happen with any of the other builds, which all said 3.5.5, so it has to be something with modifying the python setup.py install step.
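Missing shared libraries like these can be listed with `ldd`, which prints `=> not found` for each dependency of a binary that it cannot resolve. A minimal sketch, assuming a Linux system where `ldd` is available (the helper names are hypothetical):

```python
import subprocess


def parse_missing_libs(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibcblas.so.3 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary_path):
    """Run ldd on binary_path and return its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)
```

Calling `missing_libs` on each installed binary would show which libraries the package still needs to declare as dependencies, rather than copying files in by hand.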

Okay, based on everything I've seen so far, I think these issues are coming from the python setup.py install step. I'm gonna leave the bioconda package alone for the moment and play around on my local computer with building everything using just that master Makefile in the tools dir. Once I'm sure it should work, I'll take another crack at setting this all up, and hopefully this will be the final solution.

JakeLehle commented 2 years ago

**Fixed it!**

I am very happy (and slightly relieved) to report that the anaconda installation of gemBS is currently stable at v3.5.5_IHEC and runs all the way through the worked example data. What was broken has been mended. The final issue was indeed coming from the python installation using a minimal install which removed the bs_call and the gem3-mapper sections of the pipeline.

Okay, I'm gonna close this issue with this comment. Currently, the anaconda installation is pointed at a tarball download from my forked version of the gemBS repo. I am more than happy to help continue to maintain the gemBS anaconda builds in the future but I have one request. Could you make a branch in the gemBS repo called anaconda?

After that, I can open a pull request against the new anaconda branch of the gemBS repo so that the v3.5.5_IHEC release can be made from that branch, just like I did. That preserves the master branch from any changes, and I can point the installation in the meta.yaml back at @heathsc's gemBS repo instead of mine.

Best, Jake