yupenghe / methylpy

WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
Apache License 2.0
136 stars, 47 forks

IOError: [Errno 71] Protocol error #39

Closed: avellab closed this issue 5 years ago.

avellab commented 5 years ago

Hi Yupeng, I am using Linux to run methylpy. I have a problem with the first step, building the reference. When I run the following script, I get an IOError: [Errno 71] Protocol error:

methylpy build-reference \
    --input-files wtMouseTGFBI.fa \
    --output-prefix wtMouseTGFBI \
    --aligner minimap2

Would you please let me know what the problem is? Should I run this with Python in Linux, or just in the Linux terminal? Thanks

yupenghe commented 5 years ago

Can you provide some test data to reproduce the error? Thanks.

avellab commented 5 years ago

I first ran python run_test.py to test the package; 2 of the tests failed: DMRfind and reidentify-DMR.

Tests start Wed Sep 25 17:34:24 2019

Test importing methylpy module.

methylpy is successfully installed!

Check whether dependencies are available in PATH

Test DMRfind: failed
Test reidentify-DMR: failed

Test build-reference with bowtie: pass
Test build-reference with bowtie2: pass
Test build-reference with minimap2: pass

Test single-end-pipeline with bowtie: pass
Test single-end-pipeline with bowtie2: pass
Test single-end-pipeline with minimap2: pass

Test paired-end-pipeline with bowtie: pass
Test paired-end-pipeline with bowtie2: pass
Test paired-end-pipeline with minimap2: pass

Test quality filter for BAM file of single-end data: pass

Test call-methylation-state for single-end data: pass
Test call-methylation-state for paired-end data: pass

Test merge-allc: pass

Test filter-allc: pass

All tests are done! Wed Sep 25 17:46:25 2019
The tests took 720.78 seconds!

Then I ran:

methylpy build-reference \
    --input-files wtMouseTGFBI.fa \
    --output-prefix wtMouseTGFBI \
    --aligner minimap2

Traceback (most recent call last):
  File "/home/hilar/.local/bin/methylpy", line 5, in <module>
    parse_args()
  File "/home/hilar/.local/lib/python2.7/site-packages/methylpy/parser.py", line 50, in parse_args
    buffsize=args.buffsize)
  File "/home/hilar/.local/lib/python2.7/site-packages/methylpy/call_mc_se.py", line 604, in build_ref
    f.close()
IOError: [Errno 71] Protocol error

I used bowtie2 instead of minimap2 and it worked. What can I do about the failed DMRfind and reidentify-DMR tests?

Thanks

yupenghe commented 5 years ago

Thanks. What was the error message of the DMR failure? You can find it in the test_error_msg.txt and test_output_msg.txt files.

You may want to try to follow this procedure and see if the DMRfind issue can be fixed: https://github.com/yupenghe/methylpy#optional-step---compile-rmscpp

Are you fine with using bowtie2, or will you need to use minimap2?

avellab commented 5 years ago

Thanks for your response. Below are the test results I get for the failed DMRfind and reidentify-DMR tests:

test_error_msg.txt test_output_msg.txt

When I run the command below, I get the following error:

methylpy DMRfind \
    --allc-files ResultsControlHighMe/allc_mCornea_H.tsv \
                 ResultsControlLowMe/allc_mCornea_L.tsv \
    --samples HighMe_control LowMe_control \
    --mc-type "CGN" \
    --chroms wtMouseTGFBI \
    --num-procs 8 \
    --min-num-dms 5 \
    --output-prefix Results/CG_DMR_High_Low

Output:

Filtering allc files using 2 node(s). Mon Sep 30 14:52:18 2019

Splitting allc files for chromosome wtMouseTGFBI Mon Sep 30 14:52:19 2019

(<type 'exceptions.KeyError'>, 184) 'wtMouseTGFBI'

Running RMS tests failed.

avellab commented 5 years ago

Also, regarding your question about minimap2 vs. bowtie2: when I use minimap2, it gives me an error saying that the format is not supported, because it internally runs bowtie2 even though I pass --aligner minimap2. When I changed the option to --aligner bowtie2, it worked.

yupenghe commented 5 years ago

The error you got in running DMRfind is due to two issues:

Regarding minimap2, this aligner is being actively updated. Unfortunately, I haven't had time to update methylpy to keep up with the changes. Please do not use minimap2 for now.

avellab commented 5 years ago

I tried the executable, but it didn't work:

./run_rms_tests.out
./run_rms_tests.out: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory

I then tried the command below:

g++ -O3 -l gsl -l gslcblas -o run_rms_tests.out rms.cpp

rms.cpp:17:10: fatal error: gsl/gsl_rng.h: No such file or directory
 #include <gsl/gsl_rng.h>
          ^~~~~~~~~~~~~~~
compilation terminated.

yupenghe commented 5 years ago

Got it. I believe you need to install the GSL library (https://www.gnu.org/software/gsl/). If you already installed it, you may want to check that the environment variable is set correctly; see https://stackoverflow.com/questions/22222666/error-while-loading-shared-libraries-libgsl-so-0-cannot-open-shared-object-fil
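Before changing any environment variables, it can help to confirm that a Linux build of the library exists at all. The helper below is a hypothetical sketch (find_libgsl is not part of methylpy or GSL): it searches a list of directories for libgsl.so* files. The directories in the example comment are common Linux defaults plus an assumed home-directory prefix, not guaranteed install locations.

```sh
# find_libgsl DIR...
# Hypothetical helper: print every GSL shared-library file (libgsl.so,
# libgsl.so.0, ...) found under the given directories.
# Directories that do not exist are skipped silently.
find_libgsl() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name 'libgsl.so*' 2>/dev/null
        fi
    done
}

# Example: search the usual system locations plus a home-directory prefix
# (the last path is an assumption; use your own install prefix).
# find_libgsl /usr/lib /usr/local/lib "$HOME/gsl/lib"
```

If it prints a path such as /usr/local/lib/libgsl.so.0, the directory part is what LD_LIBRARY_PATH needs to include; if it prints nothing, GSL is probably not installed on the Linux side.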

avellab commented 5 years ago

I installed the gsl library and tried methylpy DMRfind. I got the following error:

Filtering allc files using 2 node(s). Tue Oct 8 19:45:32 2019

Splitting allc files for chromosome wtMouseTGFBI Tue Oct 8 19:45:32 2019

(<type 'exceptions.KeyError'>, 184) 'wtMouseTGFBI'

Running RMS tests failed.

Is this because of the chromosome name "wtMouseTGFBI"? My sample is a construct made of one chromosome, and that is how I labeled it in all of the allc files.

Thanks
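One quick way to check the chromosome-name question is to list the names that actually occur in the allc files and compare them with the value passed to --chroms. A minimal sketch, assuming uncompressed, tab-separated allc files with the chromosome name in the first column, as methylpy writes them; list_allc_chroms and check_chrom_in_allc are hypothetical helper names, not methylpy commands.

```sh
# list_allc_chroms FILE...
# Print the distinct values of column 1 (the chromosome name) across
# one or more tab-separated allc files.
list_allc_chroms() {
    cut -f1 "$@" | sort -u
}

# check_chrom_in_allc CHROM FILE...
# Report, per file, whether CHROM appears as an exact, whole value in column 1.
check_chrom_in_allc() {
    chrom=$1
    shift
    for f in "$@"; do
        if cut -f1 "$f" | grep -qxF -- "$chrom"; then
            echo "found: $chrom in $f"
        else
            echo "MISSING: $chrom in $f"
        fi
    done
}
```

For example, check_chrom_in_allc wtMouseTGFBI ResultsControlHighMe/allc_mCornea_H.tsv ResultsControlLowMe/allc_mCornea_L.tsv should report "found" for both files if the names match. If an allc file starts with a header line, its first field will also show up in the list_allc_chroms output.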

yupenghe commented 5 years ago

Were you able to run ./run_rms_tests.out without error? Would you mind sharing a subset of the data so I can reproduce the error?

avellab commented 5 years ago

This is the error I get when running the test:

./run_rms_tests.out: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory

I am attaching my FASTA reference along with one sample so you can run the following script:

methylpy DMRfind \
    --allc-files ResultsControlHighMe/allc_mCornea_H.tsv \
                 ResultsControlLowMe/allc_mCornea_L.tsv \
    --samples HighMe_control LowMe_control \
    --mc-type "CGN" \
    --chroms wtMouseTGFBI \
    --num-procs 8 \
    --min-num-dms 5 \
    --output-prefix Results/CG_DMR_High_Low

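ResultsControlLowMe.xlsx (https://github.com/yupenghe/methylpy/files/3708510/ResultsControlLowMe.xlsx)
ResultsControlHighMe.xlsx (https://github.com/yupenghe/methylpy/files/3708511/ResultsControlHighMe.xlsx)

wtMouseTGFBI.docx (https://github.com/yupenghe/methylpy/files/3708527/wtMouseTGFBI.docx)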

yupenghe commented 5 years ago

It looks like the system is unable to find the GSL library you installed. Most likely you will need to set the environment variable LD_LIBRARY_PATH. There is a link in the previous posts. Please let me know if you have trouble fixing this.


-- Yupeng He Senior Bioinformatics Scientist, Guardant Health https://www.linkedin.com/in/yupeng-he-45735327/ https://github.com/yupenghe

avellab commented 5 years ago

Previously, to install GSL, I did the following:

cd /mnt/udrive/My_Work/softwares
wget ftp://ftp.gnu.org/gnu/gsl/gsl-2.6.tar.gz
tar -zxvf gsl-2.6.tar.gz
cd /mnt/udrive/My_Work/softwares/gsl-2.6
./configure --prefix=/mnt/udrive/My_Work/softwares/gsl-2.6
make
make check
make install

and I got the following error for all of the make steps:

rm: cannot remove 'gsl-config': Operation not permitted
make[1]: *** [Makefile:1559: gsl-config] Error 1
make[1]: Leaving directory '/mnt/udrive/My_Work/softwares/gsl-2.6'
make: *** [Makefile:963: check-recursive] Error 1

So probably the GSL library is not installed properly. Is that right?

yupenghe commented 5 years ago

Yes. It looks like the GSL library was not installed properly. Alternatively, you may want to try installing GSL using conda (https://anaconda.org/conda-forge/gsl).

avellab commented 5 years ago

I installed gsl using conda this time, but running the DMR test failed again:

Package Plan

environment location: U:\My_Work\softwares\miniConda

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
ca-certificates-2019.9.11  |       hecc5488_0         181 KB  conda-forge
certifi-2019.9.11          |           py27_0         147 KB  conda-forge
conda-4.7.12               |           py27_0         3.0 MB  conda-forge
gsl-2.5                    |       hebfefe3_1         1.4 MB  conda-forge
libblas-3.8.0              |           13_mkl         3.5 MB  conda-forge
libcblas-3.8.0             |           13_mkl         3.5 MB  conda-forge
openssl-1.1.1c             |       h0c8e037_0         4.8 MB  conda-forge
------------------------------------------------------------
                                       Total:        16.6 MB

The following NEW packages will be INSTALLED:

  gsl                conda-forge/win-64::gsl-2.5-hebfefe3_1
  libblas            conda-forge/win-64::libblas-3.8.0-13_mkl
  libcblas           conda-forge/win-64::libcblas-3.8.0-13_mkl

The following packages will be UPDATED:

ca-certificates pkgs/main::ca-certificates-2019.8.28-0 --> conda-forge::ca-certificates-2019.9.11-hecc5488_0

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            pkgs/main --> conda-forge
  conda              pkgs/main --> conda-forge
  openssl            pkgs/main::openssl-1.1.1d-h0c8e037_2 --> conda-forge::openssl-1.1.1c-h0c8e037_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
gsl-2.5              | 1.4 MB    | ######################################################### | 100%
openssl-1.1.1c       | 4.8 MB    | ######################################################### | 100%
libblas-3.8.0        | 3.5 MB    | ######################################################### | 100%
conda-4.7.12         | 3.0 MB    | ######################################################### | 100%
certifi-2019.9.11    | 147 KB    | ######################################################### | 100%
ca-certificates-2019 | 181 KB    | ######################################################### | 100%
libcblas-3.8.0       | 3.5 MB    | ######################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

/mnt/udrive/My_Work/Bioinf/methyl-Seq/mouseProject$ methylpy DMRfind \
    --allc-files ResultsControlHighMe/allc_mCornea_H.tsv \
                 ResultsControlLowMe/allc_mCornea_L.tsv \
    --samples HighMe_control LowMe_control \
    --mc-type "CGN" \
    --chroms wtMouseTGFBI \
    --num-procs 8 \
    --min-num-dms 5 \
    --output-prefix Results/CG_DMR_High_Low

Filtering allc files using 2 node(s). Fri Oct 11 13:40:38 2019

Splitting allc files for chromosome wtMouseTGFBI Fri Oct 11 13:40:38 2019

(<type 'exceptions.KeyError'>, 184) 'wtMouseTGFBI'

Running RMS tests failed.

/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy$ ./run_rms_tests.out
./run_rms_tests.out: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory

Could that be the path? Now I'm manually trying to transfer the miniConda folder to where I have ./run_rms_tests.out saved (/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy), but it still doesn't work:

/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy$ ls
call_mc_pe.py  DMRfind.py  miniConda  rms.cpp  utilities.py
call_mc_se.py  __init__.py  parser.py  run_rms_tests.out
/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy$ ./run_rms_tests.out
./run_rms_tests.out: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory
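Copying the miniConda folder next to the binary is unlikely to help on its own: the Linux dynamic loader does not search the executable's own directory by default, only locations such as those listed in LD_LIBRARY_PATH, the binary's embedded run path, and the system library directories. Note also that the package plan above lists win-64 builds; those are Windows packages, so they do not provide the Linux libgsl.so.0 this binary asks for. The ldd tool lists the shared libraries a binary needs and marks unresolved ones as "not found"; the sketch below wraps it in a hypothetical helper, missing_libs.

```sh
# missing_libs BINARY
# Hypothetical helper: print the shared libraries that the dynamic loader
# cannot resolve for BINARY (ldd marks them with "not found").
# Prints nothing when every dependency resolves.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Example (path taken from this thread):
# missing_libs /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out
```

An empty result means every dependency was found; a line such as "libgsl.so.0 => not found" means the loader still cannot see GSL.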

yupenghe commented 5 years ago

If GSL was installed successfully, the error occurs because the location of the GSL library file is unknown to the system. The solution is to set the LD_LIBRARY_PATH variable to include the directory that contains the GSL library file.

Do you see libgsl.so.0 or libgsl.so.* files in a folder like U:\My_Work\softwares\miniConda\lib\?

If that is the case, you should be able to fix the error for the current terminal by doing something like:

export LD_LIBRARY_PATH=U:\My_Work\softwares\miniConda\lib\:$LD_LIBRARY_PATH

Adding this command to your ~/.bashrc file (assuming you are using bash) and restarting the terminal should fix the problem.

You can check out this post: https://stackoverflow.com/questions/22222666/error-while-loading-shared-libraries-libgsl-so-0-cannot-open-shared-object-fil
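Two details are easy to trip over here. Inside the Kali VM, LD_LIBRARY_PATH must contain Linux paths, so a Windows-style path such as U:\My_Work\... is not understood by the loader; in bash, the unquoted backslashes are even consumed as escape characters. Also, when LD_LIBRARY_PATH is empty, the form $LD_LIBRARY_PATH:/dir leaves an empty entry, which glibc's loader treats as the current directory. The hypothetical helper below (add_lib_dir is not a standard command) appends a directory only once and avoids that empty entry; it is a sketch, and the directory in the example comment is an assumption.

```sh
# add_lib_dir DIR
# Hypothetical helper: append DIR to LD_LIBRARY_PATH unless it is already
# listed, without leaving an empty (current-directory) entry behind.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            ;;  # already present, nothing to do
        *)
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Example: make a GSL installed under /usr/local/lib visible to the loader.
# add_lib_dir /usr/local/lib
```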

avellab commented 5 years ago

Thanks, I opened a new terminal and here is what I did:

~/methylpy$ ls
call_mc_pe.py  DMRfind.py  miniConda  rms.cpp  utilities.py
call_mc_se.py  __init__.py  parser.py  run_rms_tests.out

~/methylpy$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
~/methylpy$ export LD_LIBRARY_PATH
~/methylpy$ ./run_rms_tests.out
Error: Usage: ./rms.out

The DMR tests failed again. Do you think a dependency such as wigToBigWig is the issue? I am not able to install it. Attached: test_error_msg.txt, test_output_msg.txt

# set up channels in miniConda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

conda install -c bioconda ucsc-wigtobigwig
conda install -c bioconda/label/cf201901 ucsc-wigtobigwig

Error: PackagesNotFoundError: The following packages are not available from current channels:

Current channels:

To search for alternate channels that may provide the conda package you're looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

yupenghe commented 5 years ago

I don't think wigtobigwig is the issue. The issue is still that the system is unable to find the gsl library file.

Did you test methylpy after running the command or before?

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

If you add the above line to your ~/.bashrc file and open up a new terminal for testing, would it help?

What is the full output when you ran ./run_rms_tests.out after running the above command?

avellab commented 5 years ago

To answer your questions:

Did you test methylpy after running the command or before?

Splitting allc files for chromosome wtMouseTGFBI Tue Oct 15 14:42:35 2019

(<type 'exceptions.KeyError'>, 184) 'wtMouseTGFBI'

Running RMS tests failed.

If you add the above line to your ~/.bashrc file and open up a new terminal for testing, would it help?
- I am new to Linux and am not familiar with editing a .bashrc file.

What is the full output when you ran ./run_rms_tests.out after running the above command?
- Usage: ./rms.out

yupenghe commented 5 years ago

Thanks.

  1. Do you know whether you use bash, zsh, or another shell? If you use bash, the terminal automatically sets up environment variables from the ~/.bashrc file, where ~ is a shortcut to your home directory. To add the above line to your ~/.bashrc file, you can do
    cd ~/
    emacs .bashrc

    You can use another text editor such as vim too. If you are not familiar with the command line, you can do

    cd ~/
    pwd

    and then go to the printed folder and create or modify the .bashrc file with a GUI text editor like Notepad.

    Then add this line to the .bashrc file (if the file does not exist, please create one):

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

    Finally, run

    source .bashrc

  2. From the test error file, the methylpy you installed is using

    /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out

    What message do you get if you run

    /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out

    before and after you run the "export xxx" command?

  3. If you still get an error from Step 2, can you try to recompile the file with

    g++ -O3 -l gsl -l gslcblas -o /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out \
    /home/hilar/.local/lib/python2.7/site-packages/methylpy/rms.cpp
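For readers who are not comfortable with a terminal editor, the export line from step 1 can also be appended from the command line. The sketch below uses a hypothetical helper name, append_once: it adds a line to a file only if that exact line is not already there, so running it twice does not duplicate the entry. Afterwards, source ~/.bashrc or open a new terminal as in step 1.

```sh
# append_once FILE LINE
# Hypothetical helper: append LINE to FILE (creating FILE if needed)
# unless FILE already contains exactly that line.
append_once() {
    touch "$1"
    if ! grep -qxF -- "$2" "$1"; then
        # Make sure the new line is not glued onto an unterminated last line.
        if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
            printf '\n' >> "$1"
        fi
        printf '%s\n' "$2" >> "$1"
    fi
}

# Example: single quotes keep $LD_LIBRARY_PATH unexpanded, so it is
# evaluated each time ~/.bashrc runs rather than frozen now.
# append_once ~/.bashrc 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib'
```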
avellab commented 5 years ago

Thank you for taking the time to help me with this; I really do appreciate your help. I am using GNU bash, version 5.0.2(1)-release (x86_64-pc-linux-gnu). I used the emacs command, a new page opened up, and I added and saved the export line in .bashrc. There were other export commands in that file (below), so I added your line at the end:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/include
export PPH=/home/hilar/Programs/polyphen-2.2.2
export PATH=$PATH:$PPH/bin
export PATH=$PATH:/home/hilar/Programs/vcf2db:/home/hilar/Programs/vcf2maf:/home/hilar/Programs/vcfanno
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

Then in the terminal I typed source .bashrc (nothing happened). Then I ran the test again and got the same error. The test gave me the same error before adding this line as well. Please see below:

hilar@kali:~$ /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out
Usage: ./rms.out
hilar@kali:~$ cd ~/
hilar@kali:~$ emacs .bashrc
hilar@kali:~$ source .bashrc
hilar@kali:~$ /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out
Usage: ./rms.out

  3. Recompiling the file failed as well:

hilar@kali:~$ g++ -O3 -l gsl -l gslcblas -o /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status

yupenghe commented 5 years ago

Thanks for the update. I am happy to help. What OS are you using?

There are a few issues that are quite confusing.

  1. The default printed message of "run_rms_tests.out" is weird. The full message should be
    Usage: ./rms.out <chunk files> <output file> <samples> <min_cov> <num_sims> <num_sig_tests> <seed>

    I checked the code and I cannot imagine how only part of the message was printed out.

Nonetheless, since you can get some help message printed out, we should be able to test this binary file separately. Can you try the below commands?

wget http://neomorph.salk.edu/yupeng/share/test_dmr.tar.gz
tar xf test_dmr.tar.gz
/home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out allc_P0_FB_1.tsv,allc_P0_HT_1.tsv test_output.txt A,B 1 1000 100 -1

If the command can be run without error, we can be sure that the run_rms_tests.out is working well.

  2. When you tried to recompile run_rms_tests.out, the command was missing a key input: the path to rms.cpp. Can you try the command below?
    g++ -O3 -l gsl -l gslcblas -o /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out METHYLPY_PATH/rms.cpp

    where METHYLPY_PATH/rms.cpp is the path to the CPP file under methylpy/methylpy https://github.com/yupenghe/methylpy/blob/methylpy/methylpy/rms.cpp

If you are using Ubuntu, please also try the command below if the above does not work:

g++ -o /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out METHYLPY_PATH/rms.cpp `gsl-config --cflags --libs`

Sorry it has been such a hassle just to get DMRfind working.

Yupeng
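The position of the GSL flags on the command line can matter. GNU ld processes its inputs in order, and with its --as-needed behavior (enabled by default in Ubuntu's GCC) a library named before the source file that uses it can be dropped, which shows up as undefined-reference errors; this is one reason the gsl-config variant puts the libraries last. The function below is a minimal sketch of that variant; compile_rms is a hypothetical name, and it assumes a Linux GSL install whose gsl-config is on PATH.

```sh
# compile_rms SRC OUT
# Hypothetical helper: build the RMS test binary from SRC (rms.cpp) into OUT,
# placing the GSL flags after the source file so the linker sees the
# libraries after the code that needs them.
compile_rms() {
    # gsl-config --cflags prints include flags; --libs prints the -L/-l flags
    # (for example -lgsl -lgslcblas -lm). The substitution is left unquoted
    # on purpose so each flag becomes a separate argument.
    g++ -O3 -o "$2" "$1" $(gsl-config --cflags --libs)
}

# Example, with METHYLPY_PATH standing for the methylpy source directory:
# compile_rms METHYLPY_PATH/rms.cpp \
#     /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out
```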

avellab commented 5 years ago

The OS is Windows 10 Pro, 64-bit, and I am using a Kali Linux Oracle VM to run these scripts.

1) The command aborted:

hilar@kali:/mnt/udrive/My_Work/Bioinf$ wget http://neomorph.salk.edu/yupeng/share/test_dmr.tar.gz
--2019-10-17 16:06:39--  http://neomorph.salk.edu/yupeng/share/test_dmr.tar.gz
Resolving neomorph.salk.edu (neomorph.salk.edu)... 198.202.69.49
Connecting to neomorph.salk.edu (neomorph.salk.edu)|198.202.69.49|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 499837 (488K) [application/x-gzip]
Saving to: ‘test_dmr.tar.gz’

test_dmr.tar.gz     100%[====================================>] 488.12K  2.49MB/s    in 0.2s

2019-10-17 16:06:40 (2.49 MB/s) - ‘test_dmr.tar.gz’ saved [499837/499837]

hilar@kali:/mnt/udrive/My_Work/Bioinf$ tar xf test_dmr.tar.gz
hilar@kali:/mnt/udrive/My_Work/Bioinf$ /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out allc_P0_FB_1.tsv,allc_P0_HT_1.tsv test_output.txt A,B 1 1000 100 -1
free(): double free detected in tcache 2
Aborted

2) I will do this tomorrow; our system is down now.

yupenghe commented 5 years ago

Ok. Please let me know if you have any luck with 2. Btw, do you have any experience working with Docker? It could be a much easier solution for this issue. It will take a few days, but I can prepare a Docker image with methylpy installed for you. Within the Docker container, you should be able to run DMRfind without any further setup.

avellab commented 5 years ago

Thanks Yupeng. The Docker solution is great. I have not worked with it before, but I am starting to learn it now. It would be great to have the whole methylpy pipeline in one place; would you please include the alignment and other pre-processing steps in addition to the DMRfind function?

this is what I got for question 2:

hilar@kali:/mnt/udrive/My_Work/softwares$ g++ -O3 -l gsl -l gslcblas -o /home/hilar/.local/lib/python2.7/site-packages/methylpy/run_rms_tests.out /mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy/rms.cpp

/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy/rms.cpp: In function ‘int rms_test(std::vector<std::vector >, float, std::vector<std::vector >, std::vector<std::vector >*)’:
/mnt/udrive/My_Work/Bioinf/methyl-Seq/methylpy/methylpy/rms.cpp:88:1: warning: no return statement in function returning non-void [-Wreturn-type]
 }
 ^

yupenghe commented 5 years ago

Hey, I added a Docker solution for running methylpy. Please check it out: https://github.com/yupenghe/methylpy#use-methylpy-without-installation

This should resolve the DMRfind issue. I am closing this issue; please feel free to reopen it if there is still a problem running DMRfind.