kundajelab / chipseq_pipeline

AQUAS TF and histone ChIP-seq pipeline
BSD 3-Clause "New" or "Revised" License

shuf: end of file #34

Closed: sbstatgen closed this issue 6 years ago

sbstatgen commented 6 years ago

Hi, I am running the pipeline for the first time on paired-end histone ChIP-seq data with two replicates and two control replicates. I am getting this error:

shuf: /data/path/prefix.trim_50bp.tagAlign.gz: end of file
Program & line : '/software/path/TF_chipseq_pipeline/modules/postalign_bed.bds', line 42

When I checked the latest version of that bds file on GitHub, I found a newly added line:

"no_random_source := false help Disable --random-source for UNIX shuf. Hot fix for end of file error."

Do I need to get the latest changes and rerun the pipeline from the beginning? On a related note, how do I keep the pipeline up to date? Using 'git pull'? I did 'git pull origin master', which was probably a mistake, as I am getting this message now:

error: Your local changes to the following files would be overwritten by merge:
    default.env
Please, commit your changes or stash them before you can merge.
Aborting

Sorry about the naive questions, and thanks in advance.

leepc12 commented 6 years ago

No, you don't have to start over. You can resume the pipeline by simply running the same command in the working directory.

Yes, there was an update to the default.env file. Move your current default.env somewhere outside the git directory, run git checkout default.env, then git pull, and then copy your default.env back into the git directory.
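
The steps above can be sketched as a small shell helper. This is a minimal sketch and not part of the pipeline: the function name pull_keeping_local_file and its two arguments are hypothetical, and it assumes the remote and branch are origin and master, as in the command quoted earlier in this thread.

```shell
# Hypothetical helper (not part of the pipeline): update a git checkout
# while preserving one locally edited file such as default.env.
pull_keeping_local_file() {
  repo_dir=$1   # path to the pipeline checkout
  file=$2       # locally modified file, relative to repo_dir

  # 1. Save the local version outside the git directory.
  backup=$(mktemp) || return 1
  cp "$repo_dir/$file" "$backup" || return 1

  # 2. Discard the local edits so the merge is no longer blocked.
  git -C "$repo_dir" checkout -- "$file" || return 1

  # 3. Bring in the upstream changes (assumes remote "origin", branch "master").
  git -C "$repo_dir" pull origin master || return 1

  # 4. Put the saved local version back in place.
  cp "$backup" "$repo_dir/$file" || return 1
  rm -f "$backup"
}
```

It could be called as pull_keeping_local_file /path/to/checkout default.env, where the path is a placeholder. Because the last step restores the old file, any upstream changes to default.env are overwritten in the working copy; running git diff default.env afterwards shows how the restored file differs from the updated version.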

sbstatgen commented 6 years ago

Thanks... the git problem is solved. But on resuming the pipeline I still get the "shuf: end of file" error at the same point, in this task: task.postalign_bed.subsample_tag_rep2.line_46.id_10

leepc12 commented 6 years ago

Can you try with -no_random_source?

sbstatgen commented 6 years ago

I tried and got the same error. It seems that if I give -no-random-source in the "chipseq.py" command just after the JSON conf file, it is not passed to the bds script, so the bds script prints "$no_random_source" as 'false'. Perhaps something needs to be set in the JSON conf file?

leepc12 commented 6 years ago

Fixed in commit 0ed57024897737b06f45fb0f82e89568ecf26764. Please git pull and try again with python chipseq.py --no-random-source

sbstatgen commented 6 years ago

Thanks!! That fixed the problem. Btw, just curious: what is --no-random-source? Would it give the same result with or without that flag?

leepc12 commented 6 years ago

With the --no-random-source flag you will get slightly different results for pseudo replicates.

leepc12 commented 6 years ago

--no-random-source forces the pipeline to make pseudo replicates without a specified random seed. So you will get slightly different outputs for each pipeline run (even with the same inputs).
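
The effect of a fixed random source can be reproduced outside the pipeline with GNU shuf. The sketch below is not the pipeline's code; the file names and line counts are made up for illustration. It subsamples a toy file twice with a fixed --random-source and twice without one. As far as I know, an "end of file" message like the one reported above is what shuf prints when the file given as the random source runs out of bytes, which would explain why the hot fix drops the option.

```shell
# Toy input: 1000 numbered lines standing in for reads (hypothetical data).
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/reads.txt"

# Any sufficiently large fixed file can act as the byte source; this one is made up.
seq 1 100000 > "$workdir/random_source.txt"

# With a fixed --random-source, repeated runs pick the same lines.
fixed_a=$(shuf -n 5 --random-source="$workdir/random_source.txt" "$workdir/reads.txt")
fixed_b=$(shuf -n 5 --random-source="$workdir/random_source.txt" "$workdir/reads.txt")

# Without it, shuf draws fresh randomness on every run.
free_a=$(shuf -n 5 "$workdir/reads.txt")
free_b=$(shuf -n 5 "$workdir/reads.txt")

[ "$fixed_a" = "$fixed_b" ] && echo "fixed source: identical subsamples"
[ "$free_a" = "$free_b" ] || echo "default source: subsamples differ"
```

The same trade-off applies to pseudo replicates: a fixed source makes reruns reproducible, while dropping it avoids the error at the cost of small run-to-run differences.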