Closed mdurante1 closed 7 years ago
It looks like your local version of metaseq python package conflicts with numpy in the conda environment. Can you try the following to debug?
$ python
>>> import metaseq
When running the command I obtained:
~/.local/lib/python2.7/site-packages/subprocess32.py
I proceeded to remove the subprocess32 files and removed my local installation of metaseq but I am still receiving the same error after ataqc runs for ~8 hours. Also the TSS plot (*.trim.PE2SE_vplot.png) is being generated so it seems that metaseq is running partly. Any further suggestions?
Here is the log file from the latest run: https://gist.github.com/mdurante1/0f479edd2beefd54db25da5499a6c04d
Hi,
I see from your comment that you might have an older version of ATAqC (we've renamed vplot.png to tss-enrich.png) - could you try uninstall the whole pipeline and then reinstall, making sure to clone recursively to get the package? And also get the latest version of the genome annotations, downloading as described in the instructions.
Thanks! Daniel
I re-installed the pipeline last week so this is a relatively new install. I just uninstalled all dependencies, removed the whole pipeline, and cloned it recursively using "git clone https://github.com/kundajelab/atac_dnase_pipelines --recursive" I reinstalled the dependencies and then checked the "run_ataqc.py" file in the ~/atac_dnase_pipelines/ataqc/ directory and it seems as if the old ATAqC version is being loaded:
def make_vplot(bam_file, tss, prefix, chromsizes, read_len, bins=400, bp_edge=2000,
processes=8, greenleaf_norm=True):
'''
Take bootstraps, generate V-plots, and get a mean and
standard deviation on the plot. Produces 2 plots. One is the
aggregation plot alone, while the other also shows the signal
at each TSS ordered by strength.
'''
logging.info('Generating vplot...')
vplot_file = '{0}_vplot.png'.format(prefix)
vplot_large_file = '{0}_large_vplot.png'.format(prefix)
Should the new version of ATAqC be loaded automatically or do I have to clone it separately?
Sorry, we were a commit behind in the sync with ATAqC. It's updated now, please try again. Thanks!
Hi Daniel,
After reinstalling the updated pipeline and redownloading the genomes, I ran the samples again and I am obtaining the same error I obtained previously. I checked the qc directory and I now see the properly named TSS plot (.trim.PE2SE_large_tss-enrich.png) and a gc plot (.trim.PE2SE_gcPlot.pdf), which was not there before. If I add -no_ataqc to the script and resubmit the job it completes the analysis so it seems to be a specific ATAqC problem. Any other thoughts on how to fix this error?
Thanks! Michael
Hi Michael,
Looking more carefully at this, we've actually resolved your issue - this is a new downstream issue, related to comparing your samples to publicly available ATAC. Can you look in your qc
folder and look for a test.log
file if you still have it? That might have some more information for me to look at to see what's going on.
Daniel
Here is the test.log file from the latest run: https://gist.github.com/mdurante1/8dc28ce6195b7b54fdfee7054524082c
Hope this helps! Michael
Could you do two things for me:
1) In your error log, it makes a note that your BAM index file (.bai) is older than the BAM file. Could you do a clean re-run (if it doesn't take too long to do the re-run)?
2) Could you run which conda
in your terminal where you run BDS and share the output path?
Daniel
Hi Daniel,
~/miniconda3/envs/bds_atac/bin/conda
Michael
Sounds good. I've also received feedback from someone else who ran into this issue:
"I removed miniconda and reinstalled miniconda version 4.0.5. I then removed/re-cloned the bds_atac pipeline. I then used uninstall_dependencies.sh and then install_dependencies.sh"
So basically try a full uninstall, set up miniconda first, then clone the pipeline, then the dependencies. Hope this helps!
Also to do a quick test, do consider subsampling your FASTQ files and just running one of them to make sure that it can make it through ATAqC. This will decrease the time to debug before having to do a full re-run of everything :)
Hi Daniel,
I did a clean run through and I am now getting a different error:
Traceback (most recent call last):
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1413, in main
[flagstat, mapped_count] = get_samtools_flagstat(ALIGNED_BAM)
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 589, in get_samtools_flagstat
return flagstat, mapped_reads
UnboundLocalError: local variable 'mapped_reads' referenced before assignment
Here is the log file from the latest run: https://gist.github.com/mdurante1/ada64d8f842812e984857a7ac90c6069
I have also included the test.log file: https://gist.github.com/mdurante1/30bf1378c550948a653122fd5c184d3c
Thanks, Michael
It's a pysam issue - there are different outputs in versions 0.9+ (https://github.com/pysam-developers/pysam/issues/245 in case you're curious).
We've pushed a hot fix, go ahead and pull the pipeline again (recursively to get the ataqc submodule). no need to reinstall anything.
Hi Daniel,
I did another clean run through with all the suggested changes and I am now getting a different error:
Traceback (most recent call last):
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1413, in main
[flagstat, mapped_count] = get_samtools_flagstat(ALIGNED_BAM)
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 582, in get_samtools_flagstat
results = pysam.flagstat(bam_file, split_lines=TRUE)
NameError: global name 'TRUE' is not defined
Here's the log file of this run: https://gist.github.com/mdurante1/caee77db53813e0c990f35abf5d3cace
test.log: https://gist.github.com/mdurante1/bab061d3e306b6f0e9412448c22e851e
Thanks, Michael
Hi Michael,
Thanks - so I made a big goof and used R syntax accidentally. It's fixed and in the latest ataqc module, so go ahead and do the same as before (just pull the pipeline recursively to get the submodule, no reinstallations needed) and that bug should go away.
Daniel
Hi Daniel,
Your fixed worked for that bug but now I am receiving the original error again:
Traceback (most recent call last):
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1460, in main
ROADMAP_META, OUTPUT_PREFIX)
File "/nethome/mdurante/atac_dnase_pipelines/ataqc/run_ataqc.py", line 905, in compare_to_roadmap
sample_mean0_col)
File "/nethome/mdurante/miniconda3/envs/bds_atac/lib/python2.7/site-packages/scipy/stats/stats.py", line 3310, in spearmanr
rs = np.corrcoef(ar, br, rowvar=axisout)
File "/nethome/mdurante/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2145, in corrcoef
c = cov(x, y, rowvar)
File "/nethome/mdurante/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2024, in cov
X = np.vstack((X, y))
File "/nethome/mdurante/miniconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Here are the corresponding log files: https://gist.github.com/mdurante1/20997f4b9a57f89f8c10464b4910f5c5 https://gist.github.com/mdurante1/a52c7db10f97e1e0748da4de0ce9be3e
@vervacity Hi Daniel,
I am also having the same problem. I get this error
# SYS command. line 145
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."
StdErr (100000000 lines) :
Picked up _JAVA_OPTIONS: -Xms256M -Xmx15G -XX:ParallelGCThreads=1
ERROR 2017-05-18 23:04:55 ProcessExecutor Warning messages:
ERROR 2017-05-18 23:04:55 ProcessExecutor 1: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 2: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 3: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 4: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 5: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 6: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-18 23:04:55 ProcessExecutor 7: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-18 23:04:55 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
Picked up _JAVA_OPTIONS: -Xms256M -Xmx15G -XX:ParallelGCThreads=1
[bam_sort_core] merging from 26 files...
[bam_sort_core] merging from 26 files...
Picked up _JAVA_OPTIONS: -Xms256M -Xmx15G -XX:ParallelGCThreads=1
Traceback (most recent call last):
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in <module>
main()
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1413, in main
[flagstat, mapped_count] = get_samtools_flagstat(ALIGNED_BAM)
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 582, in get_samtools_flagstat
results = pysam.flagstat(bam_file, split_lines=TRUE)
NameError: global name 'TRUE' is not defined
Fatal error: /hd2/atac_dnase_pipelines/atac.bds, line 1525, pos 2. Task/s failed.
atac.bds, line 74 : main()
atac.bds, line 77 : void main() { // atac pipeline starts here
atac.bds, line 97 : ataqc()
atac.bds, line 1514 : void ataqc() {
atac.bds, line 1525 : wait
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Here is the test. log file in the qc folder.
INFO:root:Reading bowtie alignment log...
INFO:root:18507448 reads; of these:
INFO:root:18507448 (100.00%) were paired; of these:
INFO:root:189484 (1.02%) aligned concordantly 0 times
INFO:root:8274240 (44.71%) aligned concordantly exactly 1 time
INFO:root:10043724 (54.27%) aligned concordantly >1 times
INFO:root:----
INFO:root:189484 pairs aligned concordantly 0 times; of these:
INFO:root:26354 (13.91%) aligned discordantly 1 time
INFO:root:----
INFO:root:163130 pairs aligned 0 times concordantly or discordantly; of these:
INFO:root:326260 mates make up the pairs; of these:
INFO:root:233407 (71.54%) aligned 0 times
INFO:root:21280 (6.52%) aligned exactly 1 time
INFO:root:71573 (21.94%) aligned >1 times
INFO:root:99.37% overall alignment rate
INFO:root:Getting mitochondrial chromosome fraction...
INFO:root:Getting GC bias...
INFO:root:java -Xmx4G -jar /home/rick/anaconda3/envs/bds_atac/bin/../share/picard*/picard.jar CollectGcBiasMetrics R=/hd2/bds_genome_data/mm9/mm9.fa I=/hd2/atacseq_taf/output/ATAC_5lKO/out/align/rep2/ATAC_5_L002_R1_001.trim.PE2SE.nodup.bam O=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE_gc.txt VERBOSITY=ERROR QUIET=TRUE ASSUME_SORTED=FALSE CHART=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE_gcPlot.pdf S=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE_gcSummary.txt
INFO:root:Running preseq...
INFO:root:preseq lc_extrap -P -B -o /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE.preseq.dat /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE.sorted.bam -v 2> /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5_L002_R1_001.trim.PE2SE.preseq.log
INFO:root:samtools mapq 30...
INFO:root:Running Picard MarkDuplicates...
INFO:root:samtools flagstat...
@rfarouni you probably grabbed the commit right before we corrected it - go ahead and pull the latest version (making sure to do recursive to get the latest ataqc module) and it should work.
@mdurante1 turns out that we had an annotation switch in mm9. this has now been corrected and should run properly. please pull the latest commit and download the latest genome data annotations. (also as an aside, I would personally recommend making the jump to mm10, if it fits with your other analyses!)
@vervacity Thanks. That fixed it, but another error came up several steps down the pipeline. Here is the error
--------------------Stderr--------------------
Picked up _JAVA_OPTIONS: -Xms256M -Xmx20G -XX:ParallelGCThreads=1
ERROR 2017-05-20 17:51:40 ProcessExecutor Warning messages:
ERROR 2017-05-20 17:51:40 ProcessExecutor 1: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 2: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 3: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 4: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 5: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 6: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
ERROR 2017-05-20 17:51:40 ProcessExecutor 7: In arrows(metrics$GC, metrics$NORMALIZED_COVERAGE - metrics$ERROR_BAR_WIDTH, :
ERROR 2017-05-20 17:51:40 ProcessExecutor zero-length arrow is of indeterminate angle and so skipped
Picked up _JAVA_OPTIONS: -Xms256M -Xmx20G -XX:ParallelGCThreads=1
[bam_sort_core] merging from 26 files...
[bam_sort_core] merging from 26 files...
Picked up _JAVA_OPTIONS: -Xms256M -Xmx20G -XX:ParallelGCThreads=1
Picked up _JAVA_OPTIONS: -Xms256M -Xmx20G -XX:ParallelGCThreads=1
processing chromosomes
Traceback (most recent call last):
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1598, in
main()
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 1460, in main
ROADMAP_META, OUTPUT_PREFIX)
File "/hd2/atac_dnase_pipelines/ataqc/run_ataqc.py", line 905, in compare_to_roadmap
sample_mean0_col)
File "/home/rick/anaconda3/envs/bds_atac/lib/python2.7/site-packages/scipy/stats/stats.py", line 3310, in spearmanr
rs = np.corrcoef(ar, br, rowvar=axisout)
File "/home/rick/anaconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2145, in corrcoef
c = cov(x, y, rowvar)
File "/home/rick/anaconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/lib/function_base.py", line 2024, in cov
X = np.vstack((X, y))
File "/home/rick/anaconda3/envs/bds_atac/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
and here is the test.log file
INFO:root:Reading bowtie alignment log...
INFO:root:18507448 reads; of these:
INFO:root:18507448 (100.00%) were paired; of these:
INFO:root:189484 (1.02%) aligned concordantly 0 times
INFO:root:8274240 (44.71%) aligned concordantly exactly 1 time
INFO:root:10043724 (54.27%) aligned concordantly >1 times
INFO:root:----
INFO:root:189484 pairs aligned concordantly 0 times; of these:
INFO:root:26354 (13.91%) aligned discordantly 1 time
INFO:root:----
INFO:root:163130 pairs aligned 0 times concordantly or discordantly; of these:
INFO:root:326260 mates make up the pairs; of these:
INFO:root:233407 (71.54%) aligned 0 times
INFO:root:21280 (6.52%) aligned exactly 1 time
INFO:root:71573 (21.94%) aligned >1 times
INFO:root:99.37% overall alignment rate
INFO:root:Getting mitochondrial chromosome fraction...
INFO:root:Getting GC bias...
INFO:root:java -Xmx4G -jar /home/rick/anaconda3/envs/bds_atac/bin/../share/picard*/picard.jar CollectGcBiasMetrics R=/hd2/bds_genome_data/mm9/mm9.fa I=/hd2/atacseq_taf/output/ATAC_5lKO/out/align/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.nodup.bam O=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE_gc.txt VERBOSITY=ERROR QUIET=TRUE ASSUME_SORTED=FALSE CHART=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE_gcPlot.pdf S=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE_gcSummary.txt
INFO:root:Running preseq...
INFO:root:preseq lc_extrap -P -B -o /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.preseq.dat /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.sorted.bam -v 2> /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.preseq.log
INFO:root:samtools mapq 30...
INFO:root:Running Picard MarkDuplicates...
INFO:root:samtools flagstat...
INFO:root:37014896 + 0 in total (QC-passed reads + QC-failed reads)
INFO:root:0 + 0 secondary
INFO:root:0 + 0 supplementary
INFO:root:0 + 0 duplicates
INFO:root:36781489 + 0 mapped (99.37%:-nan%)
INFO:root:37014896 + 0 paired in sequencing
INFO:root:18507448 + 0 read1
INFO:root:18507448 + 0 read2
INFO:root:36635928 + 0 properly paired (98.98%:-nan%)
INFO:root:36758530 + 0 with itself and mate mapped
INFO:root:22959 + 0 singletons (0.06%:-nan%)
INFO:root:33546 + 0 with mate mapped to a different chr
INFO:root:17237 + 0 with mate mapped to a different chr (mapQ>=5)
INFO:root:final read counts...
INFO:root:insert size distribution...
INFO:root:java -Xmx4G -jar /home/rick/anaconda3/envs/bds_atac/bin/../share/picard*/picard.jar CollectInsertSizeMetrics INPUT=/hd2/atacseq_taf/output/ATAC_5lKO/out/align/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.nodup.bam OUTPUT=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.inserts.hist_data.log H=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.inserts.hist_graph.pdf VERBOSITY=ERROR QUIET=TRUE W=1000 STOP_AFTER=5000000
INFO:root:Generating tss plot...
INFO:root:signal to noise...
INFO:root:bigWigAverageOverBed /hd2/atacseq_taf/output/ATAC_5lKO/out/signal/macs2/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.nodup.tn5.pf.pval.signal.bigwig /hd2/bds_genome_data/mm9/ataqc/mm9_univ_dhs_ucsc.from_mm10.bed.gz /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep2/ATAC_5lKO_TAAGGCGA_L002_R1_001.trim.PE2SE.signal
@rfarouni ok I see. You need to update your species config and files - go ahead and re-download the genome data (https://github.com/kundajelab/atac_dnase_pipelines#genome-data) and make sure in your species file ([SPECIES FILE]
as noted in the installation instructions) that you have a line has something like this under [mm9]
: reg2map_bed = /path/to/pipeline_genome_data/mm9/ataqc/mm9_dhs_universal_ucsc_v1.bed.gz
@vervacity I re-downloaded the genome data, but there is no such file in my genome_data folder. The file mm9/ataqc/mm9_dhs_universal_ucsc_v1.bed.gz
is not there. I have this though dnase = /hd2/bds_genome_data/mm9/ataqc/mm9_univ_dhs_ucsc.from_mm10.bed.gz
@rfarouni so I'm seeing that it's listed in the latest genome data download script (https://github.com/kundajelab/atac_dnase_pipelines/blob/master/install_genome_data.sh#L64) and is also available in our genome data folder (http://mitra.stanford.edu/kundaje/genome_data/mm9/ataqc/) - could you grab the latest genome download script and try again?
I downloaded the file and added the filepath in the species file. It fixed it and the final report gets produced, but I still get one error
# SYS command. line 30
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/hd2/atac_dnase_pipelines/.:/hd2/atac_dnase_pipelines/modules:/hd2/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 32
zcat /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.gz | sort -k1,1 -k2,2n > /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.tmp
# SYS command. line 33
cat /hd2/bds_genome_data/mm9/mm9/mm9.chrom.sizes | grep -P 'chr[\dXY]+[ \t]' > /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.chrsz.tmp
# SYS command. line 34
bedToBigBed -type=bed12+3 -as=/hd2/atac_dnase_pipelines/etc/gappedPeak.as /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.tmp /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.chrsz.tmp /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb
# SYS command. line 35
rm -f /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.tmp /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.chrsz.tmp
# SYS command. line 39
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."
End coordinate 61342464 bigger than chr19 size of 61342430 line 180743 of /hd2/atacseq_taf/output/ATAC_5lKO/out/peak/macs2/pooled_rep/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5_pooled.pf.filt.gappedPeak.bb.tmp
the test log
INFO:root:Reading bowtie alignment log...
INFO:root:18014559 reads; of these:
INFO:root:18014559 (100.00%) were paired; of these:
INFO:root:226896 (1.26%) aligned concordantly 0 times
INFO:root:8040724 (44.63%) aligned concordantly exactly 1 time
INFO:root:9746939 (54.11%) aligned concordantly >1 times
INFO:root:----
INFO:root:226896 pairs aligned concordantly 0 times; of these:
INFO:root:49707 (21.91%) aligned discordantly 1 time
INFO:root:----
INFO:root:177189 pairs aligned 0 times concordantly or discordantly; of these:
INFO:root:354378 mates make up the pairs; of these:
INFO:root:232838 (65.70%) aligned 0 times
INFO:root:21941 (6.19%) aligned exactly 1 time
INFO:root:99599 (28.11%) aligned >1 times
INFO:root:99.35% overall alignment rate
INFO:root:Getting mitochondrial chromosome fraction...
INFO:root:Getting GC bias...
INFO:root:java -Xmx4G -jar /home/rick/anaconda3/envs/bds_atac/bin/../share/picard*/picard.jar CollectGcBiasMetrics R=/hd2/bds_genome_data/mm9/mm9/mm9.fa I=/hd2/atacseq_taf/output/ATAC_5lKO/out/align/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.bam O=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE_gc.txt VERBOSITY=ERROR QUIET=TRUE ASSUME_SORTED=FALSE CHART=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE_gcPlot.pdf S=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE_gcSummary.txt
INFO:root:Running preseq...
INFO:root:preseq lc_extrap -P -B -o /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.preseq.dat /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.sorted.bam -v 2> /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.preseq.log
INFO:root:samtools mapq 30...
INFO:root:Running Picard MarkDuplicates...
INFO:root:samtools flagstat...
INFO:root:36029118 + 0 in total (QC-passed reads + QC-failed reads)
INFO:root:0 + 0 secondary
INFO:root:0 + 0 supplementary
INFO:root:0 + 0 duplicates
INFO:root:35796280 + 0 mapped (99.35%:-nan%)
INFO:root:36029118 + 0 paired in sequencing
INFO:root:18014559 + 0 read1
INFO:root:18014559 + 0 read2
INFO:root:35575326 + 0 properly paired (98.74%:-nan%)
INFO:root:35772440 + 0 with itself and mate mapped
INFO:root:23840 + 0 singletons (0.07%:-nan%)
INFO:root:40314 + 0 with mate mapped to a different chr
INFO:root:16449 + 0 with mate mapped to a different chr (mapQ>=5)
INFO:root:final read counts...
INFO:root:insert size distribution...
INFO:root:java -Xmx4G -jar /home/rick/anaconda3/envs/bds_atac/bin/../share/picard*/picard.jar CollectInsertSizeMetrics INPUT=/hd2/atacseq_taf/output/ATAC_5lKO/out/align/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.bam OUTPUT=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.inserts.hist_data.log H=/hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.inserts.hist_graph.pdf VERBOSITY=ERROR QUIET=TRUE W=1000 STOP_AFTER=5000000
INFO:root:Generating tss plot...
INFO:root:signal to noise...
INFO:root:bigWigAverageOverBed /hd2/atacseq_taf/output/ATAC_5lKO/out/signal/macs2/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.nodup.tn5.pf.pval.signal.bigwig /hd2/bds_genome_data/mm9/mm9/ataqc/mm9_dhs_universal_ucsc_v1.bed.gz /hd2/atacseq_taf/output/ATAC_5lKO/out/qc/rep1/ATAC_5lKO_L001_R1_001.trim.PE2SE.signal
INFO:root:ENCFF001PCS SpearmanrResult(correlation=0.033773658301836137, pvalue=0.0)
INFO:root:ENCFF001PDB SpearmanrResult(correlation=0.045394639569752246, pvalue=0.0)
INFO:root:ENCFF001PDP SpearmanrResult(correlation=0.03117305766539559, pvalue=0.0)
INFO:root:ENCFF001PEA SpearmanrResult(correlation=0.064350783611477114, pvalue=0.0)
INFO:root:ENCFF001QEN SpearmanrResult(correlation=0.11751503760067165, pvalue=0.0)
INFO:root:ENCFF001QBV SpearmanrResult(correlation=0.058886984313049347, pvalue=0.0)
INFO:root:ENCFF001PZU SpearmanrResult(correlation=0.048606934562090573, pvalue=0.0)
INFO:root:ENCFF001PFG SpearmanrResult(correlation=0.058729132628544053, pvalue=0.0)
INFO:root:ENCFF001PGU SpearmanrResult(correlation=0.11441127014544508, pvalue=0.0)
INFO:root:ENCFF001PHE SpearmanrResult(correlation=0.084972152825066191, pvalue=0.0)
INFO:root:ENCFF001PHL SpearmanrResult(correlation=0.10805913535450433, pvalue=0.0)
INFO:root:ENCFF001PHS SpearmanrResult(correlation=0.092773761325087106, pvalue=0.0)
INFO:root:ENCFF001PIB SpearmanrResult(correlation=0.099420586283102103, pvalue=0.0)
INFO:root:ENCFF001PNK SpearmanrResult(correlation=0.17334718873462551, pvalue=0.0)
INFO:root:ENCFF001PVW SpearmanrResult(correlation=0.089321528179288359, pvalue=0.0)
INFO:root:ENCFF001PIK SpearmanrResult(correlation=0.32975026450162248, pvalue=0.0)
INFO:root:ENCFF001PIY SpearmanrResult(correlation=0.32913102263702887, pvalue=0.0)
INFO:root:ENCFF001PKP SpearmanrResult(correlation=0.092786146689878099, pvalue=0.0)
INFO:root:ENCFF001PKY SpearmanrResult(correlation=0.086798762024097684, pvalue=0.0)
INFO:root:ENCFF001PLI SpearmanrResult(correlation=0.19039392295092475, pvalue=0.0)
INFO:root:ENCFF001LTZ SpearmanrResult(correlation=0.15161944771329813, pvalue=0.0)
INFO:root:ENCFF001LUI SpearmanrResult(correlation=0.072168420005018871, pvalue=0.0)
INFO:root:ENCFF001PLW SpearmanrResult(correlation=0.069049288296783431, pvalue=0.0)
INFO:root:ENCFF001PMI SpearmanrResult(correlation=0.14039189856903245, pvalue=0.0)
INFO:root:ENCFF001PMR SpearmanrResult(correlation=0.17890894073017335, pvalue=0.0)
INFO:root:ENCFF001QCS SpearmanrResult(correlation=0.067259461370709289, pvalue=0.0)
INFO:root:ENCFF001PNU SpearmanrResult(correlation=0.11656677312462088, pvalue=0.0)
INFO:root:ENCFF001PON SpearmanrResult(correlation=0.090301961635109354, pvalue=0.0)
INFO:root:ENCFF001PPI SpearmanrResult(correlation=0.14340252460503231, pvalue=0.0)
INFO:root:ENCFF001PTV SpearmanrResult(correlation=0.088709205487457948, pvalue=0.0)
INFO:root:ENCFF001PUI SpearmanrResult(correlation=0.041341040870329318, pvalue=0.0)
INFO:root:ENCFF001PWJ SpearmanrResult(correlation=0.053202448119509249, pvalue=0.0)
INFO:root:ENCFF001PUV SpearmanrResult(correlation=0.20143252463051384, pvalue=0.0)
INFO:root:ENCFF001QBE SpearmanrResult(correlation=0.085697950810573756, pvalue=0.0)
INFO:root:ENCFF001PWW SpearmanrResult(correlation=0.12324226976911377, pvalue=0.0)
INFO:root:ENCFF001PXH SpearmanrResult(correlation=0.071244657821574187, pvalue=0.0)
INFO:root:ENCFF001OQF SpearmanrResult(correlation=0.18337167266240909, pvalue=0.0)
INFO:root:ENCFF001PYR SpearmanrResult(correlation=0.14473639332147181, pvalue=0.0)
INFO:root:ENCFF001PZE SpearmanrResult(correlation=0.07972817386565216, pvalue=0.0)
INFO:root:ENCFF001PES SpearmanrResult(correlation=0.069160517857457471, pvalue=0.0)
INFO:root:ENCFF001PFY SpearmanrResult(correlation=0.075503324225279431, pvalue=0.0)
INFO:root:ENCFF001QAG SpearmanrResult(correlation=0.1032940188731358, pvalue=0.0)
INFO:root:ENCFF001PKH SpearmanrResult(correlation=0.30859281272295885, pvalue=0.0)
INFO:root:ENCFF001QHL SpearmanrResult(correlation=0.28408478077383448, pvalue=0.0)
@rfarouni can you pass along that gapped Peak file that had the error so that we can replicate it? thanks!
Hi Daniel,
When I downloaded the latest mm9 genome annotations I still received the same concatenation error. I tried running the pipeline with mm10 and did not receive this error. My future analyses will be conducted with mm10 so I will close this issue. Thanks for all your help I really appreciate it!
Best regards, Michael
Hi Daniel, I am wondering whether you have fixed mm9 genome annotation problem after previous issue post
Thank you for making this pipeline available. I am able to get the pipeline running except when I include the ATAqC step I obtain an error in numpy. I have included the full log file below.
https://gist.github.com/mdurante1/3f51ef55b2fb0758949a1c68cacb6d28
Do you have any insight on how to fix this error?