ijuric / MAPS


what does this mean? SET THE VARIABLES AT THIS PORTION ONLY IF number_of_datasets > 1 (merging exisitng datasets) #11

Closed YichaoOU closed 4 years ago

YichaoOU commented 4 years ago

I'm not sure how I should change the following section in run_pipeline_test.sh.

####################################################################
# SET THE VARIABLES AT THIS PORTION ONLY IF
# number_of_datasets > 1 (merging exisitng datasets)
# specify as many datasets as required
####################################################################
dataset1="/home/jurici/MAPS/PLAC-Seq_datasets/test_dataset2/feather_output/test_set1_current"
dataset2="/home/jurici/MAPS/PLAC-Seq_datasets/test_dataset2/feather_output/test_set2_current"
dataset3=""
dataset4=""

YichaoOU commented 4 years ago

Also, should I change all of the genomic feature paths?

if [ $organism == "mm10" ]; then
    if [ -z $bwa_index ]; then
        bwa_index="/home/jurici/MAPS/MAPS_data_files/"$organism"/BWA_index/mm10_chrAll.fa"
    fi
    genomic_feat_filepath=$cwd"/../MAPS_data_files/"$organism"/genomic_features/F_GC_M_MboI_"$resolution"Kb_el.mm10.txt"
    chr_count=19
elif [ $organism == "mm9" ]; then
    if [ -z $bwa_index ]; then
        bwa_index="/home/jurici/MAPS/MAPS_data_files/"$organism"/BWA_index/mm9.fa"
    fi
    genomic_feat_filepath=$cwd"/../MAPS_data_files/"$organism"/genomic_features/F_GC_M_MboI_"$resolution"Kb_el.mm9.txt"
    chr_count=19
elif [ $organism == "hg19" ]; then
    if [ -z $bwa_index ]; then
        bwa_index="/home/jurici/MAPS/MAPS_data_files/"$organism"/BWA_index/hg19.fa"
    fi
    genomic_feat_filepath=$cwd"/../MAPS_data_files/"$organism"/genomic_features/F_GC_M_MboI_"$resolution"Kb_el.hg19.txt"
    chr_count=22
elif [ $organism == "hg38" ]; then
    if [ -z $bwa_index ]; then
        bwa_index="/home/jurici/MAPS/MAPS_data_files/"$organism"/BWA_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta"
    fi
    genomic_feat_filepath=$cwd"/../MAPS_data_files/"$organism"/genomic_features/F_GC_M_MboI_"$resolution"Kb_el.GRCh38.txt"
    chr_count=22
fi

armenabnousi commented 4 years ago

Hi,

"I'm not sure how I should change the following section in run_pipeline_test.sh."

If you have multiple replicates and need to combine them, you will need to fill out that section. For example, if you have two replicates, set "dataset1" and "dataset2" to the directories that contain the results for the two datasets. (You will need to run MAPS on each dataset separately before merging them.)

If you have only one replicate, you should ignore this section and set the "number_of_datasets" variable (line #7) to 1.
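As a rough illustration of the two-replicate case described above, the settings might look like the sketch below. The variable names come from the snippet quoted in the question; the paths are placeholders, and `count_datasets` is a hypothetical helper (not part of MAPS) that only checks that the number of filled-in paths matches "number_of_datasets".

```shell
#!/bin/bash
# Hypothetical sketch: the paths are placeholders, and count_datasets is an
# illustrative helper that does not exist in MAPS.

number_of_datasets=2
dataset1="/path/to/replicate1/feather_output/replicate1_current"
dataset2="/path/to/replicate2/feather_output/replicate2_current"
dataset3=""
dataset4=""

# count_datasets: print how many of the given arguments are non-empty
count_datasets() {
    local n=0 d
    for d in "$@"; do
        if [ -n "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

filled=$(count_datasets "$dataset1" "$dataset2" "$dataset3" "$dataset4")

# Warn if merging is requested but the number of paths does not match
if [ "$number_of_datasets" -gt 1 ] && [ "$filled" -ne "$number_of_datasets" ]; then
    echo "expected $number_of_datasets dataset paths, found $filled" >&2
fi
```

With a single replicate, "number_of_datasets" would be 1 and the dataset variables can stay empty.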

armenabnousi commented 4 years ago

"Also, should I change all of the genomic feature paths?"

If you are using the provided genomic features files (mm9, mm10, hg19 or hg38), then you don't need to worry about the "genomic_feat_filepath" variable. Specifying the genome/organism on line #13 should automatically set the path to the genomic features file. Please let us know if there are still problems.
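To make the automatic lookup concrete, here is a minimal sketch of how the path could be derived from the organism and resolution. `genomic_feat_path` is a hypothetical helper written for illustration, not a function in MAPS; the file name pattern is taken from the path printed in the run log later in this thread, and the base directory in the example call is a placeholder.

```shell
#!/bin/bash
# Hypothetical sketch of the organism-to-path lookup; genomic_feat_path is
# an illustrative helper, not part of MAPS.

# genomic_feat_path ORGANISM RESOLUTION_KB BASE_DIR
genomic_feat_path() {
    local organism="$1" resolution="$2" base="$3" suffix
    case "$organism" in
        mm9|mm10|hg19) suffix="$organism" ;;   # file suffix equals the organism name
        hg38)          suffix="GRCh38" ;;      # hg38 files use the GRCh38 suffix
        *)
            echo "unsupported organism: $organism" >&2
            return 1
            ;;
    esac
    echo "$base/../MAPS_data_files/$organism/genomic_features/F_GC_M_MboI_${resolution}Kb_el.$suffix.txt"
}

# Example call with a placeholder base directory
genomic_feat_path mm10 10 /path/to/MAPS/bin
```

An organism outside the four provided ones makes the helper return a non-zero status, which corresponds to the case where the path would have to be set by hand.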

YichaoOU commented 4 years ago

Thanks for the quick reply! I have updated the paths in run_pipeline_test.sh like the following:

#!/bin/bash

python_path=/home/yli11/.conda/envs/HiChIP_MAPS/bin/python #should have pysam, pybedtools installed. bedtools, samtools should be in the path
Rscript_path=/home/yli11/.conda/envs/HiChIP_MAPS/bin/Rscript
###################################################################

fastq_dir="/home/yli11/Programs/MAPS/examples/test_set1"
outdir="/home/yli11/Programs/MAPS/examples/test_set1"
macs2_filepath="/home/yli11/Programs/MAPS/examples/test_set1/macs2_peaks_final.replicated.narrowPeak"
organism="mm10"
bwa_index="/xxx/Data_resource/Genome/Mouse/mm10/bwa_16a_index/mm10_main.fa"

Nothing else changed.

But I got the following error:

Mon Aug 26 10:19:41 2019 starting mapping and filtering operation
Mon Aug 26 10:19:41 2019 calling bwa for /home/yli11/Programs/MAPS/examples/test_set1/test_set1_R1.fastq
Mon Aug 26 10:19:54 2019 calling bwa for /home/yli11/Programs/MAPS/examples/test_set1/test_set1_R2.fastq
Mon Aug 26 10:20:07 2019 calling samtools sort for /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R1.fastq.bwa.sam storing in /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R1.fastq.bwa.sam.srtn
Mon Aug 26 10:20:07 2019 calling samtools sort for /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R2.fastq.bwa.sam storing in /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R2.fastq.bwa.sam.srtn
Mon Aug 26 10:20:07 2019 merging /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R1.fastq.bwa.sam.srtn and /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/tempfiles/test_set1_R2.fastq.bwa.sam.srtn
Mon Aug 26 10:20:08 2019 filtering and pairing reads
Mon Aug 26 10:20:10 2019 paired bam file generated. Sorting by coordinates.
Mon Aug 26 10:20:11 2019 calling samtools rmdup
Mon Aug 26 10:20:11 2019 calling samtools flagstat on mapped file
Mon Aug 26 10:20:12 2019 calling samtools flagstat on mapped and duplicate-removed file
Traceback (most recent call last):
  File "/home/yli11/Programs/MAPS/bin/feather/feather_pipe", line 122, in <module>
    main()
  File "/home/yli11/Programs/MAPS/bin/feather/feather_pipe", line 52, in main
    filter_output_filename = filter_main(fastq1, fastq2, bwa_index, mapq, outdir, prefix, threads, to_file = False)
  File "/research/rgs01/home/clusterHome/yli11/Programs/MAPS/bin/feather/feather_filter_chr.py", line 80, in filter_main
    intra_count = lines[11].split()[0]
IndexError: list index out of range
sed: can't read /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_20190826_101940/test_set1.feather.qc: No such file or directory
test_set1 /home/yli11/Programs/MAPS/examples/test_set1/MAPS_output/test_set1_20190826_101940/ /home/yli11/Programs/MAPS/examples/test_set1/macs2_peaks_final.replicated.narrowPeak /home/yli11/Programs/MAPS/bin/../MAPS_data_files/mm10/genomic_features/F_GC_M_MboI_10Kb_el.mm10.txt /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_current/ /home/yli11/Programs/MAPS/examples/test_set1/feather_output/test_set1_current/ 10000 19 /home/yli11/Programs/MAPS/examples/test_set1/MAPS_output/test_set1_20190826_101940/
first loading parameters file
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19']
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX']
loading MACS2 peaks
loading metadata file
doing chromosome chr1
-- handling MACS2 peaks
-- handling short.bed

armenabnousi commented 4 years ago

Mon Aug 26 10:20:12 2019 calling samtools flagstat on mapped and duplicate-removed file
Traceback (most recent call last):
  File "/home/yli11/Programs/MAPS/bin/feather/feather_pipe", line 122, in <module>
    main()
  File "/home/yli11/Programs/MAPS/bin/feather/feather_pipe", line 52, in main
    filter_output_filename = filter_main(fastq1, fastq2, bwa_index, mapq, outdir, prefix, threads, to_file = False)
  File "/research/rgs01/home/clusterHome/yli11/Programs/MAPS/bin/feather/feather_filter_chr.py", line 80, in filter_main
    intra_count = lines[11].split()[0]
IndexError: list index out of range

We have had this error happen when we used an older version of samtools. Please make sure your samtools version is 1.3 or higher. ("samtools --version" should output the version)
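A quick way to check this before running the pipeline is sketched below. It is only an illustration, not part of MAPS: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. It also assumes that the first line of `samtools --version` has the form "samtools 1.8"; very old samtools builds may not accept `--version` at all, in which case the version comes back empty and is treated as too old.

```shell
#!/bin/bash
# Hypothetical pre-flight check; version_ge is an illustrative helper and
# relies on `sort -V`, which is assumed to be available.

# version_ge A B: succeed if version A is greater than or equal to version B
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.3"

if command -v samtools >/dev/null 2>&1; then
    # Take the second field of the first line, e.g. "samtools 1.8" gives "1.8"
    installed=$(samtools --version 2>/dev/null | head -n 1 | awk '{print $2}')
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "samtools $installed is recent enough"
    else
        echo "samtools ${installed:-unknown} is older than $required; please upgrade" >&2
    fi
else
    echo "samtools not found in PATH" >&2
fi
```

Since the pipeline script expects samtools to be in the path, the check should be run in the same environment (for example the same conda environment) that will run MAPS.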

YichaoOU commented 4 years ago

cool! It fixed the problem. Thanks!

armenabnousi commented 4 years ago

That should be the problem. Here is my output:

$ samtools --version
samtools 1.8
Using htslib 1.8
Copyright (C) 2018 Genome Research Ltd.

Is it possible to install a new version and use that?