ijuric / MAPS


parameter for ignoring read duplicates #34

Open jaavedm opened 3 years ago

jaavedm commented 3 years ago

Is there a parameter option that will allow MAPS not to do read deduplication? Or is there a script I can edit in the package that will allow me to skip the deduplication process? Can you point me in the right direction of what to specify on the command line, or which script/line# that I can modify?

The reason I'm asking is that I have constructed and sequenced a HiChIP library with UMIs (unique molecular identifiers). I've already mapped the reads and used the UMIs to remove the duplicated PCR products. This step gives me back FASTQ files where the reads are already de-duplicated. I want to provide these de-duplicated reads to MAPS, but I don't want MAPS to do any further deduplication simply because reads share the same chrom, start, and end.

I appreciate your guidance. Thanks.

armenabnousi commented 3 years ago

There is no option for skipping duplicate removal. One workaround is to let MAPS run as usual and, once it has started performing duplicate removal (or any time after that), stop the execution. Then use the bam file generated before duplicate removal for the rest of the operation. (This will mess up the QC metrics generated in the feather.qc.tsv file.) To do this:

  1. Find the bam file prior to duplicate removal (feather_output//.fixmated.bam)
  2. sort the bam file by query name using samtools (samtools sort -n -o <sorted output file> <file from 1>)
  3. rename the sorted output file from step 2 above to *.srtn.rmdup.bam (i.e. [prefix].fixmated.bam -> [prefix].srtn.rmdup.bam)
  4. run: python <MAPS_DIR>/bin/feather/feather_pipe split -o <output_dir> -p <dataset_name> -l 1000 -s <[prefix].srtn.rmdup.bam> -a <ChIP-seq filepath>
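Steps 2 to 4 can be sketched as a small shell helper. This is only an illustration, not part of MAPS: the function names `rmdup_name` and `skip_rmdup_commands` are made up here, and every path passed in is a placeholder. The helper only prints the commands so they can be checked before anything is run, and it merges steps 2 and 3 by having `samtools sort` write directly to the `*.srtn.rmdup.bam` name.

```shell
# Hypothetical helpers (not part of MAPS); all arguments are placeholders.

# Step 3 naming rule: [prefix].fixmated.bam -> [prefix].srtn.rmdup.bam
rmdup_name() {
    printf '%s\n' "${1%.fixmated.bam}.srtn.rmdup.bam"
}

# Print (do not run) the commands for steps 2-4.
#   $1 fixmated bam from step 1   $2 MAPS directory   $3 output dir
#   $4 dataset name               $5 ChIP-seq file
# Assumes none of the arguments contain spaces.
skip_rmdup_commands() {
    renamed=$(rmdup_name "$1")
    # Steps 2 and 3 together: name-sort and write under the expected name.
    echo "samtools sort -n -o $renamed $1"
    # Step 4: the feather_pipe call exactly as given in the list above.
    echo "python $2/bin/feather/feather_pipe split -o $3 -p $4 -l 1000 -s $renamed -a $5"
}
```

Calling `skip_rmdup_commands` with your own five arguments prints two command lines; once they look right, they can be pasted into the terminal or piped to `sh`.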

This will generate the output of the mapping/preprocessing. Then, in the run_pipeline.sh file, make the following changes:

  5.1. set feather=0 (on line 5)
  5.2. set feather_output_symlink="<output_dir used in step 4 above>"
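If you would rather not edit the file by hand, the two changes could be applied with `sed`. This is a rough sketch that assumes both variables are assigned at the start of a line as plain `name=value` in your copy of run_pipeline.sh. The function name `patch_run_pipeline` is invented for this example, and it writes the patched script to standard output rather than modifying the file in place.

```shell
# Hypothetical helper (not part of MAPS).
#   $1 path to run_pipeline.sh   $2 output_dir used in step 4
# Assumes `feather=` and `feather_output_symlink=` start their lines,
# and that $2 contains no `|` or `&` characters (they are special to sed here).
patch_run_pipeline() {
    sed -e 's/^feather=.*/feather=0/' \
        -e "s|^feather_output_symlink=.*|feather_output_symlink=\"$2\"|" \
        "$1"
}
```

For example, `patch_run_pipeline run_pipeline.sh /path/to/output_dir > run_pipeline.skip_rmdup.sh` leaves the original untouched, so the two files can be compared with `diff` before running the new one.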

Then run the run_pipeline.sh file. Hope this helps, and let us know if you encounter problems.