t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

paired-end and other clarifications #57

Closed amitblum closed 4 years ago

amitblum commented 5 years ago

Hi Tobias, Two questions:

  1. I guess that there is no option for paired-end alignment, and I should use other aligners and use their BAM files as SLAM-dunk input?
  2. Can you explain the output of the 'alleyoop dump' function? What are the tcCount and ConversionRates outputs? This is an example of one line (read): NB501465:227:H2W3HBGX5:2:12311:26855:10489 1
    CTTATATGACATGTCCCCATACCCATCACAATCTCCAGCATTCCCCCAA
    3
    12,0,0,0,0,0,17,0,0,0,0,0,3,0,0,0,3,0,12,0,0,0,0,0,0 564764,T,N,N,9,C,36,False;564770,T,N,N,15,C,27,False;564781,T,N,N,26,C,36,False;

Thanks Amit

t-neumann commented 5 years ago

Hi @amitblum

  1. Slamdunk uses NextGenMap (NGM) under the hood for conversion-aware alignment. We have not yet implemented a paired-end option on the Python side that calls NGM, but if you run NGM manually, you should be able to get it running in paired-end mode as well. I can give you directions on that if you want.

     What is more of a problem is that Slamdunk only processes stranded reads (since Quantseq is a stranded protocol), so you would have to reverse-complement the respective mates of the read pairs to get those counted as well.

  2. This calls the __str__ method on the SlamSeqRead objects. The output should be read name, strand, sequence, and then a couple more fields which I don't see in your output. The last two entries should be the individual base conversions of the read and, finally, an array of SlamSeqAlignmentPosition objects, on which the __repr__ method is called. So basically it yields reference base, position, quality, read base, position quality and whether it coincides with a called SNP for masking.
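To make the field layout described above more concrete, here is a minimal Python sketch that splits the example record from the question into named parts. It is not part of slamdunk: the `DumpRecord` type and `parse_dump_record` function are hypothetical names, and the field order (read name, strand, sequence, tcCount, conversion values, positions) is an assumption read off the example line. How the 25 conversion values and the eight position subfields map to specific meanings is not spelled out in this thread, so the sketch leaves them uninterpreted.

```python
from typing import List, NamedTuple, Tuple


class DumpRecord(NamedTuple):
    """Hypothetical container for one 'alleyoop dump' record (not a slamdunk type)."""
    name: str
    strand: str
    sequence: str
    tc_count: int
    conversions: List[int]
    positions: List[Tuple[str, ...]]


def parse_dump_record(line: str) -> DumpRecord:
    """Split one whitespace-separated record into its six assumed fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    name, strand, sequence, tc_count, conversions, positions = fields
    return DumpRecord(
        name=name,
        strand=strand,
        sequence=sequence,
        tc_count=int(tc_count),
        # Comma-separated integers; kept as a flat list without interpretation.
        conversions=[int(v) for v in conversions.split(",")],
        # Semicolon-separated entries, each a comma-separated tuple of strings.
        positions=[tuple(p.split(",")) for p in positions.split(";") if p],
    )


# The example record quoted in the question, joined onto one line.
EXAMPLE = (
    "NB501465:227:H2W3HBGX5:2:12311:26855:10489 1 "
    "CTTATATGACATGTCCCCATACCCATCACAATCTCCAGCATTCCCCCAA 3 "
    "12,0,0,0,0,0,17,0,0,0,0,0,3,0,0,0,3,0,12,0,0,0,0,0,0 "
    "564764,T,N,N,9,C,36,False;564770,T,N,N,15,C,27,False;564781,T,N,N,26,C,36,False;"
)

if __name__ == "__main__":
    record = parse_dump_record(EXAMPLE)
    print(record.tc_count, len(record.conversions), len(record.positions))
```

In the example, the tcCount of 3 matches the three position entries, each of which lists a T and a C among its subfields.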

Cheers,

Tobi

amitblum commented 5 years ago

Hi Tobi, Thanks for the reply. Yes, I would like to have your directions for paired-end mode.

Best Amit

t-neumann commented 5 years ago

Hi @amitblum ,

sorry for being so slow to reply:

Here is the python wrapper around the ngm call:

https://github.com/t-neumann/slamdunk/blob/master/slamdunk/dunks/mapper.py#L77

I suggest you reproduce a plain call from that wrapper using the NGM installation that comes with slamdunk (it should be in slamdunk/contrib inside the slamdunk install directory).

Documentation on how to adapt the NGM call for paired-end reads is here. Essentially, it should be something like replacing -q with -1 and -2. Just be sure to also include --slam-seq 2, as done in the Python call, for conversion-aware scoring, and --rg-id and --rg-sm with the proper colon-separated info (simply stick to sampleName:pulse:0 as shown below) for slamdunk to run properly:

--rg-id sampleID
--rg-sm sampleName:pulse:0

Once you have the mapped BAM file, you can proceed with the filter, snp and count calls from within slamdunk on it.
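As a rough illustration of the call described above, the following Python sketch only assembles the argument list and does not run anything. The function name `build_ngm_paired_command` and all file names are hypothetical. The options -1, -2, --slam-seq 2, --rg-id and --rg-sm come from this comment; -r (reference) and -o (output) are assumed from NGM's usual command line, and any further options used in mapper.py are left out, so the result should be compared against the actual call in mapper.py before use.

```python
import shlex
from typing import List


def build_ngm_paired_command(
    reference: str,
    mate1: str,
    mate2: str,
    output: str,
    sample_id: str,
    sample_name: str,
    ngm_binary: str = "ngm",  # assumed executable name; adjust to the bundled NGM
) -> List[str]:
    """Assemble a paired-end NGM argument list (hypothetical helper, nothing is run)."""
    return [
        ngm_binary,
        "-r", reference,      # assumption: NGM reference option
        "-1", mate1,          # replaces the single-end -q input
        "-2", mate2,
        "-o", output,         # assumption: NGM output option
        "--slam-seq", "2",    # conversion-aware scoring, as in the Python wrapper
        "--rg-id", sample_id,
        "--rg-sm", f"{sample_name}:pulse:0",
    ]


if __name__ == "__main__":
    cmd = build_ngm_paired_command(
        "genome.fa", "sample_R1.fq.gz", "sample_R2.fq.gz",
        "sample.sam", "sampleID", "sampleName",
    )
    # Print a shell-quoted version for inspection before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to compare against the single-end call in mapper.py and to add whatever other options that call uses.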

amitblum commented 5 years ago

Thanks! Amit

BrianLohman commented 4 years ago

Hi @amitblum,

Were you able to successfully implement these changes? I am about to attempt the same. Curious if it worked out.

Cheers,

Brian

BrianLohman commented 4 years ago

Hi Tobias,

I am trying to tweak the call to NGM so that it takes paired-end data. I have interleaved R1 and R2. I believe all that's left is to swap the -q for a -p in mapper.py. However, I can't find mapper.py. I am using the Docker image you posted on Docker Hub. I checked every directory in the PATH but I don't see it. Can you please tell me where I can find mapper.py?

Thank you,

Brian

t-neumann commented 4 years ago

Hi,

you should be able to run pip show slamdunk to find the folder of the slamdunk package; in its dunks subfolder you will find the mapper.py file.

Good luck!

BrianLohman commented 4 years ago

Hi Tobias,

Thanks for your suggestion. I was able to locate mapper.py.

For others, after entering the docker container:

conda activate slamdunk
cd /opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks
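If the path differs in another image or Python version, a small Python sketch like the one below may locate the file without hard-coding the environment path. The helper name `find_module_file` is hypothetical, and the sketch assumes the package is importable as `slamdunk` in the active environment and contains `dunks/mapper.py`, as the path above suggests.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_module_file(package: str, relative: str) -> Optional[Path]:
    """Return the path of `relative` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package missing, or not a package with a directory
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: slamdunk is installed in the currently active environment.
    print(find_module_file("slamdunk", "dunks/mapper.py"))
```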

Cheers,

Brian

EllieDuan commented 3 years ago

Hi Tobias,

I'm using the nf-core/slamseq pipeline (https://nf-co.re/slamseq/dev/usage) and am wondering whether paired-end mode can be specified here?

Thank you!!

t-neumann commented 3 years ago

Hi,

unfortunately nf-core/slamseq only works with single-end data for now since the original high-throughput Quantseq protocol relies on single-end sequencing. We are working on something for paired-end data, but that will still take a while - sorry.

jkobject commented 3 years ago

https://github.com/jkobject/slamdunk a version that should work with paired end data

drkoryjohns commented 2 years ago

I do not see in the documentation how the sample text file is to be organized for paired-end data. Can you please provide an example, together with an example slamdunk all command for paired-end analysis? Would it also be an option to take the reverse complement of the reverse reads, concatenate them with the forward read files, and run slamdunk as described in the documentation? Thank you.
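For reference, reverse-complementing a FASTQ record, as asked about here, could look like the Python sketch below. The function names are hypothetical, and whether concatenating such reads with the forward reads gives valid slamdunk results is not confirmed anywhere in this thread; which mate needs to be reverse-complemented also depends on the library protocol. Note that the quality string has to be reversed along with the sequence so that each score stays with its base.

```python
from typing import Tuple

# Translation table for the complement; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_fastq_record(
    header: str, seq: str, plus: str, qual: str
) -> Tuple[str, str, str, str]:
    """Reverse-complement one four-line FASTQ record (hypothetical helper)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Reverse the qualities as well so each score follows its base.
    return header, reverse_complement(seq), plus, qual[::-1]


if __name__ == "__main__":
    record = ("@read1", "AACG", "+", "IIH#")
    print("\n".join(revcomp_fastq_record(*record)))
```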

t-neumann commented 2 years ago

Hi @drkoryjohns maybe I can refer you to the fork https://github.com/jkobject/slamdunk of @monikaperez (see #105). I haven't tried it yet, but it reads as if it could answer your questions.